Posted to issues@hbase.apache.org by "Anoop Sam John (JIRA)" <ji...@apache.org> on 2012/10/04 06:57:07 UTC

[jira] [Created] (HBASE-6942) Endpoint implementation for bulk delete

Anoop Sam John created HBASE-6942:
-------------------------------------

             Summary: Endpoint implementation for bulk delete
                 Key: HBASE-6942
                 URL: https://issues.apache.org/jira/browse/HBASE-6942
             Project: HBase
          Issue Type: Improvement
          Components: Coprocessors, Performance
            Reporter: Anoop Sam John
            Assignee: Anoop Sam John


We can provide an endpoint implementation for doing a bulk delete (based on a scan) on the server side. This can reduce the time taken for such an operation, since right now the client has to scan the rows back and then issue Delete(s) by row key.
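A minimal sketch of the server-side idea, under the assumption that one region can be modeled as an in-memory sorted map. `ToyBulkDelete`, `bulkDelete` and `rowBatchSize` are illustrative names, not the HBase coprocessor API. The scan and the deletes both happen where the data lives, so only the final count goes back to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;

public class ToyBulkDelete {
  // Hypothetical stand-in for one region: row key -> cell value, sorted like HBase rows.
  static final TreeMap<String, String> region = new TreeMap<>();

  /**
   * Scans the region in row-key order and deletes every row whose value matches
   * the condition, applying the deletes in batches of rowBatchSize.
   * Returns the number of rows deleted.
   */
  static long bulkDelete(Predicate<String> condition, int rowBatchSize) {
    if (rowBatchSize <= 0) {
      throw new IllegalArgumentException("rowBatchSize must be positive");
    }
    List<String> batch = new ArrayList<>();
    long deleted = 0;
    // higherKey() still works after earlier rows were removed, so the walk survives flushes.
    for (String row = region.isEmpty() ? null : region.firstKey();
         row != null;
         row = region.higherKey(row)) {
      if (condition.test(region.get(row))) {
        batch.add(row);
        if (batch.size() == rowBatchSize) {
          deleted += flush(batch);
        }
      }
    }
    deleted += flush(batch); // the last batch is usually only partially filled
    return deleted;
  }

  // Applies one batch of row deletes locally and empties the buffer.
  private static int flush(List<String> batch) {
    int removed = 0;
    for (String row : batch) {
      if (region.remove(row) != null) {
        removed++;
      }
    }
    batch.clear();
    return removed;
  }

  public static void main(String[] args) {
    for (int i = 0; i < 10; i++) {
      region.put(String.format("row%02d", i), i % 2 == 0 ? "even" : "odd");
    }
    long n = bulkDelete(v -> v.equals("even"), 3);
    System.out.println("deleted " + n + ", remaining " + region.size());
  }
}
```

The batch size only bounds how many deletes are buffered at once; it should not change how many rows end up deleted.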

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477064#comment-13477064 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

{code}
+    long noOfRowsDeleted = invokeBulkDeleteProtocol(tableName, new Scan(), 500, DeleteType.ROW,
+        null);
{code}
I think the test should also cover the case where batchSize is smaller than the number of rows to be deleted.
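The extra test case can be sketched as a check that the number of deleted rows does not depend on the batch size. `deleteInBatches` is a hypothetical stand-in for the endpoint's batching loop (not the patch's `invokeBulkDeleteProtocol`); it only records how many rows each flush would delete, so batch sizes smaller than, equal to and larger than the row count can be compared.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSizeCheck {
  /** Returns the size of each flush a loop that flushes every batchSize rows would perform. */
  static List<Integer> deleteInBatches(int totalRows, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<Integer> flushed = new ArrayList<>();
    int pending = 0;
    for (int row = 0; row < totalRows; row++) {
      pending++;
      if (pending == batchSize) {
        flushed.add(pending);
        pending = 0;
      }
    }
    if (pending > 0) {
      flushed.add(pending); // trailing partial batch: the case the extra test should hit
    }
    return flushed;
  }

  // Sums the flushes, i.e. the total number of rows deleted.
  static int total(List<Integer> batches) {
    int sum = 0;
    for (int b : batches) {
      sum += b;
    }
    return sum;
  }

  public static void main(String[] args) {
    int rows = 100;
    for (int batchSize : new int[] {7, 100, 500}) {
      List<Integer> batches = deleteInBatches(rows, batchSize);
      System.out.println("batchSize=" + batchSize + " flushes=" + batches.size()
          + " deleted=" + total(batches));
    }
  }
}
```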
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, since right now the client has to scan the rows back and then issue Delete(s) by row key.
> i.e. a query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482279#comment-13482279 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

That is fine. 
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, since right now the client has to scan the rows back and then issue Delete(s) by row key.
> i.e. a query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481471#comment-13481471 ] 

Hadoop QA commented on HBASE-6942:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12550303/HBASE-6942_Trunk.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 3 new or modified tests.

    {color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 2.0 profile.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 82 warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestSplitTransaction

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3110//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3110//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3110//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3110//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3110//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3110//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3110//console

This message is automatically generated.
                

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475536#comment-13475536 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

I think Ted is right: looking at FirstKeyOnlyFilter, it can be combined with other filters in a FilterList with Operator.MUST_PASS_ALL.

If we always combine it with the FirstKeyOnlyFilter, we can keep using a List for the deletedRowKeys, because there is only ever one KV per row.

Sorry for the length of the discussion here, Anoop. Almost there :)

+1 for the rest of the patch. (Have you tried it in production?)

And +1 for dealing with other delete types in a separate jira... It would be nice to use the same method, though, and just pass in an extra flag to indicate which type of delete to use (and hence how to filter the results).

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, since right now the client has to scan the rows back and then issue Delete(s) by row key.
> i.e. a query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475911#comment-13475911 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Yeah, passing the delete option as a scan attribute is a bit weird. But I can see this both ways.
It would be nice if all of this could be strictly controlled by the Scan we pass in. The Scan would (through the attribute) indicate the delete type to use and also describe the KVs that are to be deleted.

I am also not sure about the template Delete... We'd have to make up fake column families and qualifiers in the future... Since this is an advanced feature, we could pass the delete type (from KeyValue), or maybe a new enum to indicate what we want to do.

So I can see this going both ways. In either case, this should probably return the number of KVs deleted.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, since right now the client has to scan the rows back and then issue Delete(s) by row key.
> i.e. a query like: delete from table1 where...


[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Attachment: HBASE-6942_Trunk.patch

Patch for trunk
                

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482228#comment-13482228 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Will also address this Findbugs warning in the next patch:
Should org.apache.hadoop.hbase.coprocessor.example.BulkDeleteEndpoint$Column be a _static_ inner class?
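Findbugs raises this (its SIC_INNER_SHOULD_BE_STATIC pattern) when a non-static inner class never uses its enclosing instance, since every instance then carries a hidden reference to the outer object. A minimal sketch of the fix, assuming `Column` is a simple family/qualifier pair (the patch's actual fields are not shown in this thread): declare it `static`. The `equals`/`hashCode` pair is an extra assumption of this sketch, useful if columns are collected in a set for de-duplication.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BulkDeleteEndpointSketch {
  // Declared static: it never touches the enclosing instance's state,
  // so it should not hold an implicit reference to one.
  static class Column {
    private final byte[] family;
    private final byte[] qualifier;

    Column(byte[] family, byte[] qualifier) {
      // Defensive copies so later changes to the caller's arrays cannot alter the key.
      this.family = family.clone();
      this.qualifier = qualifier.clone();
    }

    // byte[] uses identity equality, so compare the contents explicitly.
    @Override
    public boolean equals(Object other) {
      if (this == other) {
        return true;
      }
      if (!(other instanceof Column)) {
        return false;
      }
      Column c = (Column) other;
      return Arrays.equals(family, c.family) && Arrays.equals(qualifier, c.qualifier);
    }

    @Override
    public int hashCode() {
      return 31 * Arrays.hashCode(family) + Arrays.hashCode(qualifier);
    }
  }

  public static void main(String[] args) {
    // A static nested class can be instantiated without an outer instance.
    Set<Column> seen = new HashSet<>();
    seen.add(new Column("cf1".getBytes(StandardCharsets.UTF_8), "c1".getBytes(StandardCharsets.UTF_8)));
    seen.add(new Column("cf1".getBytes(StandardCharsets.UTF_8), "c1".getBytes(StandardCharsets.UTF_8)));
    seen.add(new Column("cf1".getBytes(StandardCharsets.UTF_8), "c2".getBytes(StandardCharsets.UTF_8)));
    System.out.println(seen.size()); // the duplicate cf1:c1 is counted once
  }
}
```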
                

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476814#comment-13476814 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

bq. We do not have any way to get only the 1st KVs from all the families of a row, right? We have FirstKeyOnlyFilter now.
Right. I can't see any way to get only the 1st KV of each CF. I think we can live with that for now.

bq. We need to pass the rowBatchSize too
Yes, forgot about that.

bq. Do we need a Request object carrying the attributes? Something in line with protobufs?
Not sure. I like just passing the four parameters needed. (If that is not possible with protobufs, we should use a request object.)
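If a request object is used, it can be sketched as one immutable message. This assumes the four parameters are the Scan, the delete type, an optional timestamp and the row batch size (as the rowBatchSize remark and the later release note suggest); `ScanSpec`, `BulkDeleteRequest` and the enum are hypothetical names, and `ScanSpec` merely stands in for an HBase Scan.

```java
import java.util.Objects;

public class BulkDeleteRequestSketch {
  enum DeleteType { ROW, FAMILY, COLUMN, VERSION }

  /** Hypothetical stand-in for an HBase Scan: just a row-key range here. */
  static final class ScanSpec {
    final String startRow;
    final String stopRow;

    ScanSpec(String startRow, String stopRow) {
      this.startRow = startRow;
      this.stopRow = stopRow;
    }
  }

  /** Bundles the four values the endpoint needs into one validated, immutable message. */
  static final class BulkDeleteRequest {
    final ScanSpec scan;
    final DeleteType deleteType;
    final Long timestamp; // null means "no timestamp supplied"
    final int rowBatchSize;

    BulkDeleteRequest(ScanSpec scan, DeleteType deleteType, Long timestamp, int rowBatchSize) {
      this.scan = Objects.requireNonNull(scan, "scan");
      this.deleteType = Objects.requireNonNull(deleteType, "deleteType");
      if (rowBatchSize <= 0) {
        throw new IllegalArgumentException("rowBatchSize must be positive");
      }
      this.timestamp = timestamp;
      this.rowBatchSize = rowBatchSize;
    }

    boolean hasTimestamp() {
      return timestamp != null;
    }
  }

  public static void main(String[] args) {
    BulkDeleteRequest req =
        new BulkDeleteRequest(new ScanSpec("row000", "row100"), DeleteType.ROW, null, 500);
    System.out.println(req.deleteType + " batch=" + req.rowBatchSize + " ts? " + req.hasTimestamp());
  }
}
```

One trade-off of the request object is that adding a field later does not change the endpoint's method signature; passing four plain parameters is simpler as long as the parameter list stays fixed.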

                

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475721#comment-13475721 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Anoop, you were right from the beginning: we either use the filter passed with the scan (it is the responsibility of the caller to set it up correctly) or use FKOF (by itself).

                

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475544#comment-13475544 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

I think my doubts regarding FirstKeyOnlyFilter were correct. [Checked with a change in code]
Suppose the table has one CF with 3 qualifiers:
cf1:c1  cf1:c2  cf1:c3
Now I have to delete all the rows where cf1:c3=?
I will use an SCVF with this condition.
Now suppose I also add FKOF along with it in an FL.
For any row, the 1st KV that reaches the filter will be cf1:c1's.
The SCVF cannot filter it out, and the FKOF will pass it.
But for the 2nd KV, the FKOF will make the scan move to the next row.
So the actual filter carrying my condition never gets a chance to validate the row and filter it...

So I think, in general, FKOF cannot be combined with this kind of condition filter.

Please correct me if I missed anything or said something wrong.
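The situation described above can be reproduced with a toy model. This is a simplified imitation of the filter contract, not HBase's `Filter` classes: `FirstKeyOnly` and `SingleColumnValue` are hypothetical stand-ins for FKOF and SCVF (the latter with filterIfMissing left false), and `rowPasses` models a MUST_PASS_ALL FilterList that stops reading a row as soon as any filter answers NEXT_ROW.

```java
import java.util.ArrayList;
import java.util.List;

public class FkofScvfToy {
  enum ReturnCode { INCLUDE, NEXT_ROW }

  interface ToyFilter {
    void reset();                                              // called at the start of each row
    ReturnCode filterKeyValue(String qualifier, String value); // called once per KV
    boolean filterRow();                                       // true = drop the whole row
  }

  /** Mimics FKOF: include the first KV of a row, then jump to the next row. */
  static final class FirstKeyOnly implements ToyFilter {
    private boolean seenFirst;
    public void reset() { seenFirst = false; }
    public ReturnCode filterKeyValue(String q, String v) {
      if (seenFirst) {
        return ReturnCode.NEXT_ROW;
      }
      seenFirst = true;
      return ReturnCode.INCLUDE;
    }
    public boolean filterRow() { return false; }
  }

  /** Mimics SCVF with filterIfMissing=false: drop the row only if the column is seen with another value. */
  static final class SingleColumnValue implements ToyFilter {
    private final String column;
    private final String expected;
    private boolean found;
    private boolean matched;
    SingleColumnValue(String column, String expected) { this.column = column; this.expected = expected; }
    public void reset() { found = false; matched = false; }
    public ReturnCode filterKeyValue(String q, String v) {
      if (!found && q.equals(column)) {
        found = true;
        matched = v.equals(expected);
      }
      return ReturnCode.INCLUDE;
    }
    public boolean filterRow() { return found && !matched; }
  }

  /** MUST_PASS_ALL: stop feeding KVs of this row as soon as any filter says NEXT_ROW. */
  static boolean rowPasses(List<ToyFilter> filters, String[][] kvs) {
    for (ToyFilter f : filters) f.reset();
    rowLoop:
    for (String[] kv : kvs) {
      for (ToyFilter f : filters) {
        if (f.filterKeyValue(kv[0], kv[1]) == ReturnCode.NEXT_ROW) break rowLoop;
      }
    }
    for (ToyFilter f : filters) {
      if (f.filterRow()) return false;
    }
    return true;
  }

  /** Rows a "delete where cf1:c3 = 'x'" scan would select, with or without FKOF added. */
  static List<String> selectedRows(boolean addFkof) {
    String[] names = {"row1", "row2"};
    String[][][] rows = {
        {{"c1", "a"}, {"c2", "b"}, {"c3", "x"}}, // matches: should be deleted
        {{"c1", "a"}, {"c2", "b"}, {"c3", "y"}}  // no match: should survive
    };
    List<String> out = new ArrayList<>();
    for (int i = 0; i < rows.length; i++) {
      List<ToyFilter> filters = new ArrayList<>();
      filters.add(new SingleColumnValue("c3", "x"));
      if (addFkof) filters.add(new FirstKeyOnly());
      if (rowPasses(filters, rows[i])) out.add(names[i]);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println("SCVF only:   " + selectedRows(false)); // [row1]
    System.out.println("SCVF + FKOF: " + selectedRows(true));  // [row1, row2]
  }
}
```

In the toy, adding FKOF lets row2 through even though its c3 value does not match, so a bulk delete built on that scan would remove a row it should keep; the SCVF simply never sees c3.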

                

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476728#comment-13476728 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

I also think we have different use cases in mind. Passing the timestamp for a version delete through the Scan is not useful (I think); rather, the timestamps should be taken from the scanned KeyValues.

I think you misunderstood what I meant by "it should be controlled by the scan". What I meant was: via the scan we select a bunch of KeyValues. These KeyValues then indicate what will be deleted (together with the various delete types).

I.e. a user can pass the VERSION delete type along with a scan selecting a bunch of KeyValues. Then exactly these KeyValues will be deleted.
Likewise, when COLUMN is passed, the scan will select a bunch of KeyValues that identify the columns to be deleted (the caller should make sure, purely for performance, that the same column's KeyValue does not occur multiple times).
Same for families and rows.

So we only pass a scan (with NO extra attributes in the scan) along with an indicator of the delete type. That should be enough.
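The proposal above can be sketched as a mapping from scanned KeyValues to delete targets, one rule per delete type. Assumptions: `Kv` is a hypothetical, simplified KeyValue (coordinates only), and targets are rendered as strings for readability. VERSION keeps each KV's own timestamp; COLUMN, FAMILY and ROW collapse repeats. Using a `LinkedHashSet` moves the "same column does not occur multiple times" concern to the server side, which is a design choice of this sketch rather than something the thread decided.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeleteTargets {
  enum DeleteType { ROW, FAMILY, COLUMN, VERSION }

  /** Hypothetical, simplified KeyValue: coordinates only, no value. */
  static final class Kv {
    final String row;
    final String family;
    final String qualifier;
    final long ts;

    Kv(String row, String family, String qualifier, long ts) {
      this.row = row;
      this.family = family;
      this.qualifier = qualifier;
      this.ts = ts;
    }
  }

  /** Turns the KVs selected by the scan into distinct delete targets for the given type. */
  static List<String> targets(List<Kv> scanned, DeleteType type) {
    Set<String> out = new LinkedHashSet<>(); // keeps scan order, drops repeats
    for (Kv kv : scanned) {
      switch (type) {
        case ROW:
          out.add(kv.row);
          break;
        case FAMILY:
          out.add(kv.row + "/" + kv.family);
          break;
        case COLUMN:
          out.add(kv.row + "/" + kv.family + ":" + kv.qualifier);
          break;
        case VERSION: // the timestamp comes from the scanned KV itself
          out.add(kv.row + "/" + kv.family + ":" + kv.qualifier + "@" + kv.ts);
          break;
      }
    }
    return new ArrayList<>(out);
  }

  public static void main(String[] args) {
    List<Kv> scanned = List.of(
        new Kv("r1", "cf1", "c1", 20), new Kv("r1", "cf1", "c1", 10), new Kv("r1", "cf1", "c2", 10));
    for (DeleteType t : DeleteType.values()) {
      System.out.println(t + " -> " + targets(scanned, t));
    }
  }
}
```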

                

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475846#comment-13475846 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

I like the suggestion of passing a Delete object to the endpoint.
If the Delete object has an empty byte[] as its row key, we make use of the Scan object as you have done. Otherwise, the step of scanning the region can be skipped.
                

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480631#comment-13480631 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

@Ted: Oops, yes, you're right. The break is at the correct spot.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, since right now the client has to scan the rows back and then issue Delete(s) by row key.
> i.e. a query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478068#comment-13478068 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

{code}
  public int put(final byte[] regionName, final List<Put> puts)
{code}
The above method exists only in the 0.94 code base and has no javadoc, but I interpret its return value as the index of the first unsuccessful Put.
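That reading of the return value can be written down as a contract. A minimal sketch only: `applyAll` is hypothetical, and the sentinel `ALL_SUCCEEDED = -1` for "every Put succeeded" is an assumption, since the 0.94 method is undocumented. Every operation is attempted, and the caller learns where the first failure happened.

```java
import java.util.List;
import java.util.function.Predicate;

public class FirstFailureIndex {
  /** Hypothetical sentinel meaning every operation succeeded. */
  static final int ALL_SUCCEEDED = -1;

  /**
   * Applies every operation in order and returns the index of the first one
   * that failed, or ALL_SUCCEEDED if none failed.
   */
  static <T> int applyAll(List<T> ops, Predicate<T> apply) {
    int firstFailure = ALL_SUCCEEDED;
    for (int i = 0; i < ops.size(); i++) {
      boolean ok = apply.test(ops.get(i));
      if (!ok && firstFailure == ALL_SUCCEEDED) {
        firstFailure = i; // remember only the earliest failure
      }
    }
    return firstFailure;
  }

  public static void main(String[] args) {
    List<String> puts = List.of("row1", "row2", "bad", "row4");
    int idx = applyAll(puts, p -> !p.equals("bad"));
    System.out.println(idx == ALL_SUCCEEDED ? "all applied" : "first failure at index " + idx);
  }
}
```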
                

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477177#comment-13477177 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

bq. Just worried that N will necessarily be (much?) larger than M.
How do we correlate the scan results with the column families to delete?
                

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk deletion of data

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-6942:
--------------------------

    Release Note: 
This issue gives an example Endpoint implementation for efficiently deleting bulk data from tables. Which data is deleted can be controlled using a Scan object passed to the endpoint.
We can delete rows, column families, column qualifiers or cell versions, based on the delete type passed.
Optionally, a timestamp can also be passed. When a timestamp is passed for the ROW, FAMILY and COLUMN delete types, all versions up to and including that time will be deleted.
When the type is VERSION and a timestamp is passed, only one version (the one whose timestamp equals the given value) of each cell selected by the Scan will be deleted. When no timestamp is passed for a VERSION delete, all cell versions selected by the Scan will be deleted. By using an appropriate Scan (with a TimeRange, etc.) the user can control which versions are deleted.
The API returns the number of rows deleted (for types other than ROW, the entire row is not deleted), and when the type is VERSION it also returns the total number of versions deleted.
The Scan can be created with a row key range, filters, a TimeRange, etc., depending on the delete use case.



  was:
This issue gives an Endpoint implementation for efficiently deleting bulk data from tables.Which all data to be deleted can be controlled using a Scan passed to the endpoint.
We can delete rows, column families, column qualifiers or cell versions based on delete type passed.
Optionally timestamp also can be passed. When timestamp is passed for delete types ROW, FAMILY and COLUMN all the versions before that time(specified time inclusive) will get deleted.
When the type is VERSION, if a timestamp is passed, only one version(with ts as given value) of all the cells which the Scan selected will be getting deleted. When no timestamp value passed for VERSION type delete it will delete all the cell versions which the Scan selected. Using appropriate Scan with Timerange etc user can control which all versions to be deleted.
The API returns the number of rows deleted (In types other than ROW it is not entire row deleted) and when type is VERSION it will return total number of versions deleted also.
The Scan can be created with a rowkey range, with some filters, with Timerange etc based on the delete usecase.



    
> Endpoint implementation for bulk deletion of data
> -------------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for bulk deletion of data (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477571#comment-13477571 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq. You can send the enum's ordinal number across.
Yes, Lars. Then we cannot accept Enum types as a parameter in the CP Endpoint, so the user also needs to pass the ordinal (as we don't have a client-side wrapper API to call this Endpoint as of now).
I will do it that way now.
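Since the endpoint cannot take an Enum parameter, the delete type has to travel as a small integer constant and be decoded on the server. A minimal sketch with hypothetical names (DeleteTypeCodes and its nested Type enum are illustrations, not the patch's classes). Using explicit constants rather than ordinal() means that reordering the enum later cannot silently change what a client's value means.

```java
/** Hypothetical wire constants for the delete type, passed instead of an enum. */
final class DeleteTypeCodes {
  static final byte ROW = 0;
  static final byte FAMILY = 1;
  static final byte COLUMN = 2;
  static final byte VERSION = 3;

  private DeleteTypeCodes() {}

  /** Server-side view of the type once decoded. */
  enum Type { ROW, FAMILY, COLUMN, VERSION }

  /** Turns the received code back into a type, rejecting unknown values. */
  static Type decode(byte code) {
    switch (code) {
      case ROW:
        return Type.ROW;
      case FAMILY:
        return Type.FAMILY;
      case COLUMN:
        return Type.COLUMN;
      case VERSION:
        return Type.VERSION;
      default:
        throw new IllegalArgumentException("Unknown delete type code: " + code);
    }
  }
}
```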

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Status: Open  (was: Patch Available)
    
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Attachment: HBASE-6942.patch
    
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>         Attachments: HBASE-6942.patch
>
>
> We can provide an endpoint implementation for bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475955#comment-13475955 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

In order to support all types of deletes, we need the following params:
the CF names to be deleted
the qualifier names to be deleted
a timestamp, in case we need a time-based delete
a type, which is needed for a delete-version request

Ideally, the scan need not fetch all the columns that are to be deleted. For a condition-based delete, only the columns involved in the condition need to be scanned.

Rather than wrapping all of these in another POJO, I thought it would be better to accept a Delete object. It already has APIs like deleteColumn(s), deleteFamily, etc., which users are likely to know well. What do you say, Lars? The slightly odd part is that the rowkey would need to be a fake one.
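The template idea can be sketched without HBase types: the template records what to delete, and the endpoint copies that specification onto each row key the Scan returns, which is why the template's own rowkey can be fake. DeleteTemplate and RowDelete below are hypothetical illustrations, not HBase's Delete class; the rule that an empty template deletes the whole row mirrors how a Delete with no columns specified behaves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical delete template: records WHAT to delete but has no real row key.
 * The endpoint would stamp out one concrete delete per row the Scan returns.
 */
final class DeleteTemplate {
  /** One concrete per-row delete: the scanned row key plus the template's targets. */
  static final class RowDelete {
    final String row;
    final List<String> targets; // "family:*", "family:qualifier", or "*" for the whole row

    RowDelete(String row, List<String> targets) {
      this.row = row;
      this.targets = targets;
    }
  }

  private final List<String> targets = new ArrayList<>();

  DeleteTemplate deleteFamily(String family) {
    targets.add(family + ":*");
    return this;
  }

  DeleteTemplate deleteColumns(String family, String qualifier) {
    targets.add(family + ":" + qualifier);
    return this;
  }

  /** Copies the template's targets onto every scanned row key. */
  List<RowDelete> instantiate(List<String> scannedRows) {
    // An empty template means "delete the whole row", like a Delete with no columns.
    List<String> spec = targets.isEmpty()
        ? Collections.singletonList("*")
        : Collections.unmodifiableList(new ArrayList<>(targets));
    List<RowDelete> out = new ArrayList<>();
    for (String row : scannedRows) {
      out.add(new RowDelete(row, spec));
    }
    return out;
  }
}
```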
bq. We'd have to make up fake column qualifiers and qualifiers in the future
Sorry, I didn't get your meaning.

Thanks for the reviews and comments.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482437#comment-13482437 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

Integrated to 0.94 and trunk.

Thanks for the patch, Anoop.

Thanks for the review, Lars and Andy.

@Anoop:
Can you write up a release note for this feature?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478028#comment-13478028 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq. Should we continue to check status code for the remaining opStatus ?
I was following the code in HRS, and Lars once suggested the same approach. A SANITY_FAILURE won't happen here, and I cannot see how we could get a FAILURE status either.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477579#comment-13477579 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Lars,
It will be better to pass the type constants (int constants).
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480630#comment-13480630 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

bq. This check should be at the end of the loop.
The break statement is within this for loop:
{code}
+        for (int i = 0; i < rowBatchSize; i++) {
{code}
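The point here is that the break only leaves the inner for loop that fills one batch. A minimal sketch of such a batching loop, assuming a plain Iterator stands in for the region scanner and a Consumer stands in for the batched delete (both are illustrative substitutions, not the endpoint's actual types). The usage below also exercises the case where rowBatchSize is smaller than the number of rows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class BatchedDeleter {
  /**
   * Pulls rows from 'scanner' in batches of at most rowBatchSize and passes each
   * non-empty batch to 'deleteBatch'. Returns the total number of rows processed.
   */
  static <R> long run(Iterator<R> scanner, int rowBatchSize, Consumer<List<R>> deleteBatch) {
    if (rowBatchSize <= 0) {
      throw new IllegalArgumentException("rowBatchSize must be positive");
    }
    long total = 0;
    boolean hasMore = true;
    while (hasMore) {
      List<R> batch = new ArrayList<>(rowBatchSize);
      for (int i = 0; i < rowBatchSize; i++) {
        if (!scanner.hasNext()) {
          hasMore = false;
          break; // exits only this for loop; the while loop then ends via hasMore
        }
        batch.add(scanner.next());
      }
      if (!batch.isEmpty()) {
        deleteBatch.accept(batch);
        total += batch.size();
      }
    }
    return total;
  }
}
```

With 7 rows and rowBatchSize 3, the batches handed to the delete step have sizes 3, 3 and 1.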
bq. Are we making this too complicated ?
The complication is due to the desire to satisfy more use cases. I think it is worth it.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477573#comment-13477573 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

If we pass a Scan and a Delete to the endpoint, we can handle arbitrary deletion requests.

@Anoop:
I clicked on the link above - it is not obvious which comment you were referring to.
Please refer to a comment by its timestamp.

Thanks
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Description: 
We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.

Query like: delete from table1 where...

  was:
We can provide an end point implementation for doing a bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.

Query like  delete from table1 where...

    
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk deletion of data

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482942#comment-13482942 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Thanks also to Jerry and Jean for their input and reviews.
                
> Endpoint implementation for bulk deletion of data
> -------------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for bulk deletion of data (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477583#comment-13477583 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

I will also make a patch based on the delete template, so that the two approaches are easy to compare.
I will make those today.
Sorry, I was busy with meetings yesterday.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476807#comment-13476807 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq. Just worried that N will necessarily be (much?) larger than M.
We do not have any way to get only the first KV from each family of a row, right? We have FirstKeyOnlyFilter now, but it returns only the first KV of the whole row.
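FirstKeyOnlyFilter returns only the first KeyValue of each row, which is enough to learn the row keys for a ROW delete but not to discover every family present in a row. The difference can be modeled in a few lines. This is a sketch: cells are hypothetical "row/family:qualifier" strings in scan order, not HBase KeyValues, and firstPerFamily is the hypothetical behavior described above as missing, not an existing filter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class FirstKeyModel {
  /** Keeps the first cell for each distinct key, preserving scan order. */
  private static List<String> firstBy(List<String> cells, Function<String, String> key) {
    Set<String> seen = new HashSet<>();
    List<String> out = new ArrayList<>();
    for (String cell : cells) {
      if (seen.add(key.apply(cell))) {
        out.add(cell);
      }
    }
    return out;
  }

  /** Models FirstKeyOnlyFilter: one cell per row. */
  static List<String> firstPerRow(List<String> cells) {
    return firstBy(cells, c -> c.substring(0, c.indexOf('/')));
  }

  /** Hypothetical: one cell per (row, family). */
  static List<String> firstPerFamily(List<String> cells) {
    return firstBy(cells, c -> c.substring(0, c.indexOf(':')));
  }
}
```

For a row r1 with families a and b, firstPerRow keeps only r1's first cell in family a, while firstPerFamily keeps one cell from each of a and b.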

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Comment Edited] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476134#comment-13476134 ] 

ramkrishna.s.vasudevan edited comment on HBASE-6942 at 10/15/12 1:28 PM:
-------------------------------------------------------------------------

One thing we should add to the javadoc is a note on how many KVs the scan returns. This should not be very large, so that it does not impact the RS heap.
Users should be careful; I remember there was a mail thread about limiting the memory used by Endpoints.

Missed a 'not'
                
      was (Author: ram_krish):
    One thing we should add in the javadoc is how many kvs the scan returns.  This should not be very huge and it should impact the RS heap.
Users should be careful and i remember there was mail thread talking about limiting the memory that is used by Endpoints.
                  
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480650#comment-13480650 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

See also Anoop's comment @ 20/Oct/12 04:05 w.r.t. batchMutate:
bq. As per the logic in HRS it will return 49 as the processed Puts count.. Does this correct?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk deletion of data

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490372#comment-13490372 ] 

Hudson commented on HBASE-6942:
-------------------------------

Integrated in HBase-0.94-security-on-Hadoop-23 #9 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/9/])
    HBASE-6942 Endpoint implementation for bulk delete rows (Anoop) (Revision 1401332)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteEndpoint.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteProtocol.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteResponse.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestBulkDeleteProtocol.java

                
> Endpoint implementation for bulk deletion of data
> -------------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for bulk deletion of data (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475839#comment-13475839 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Regarding passing the rowBatchSize in the Scan's attributes, I am in two minds, Ted, as this attribute relates to the delete operation and not to the Scan. Lars, I would like your opinion as well.
I was thinking of enhancing this to support all kinds of deletes: CF delete, column delete, version delete, time-based KV delete, etc. Ideally we should support all of these. What I have in mind is to accept a Delete object as a template in this Endpoint. A Delete needs some byte[] as its rowkey, but any dummy value (e.g. an empty byte[]) is okay. The Endpoint would then create per-row Delete objects following that template. Thoughts?

Yes, Ted, in that case we can pass back the number of KVs deleted.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an endpoint implementation for bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, which currently requires a scan to the client followed by delete(s) issued by rowkey.
> Query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478060#comment-13478060 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

This one, Ted, in
HRegionServer#put(final byte[] regionName, final List<Put> puts):
{code}
OperationStatus codes[] = region.batchMutate(putsWithLocks);
for (i = 0; i < codes.length; i++) {
  // Stop at the first Put that did not succeed and return its index.
  if (codes[i].getOperationStatusCode() != OperationStatusCode.SUCCESS) {
    return i;
  }
}
{code}
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477581#comment-13477581 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Yes. I *really* do not want to make this more complicated than it is.
If somebody wants to delete a couple of column families and a couple of columns, it can be done with multiple roundtrips.

Now, if the code can be simplified by passing a Delete object, then we should do that.

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475553#comment-13475553 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

Thanks for your example, Anoop.
FirstKeyOnlyFilter cannot be used directly.

In trunk, there is a new Filter class that allows chaining and preserving the correct qualifier.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477987#comment-13477987 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

{code}
+            if (opStatus[i].getOperationStatusCode() != OperationStatusCode.SUCCESS) {
+              break;
+            }
{code}
Should we continue checking the status codes of the remaining opStatus entries?
{code}
+            byte[] versionsDeleted = deleteWithLockArr[i].getFirst().getAttribute(
+                NO_OF_VERSIOS_TO_DELETE);
{code}
Typo in the constant name above.
{code}
+        int noOfVersiosToDelete = 0;
{code}
Typo in variable name above.
{code}
+      return Bytes.hashCode(this.family) + Bytes.hashCode(this.qualifier);
{code}
Can we come up with a better hash code?
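One common way to address the hash-code question, sketched in plain Java: combine the two array hashes with the usual 31-multiplier pattern instead of adding them. java.util.Arrays.hashCode is used here as a stand-in for Bytes.hashCode, and the class name Column is hypothetical, not taken from the patch:

```java
import java.util.Arrays;

// Hypothetical (family, qualifier) key with content-based equality.
final class Column {
  private final byte[] family;
  private final byte[] qualifier;

  Column(byte[] family, byte[] qualifier) {
    this.family = family;
    this.qualifier = qualifier;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Column)) {
      return false;
    }
    Column other = (Column) o;
    return Arrays.equals(family, other.family) && Arrays.equals(qualifier, other.qualifier);
  }

  @Override
  public int hashCode() {
    // 31 * h(family) + h(qualifier) is order-sensitive, unlike a plain sum,
    // so swapping family and qualifier no longer yields the same hash.
    return 31 * Arrays.hashCode(family) + Arrays.hashCode(qualifier);
  }
}
```

With a plain sum, (family=a, qualifier=b) and (family=b, qualifier=a) always collide; the weighted form separates them in the common case while staying consistent with equals.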

Looking at both approaches, I think using delete template gives us flexibility and cleaner code.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477128#comment-13477128 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

{code}
BulkDeleteResponse delete(Scan scan, DeleteType type, Long timestamp, int batchSize)
{code}
What if the user wants to delete more than one column family? How would they formulate that in one request?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475748#comment-13475748 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

When FirstKeyOnlyFilter is not involved, this endpoint has the potential to delete selected KeyValues.
{code}
+public class BulkDeleteResponse implements Serializable {
+  private static final long serialVersionUID = -8192337710525997237L;
+  private Long rowsDeleted;
+  private IOException ioException;
{code}
Do you think the above response should have a field which records the number of KeyValues deleted?

When this feature is made available to the public, people will start using it.
It would be nice for the API to accommodate future enhancements without requiring users to change their clients.

Nice job, Anoop.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473032#comment-13473032 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Thanks Lars and Ted for having a look at the patch.
@Ted
You mean that when the deletion in one region fails after some batches, that count won't be returned and so won't be added to the total count. Correct?
Yes, I realized that.
As per Lars's suggestion, maybe I won't provide a client class either, just the endpoint implementation. Still, the Endpoint needs to return how many rows it deleted, and also any IOE raised during the deletion. Will change.
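The point about partial counts can be sketched in plain Java, assuming each region returns its counts together with any IOException instead of throwing, so rows deleted before a failure still reach the client total. RegionDeleteResult and BulkDeleteTotals are hypothetical names; the versionsDeleted field only illustrates room for a KeyValue/version count:

```java
import java.io.IOException;
import java.util.List;

// Hypothetical per-region result: counts are returned even when the region
// hit an error part-way through, so work done before the failure is not lost.
final class RegionDeleteResult {
  final long rowsDeleted;
  final long versionsDeleted;
  final IOException error; // null when the region finished cleanly

  RegionDeleteResult(long rowsDeleted, long versionsDeleted, IOException error) {
    this.rowsDeleted = rowsDeleted;
    this.versionsDeleted = versionsDeleted;
    this.error = error;
  }
}

// Hypothetical client-side aggregation over the results from every region.
final class BulkDeleteTotals {
  long rowsDeleted;
  long versionsDeleted;
  int failedRegions;

  static BulkDeleteTotals sum(List<RegionDeleteResult> results) {
    BulkDeleteTotals totals = new BulkDeleteTotals();
    for (RegionDeleteResult r : results) {
      totals.rowsDeleted += r.rowsDeleted;
      totals.versionsDeleted += r.versionsDeleted;
      if (r.error != null) {
        totals.failedRegions++;
      }
    }
    return totals;
  }
}
```

Carrying the exception inside the result, rather than throwing it out of the endpoint, is what lets the client report "14 rows deleted, 1 region failed" instead of losing the partial count.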
                
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>         Attachments: HBASE-6942.patch
>
>
> We can provide an endpoint implementation for doing a bulk delete (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475542#comment-13475542 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq. Please add annotation for audience and stability.
This should be in trunk only, right? In 0.94 I think we should not add it, as per the Hadoop support.
bq. Should a Set be used? The use case is for deleting entire rows. We just need to collect unique row keys.
Maybe a List is fine. It could be an issue when batching is involved, but I don't expect that. Anyway, a Set seems perfect; I will change it.
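One detail worth keeping in mind when switching to a Set: Java arrays use identity-based equals and hashCode, so a HashSet<byte[]> would not de-duplicate row keys. A sketch using a TreeSet with a content comparator; in HBase itself Bytes.BYTES_COMPARATOR plays this role, while the plain-Java comparator below is written out so the example compiles on its own (RowKeySet is a hypothetical name):

```java
import java.util.Set;
import java.util.TreeSet;

// byte[] uses identity equals/hashCode, so a HashSet<byte[]> would keep
// duplicate row keys. A TreeSet with a content comparator de-duplicates them.
final class RowKeySet {
  private RowKeySet() {}

  // Unsigned lexicographic order, the same ordering HBase uses for row keys.
  static int compareRows(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  static Set<byte[]> newRowKeySet() {
    return new TreeSet<byte[]>(RowKeySet::compareRows);
  }
}
```

A side benefit of the TreeSet is that the collected row keys come out in row-key order, which matches the order the region scan produced them in.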
FirstKeyOnlyFilter: let me test one thing, I have one doubt.
bq. Sorry for the length of the discussion here, Anoop. Almost there
No problem at all, Lars :)

bq. Would be nice to use the same method, though, and just pass in an extra flag to indicate which type of delete to use
Yes, this is what I have in mind too.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Fix Version/s: 0.96.0
                   0.94.3
    
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch
>
>
> We can provide an endpoint implementation for doing a bulk delete (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk deletion of data

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

     Description: 
We can provide an endpoint implementation for doing a bulk deletion of data (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.

A query like: delete from table1 where...

  was:
We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.

A query like: delete from table1 where...

    Release Note: 
This issue provides an Endpoint implementation for efficiently deleting bulk data from tables. Which data gets deleted is controlled by a Scan passed to the endpoint.
We can delete rows, column families, column qualifiers or cell versions, depending on the delete type passed.
Optionally, a timestamp can also be passed. When a timestamp is passed for delete types ROW, FAMILY and COLUMN, all versions up to and including that time will be deleted.
When the type is VERSION and a timestamp is passed, only one version (the one with the given timestamp) of each cell the Scan selected will be deleted. When no timestamp is passed for a VERSION delete, all the cell versions the Scan selected will be deleted. By using an appropriate Scan (with a TimeRange etc.) the user can control which versions are deleted.
The API returns the number of rows deleted (for types other than ROW, these rows are not deleted entirely) and, when the type is VERSION, also the total number of versions deleted.
The Scan can be created with a rowkey range, with filters, with a TimeRange etc., depending on the delete use case.


         Summary: Endpoint implementation for bulk deletion of data  (was: Endpoint implementation for bulk delete rows)
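The delete-type rules in the release note can be summarized as a small plain-Java model. It is a sketch under stated assumptions, not the endpoint's code: it treats the table and the Scan result as in-memory lists, and Cell, DeleteModel and removed() are hypothetical names. The Scan decides which rows, families or columns are touched; the type and the optional timestamp decide which versions go.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical plain-Java model of the delete types in the release note.
// Nothing here calls HBase.
enum DeleteType { ROW, FAMILY, COLUMN, VERSION }

final class Cell {
  final String row;
  final String family;
  final String qualifier;
  final long ts;

  Cell(String row, String family, String qualifier, long ts) {
    this.row = row;
    this.family = family;
    this.qualifier = qualifier;
    this.ts = ts;
  }

  // Key of the unit a delete of the given type operates on.
  String scope(DeleteType type) {
    switch (type) {
      case ROW:
        return row;
      case FAMILY:
        return row + "/" + family;
      default: // COLUMN and VERSION both work per column
        return row + "/" + family + ":" + qualifier;
    }
  }
}

final class DeleteModel {
  private DeleteModel() {}

  // Cells of 'table' removed when the Scan selected 'scanned' (assumed to be
  // a subset of 'table'). A null timestamp means "no timestamp passed".
  static List<Cell> removed(List<Cell> table, List<Cell> scanned,
                            DeleteType type, Long timestamp) {
    if (type == DeleteType.VERSION && timestamp == null) {
      // VERSION without timestamp: exactly the versions the Scan selected.
      return new ArrayList<>(scanned);
    }
    Set<String> scopes = new HashSet<>();
    for (Cell c : scanned) {
      scopes.add(c.scope(type));
    }
    List<Cell> out = new ArrayList<>();
    for (Cell c : table) {
      if (!scopes.contains(c.scope(type))) {
        continue;
      }
      boolean tsMatch;
      if (timestamp == null) {
        tsMatch = true;               // ROW/FAMILY/COLUMN: all versions
      } else if (type == DeleteType.VERSION) {
        tsMatch = c.ts == timestamp;  // only the version at exactly ts
      } else {
        tsMatch = c.ts <= timestamp;  // every version at or before ts
      }
      if (tsMatch) {
        out.add(c);
      }
    }
    return out;
  }
}
```

For a row r1 holding cf:a@1, cf:a@2 and cf:b@1, with a Scan that selected only cf:a@2: ROW removes all three cells, COLUMN removes both versions of cf:a, COLUMN with timestamp 1 removes only cf:a@1, and VERSION without a timestamp removes just the scanned cf:a@2.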
    
> Endpoint implementation for bulk deletion of data
> -------------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of data (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475515#comment-13475515 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Yes Lars, I am adding the logic to count the deleted rows this way. But I was not able to test it, because getting a status code of FAILURE or SANITY_FAILURE does not seem possible in this case. Anyway, it is better for the code to be in the correct shape; this is the apt way to know.

Regarding Jerry's comment: his use case seems different from ours. Ours is range-based row deletes, not even condition-based. But I expect someone could have condition-based deletes too; since we take a Scan, that is possible. I am specifically mentioning that this is used for deleting entire rows.

We can have separate JIRA issues to deal with CF deletes, column deletes, version deletes etc. Agree, Lars? I will be happy to work on those. I will also provide a trunk version once the 0.94-based version is fine for commit.

This delete performance problem has been discussed in several user mail threads, I guess, so an example Endpoint implementation like this will help folks.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474771#comment-13474771 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Oops! I missed OperationStatus completely. Thanks Lars for pointing it out. Yes, I wrote a test case with a CP that bypasses specific row deletes. With OperationStatus I can get the count correctly. Will post a patch soon.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Attachment: HBASE-6942_V5.patch

Not the final one. I need to change some javadoc and sample code, I guess. Also, as of now the number of KVs deleted is not added to the response. Please share your thoughts on the overall idea.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Summary: Endpoint implementation for bulk delete rows  (was: Endpoint implementation for bulk delete)
    
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch
>
>
> We can provide an endpoint implementation for doing a bulk delete (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482834#comment-13482834 ] 

Hudson commented on HBASE-6942:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #232 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/232/])
    HBASE-6942 Endpoint implementation for bulk delete rows (Anoop) (Revision 1401330)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteEndpoint.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/example/generated/BulkDeleteProtos.java
* /hbase/trunk/hbase-server/src/main/protobuf/BulkDelete.proto
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestBulkDeleteProtocol.java

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473038#comment-13473038 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Also, I wonder whether there is a way to allow this to delete specific families, columns, or versions. Especially the latter might be useful... Maybe for another JIRA.
(Should this be in the examples package?)

                
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch
>
>
> We can provide an endpoint implementation for doing a bulk delete (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475963#comment-13475963 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

For the V4 patch: you do not need to check all the operation statuses; you can break out of the loop as soon as you find the first status that is not SUCCESS.

Can do that on commit. +1 on the rest of the patch.
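The two counting strategies discussed on this issue (stop at the first non-SUCCESS status, or keep checking every status) can be compared in a small plain-Java sketch. StatusCode is a hypothetical local enum that only borrows the status names mentioned in this thread, so the example compiles without HBase; the strategies differ only when a SUCCESS follows a failure in the same batch, and whether batchMutate can report that is not settled here.

```java
// Hypothetical stand-in borrowing status names used in this thread; not HBase's enum.
enum StatusCode { SUCCESS, FAILURE, SANITY_FAILURE }

final class DeleteCounting {
  private DeleteCounting() {}

  // Stop at the first status that is not SUCCESS.
  static int countUntilFirstFailure(StatusCode[] codes) {
    int count = 0;
    for (StatusCode code : codes) {
      if (code != StatusCode.SUCCESS) {
        break;
      }
      count++;
    }
    return count;
  }

  // Keep checking every status and count each SUCCESS.
  static int countAllSuccesses(StatusCode[] codes) {
    int count = 0;
    for (StatusCode code : codes) {
      if (code == StatusCode.SUCCESS) {
        count++;
      }
    }
    return count;
  }
}
```

For the sequence SUCCESS, SUCCESS, SANITY_FAILURE, SUCCESS the first method reports 2 and the second reports 3; if failures can only ever be trailing, both agree.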
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) on the server side. This can reduce the time taken for such an operation, as right now one has to scan the rows to the client and issue delete(s) using their rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480756#comment-13480756 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq.For passing the delete type bytes. What I meant was to use an enum backed by bytes
Lars, we cannot make the signature of the endpoint like delete(Scan, DeleteType, ...).
I also initially tried an enum backed by a byte, but in any case the customer would then call the endpoint as delete(Scan, deleteType.getBytes(), ...).
I thought that would only cause confusion. :(
We do not support enum types as parameters to an Endpoint.
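An "enum backed by bytes" can be sketched as follows. The `DeleteType` constants mirror the delete types discussed in this issue, but the byte codes and the `toByte()`/`fromByte()` helpers are hypothetical. As the comment notes, the caller would still have to pass the raw byte across the Endpoint boundary, for example `delete(scan, DeleteType.ROW.toByte(), ...)`.

```java
class DeleteTypeCodes {
  /** Hypothetical byte-backed enum; the codes are illustrative, not taken from the patch. */
  enum DeleteType {
    ROW((byte) 0), FAMILY((byte) 1), COLUMN((byte) 2), VERSION((byte) 3);

    private final byte code;

    DeleteType(byte code) {
      this.code = code;
    }

    /** The single byte that would actually be sent to the Endpoint. */
    byte toByte() {
      return code;
    }

    /** Decodes the byte on the server side and rejects unknown codes. */
    static DeleteType fromByte(byte code) {
      for (DeleteType type : values()) {
        if (type.code == code) {
          return type;
        }
      }
      throw new IllegalArgumentException("Unknown DeleteType code: " + code);
    }
  }
}
```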


                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477558#comment-13477558 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

One more question:
Has anyone tried passing an enum type via CP Endpoints?
I don't think it works. I checked why and found that it is due to the code in HbaseObjectWritable.

In the core code, I think only one enum is passed across the wire, i.e. RegionOpeningState.
It is specifically added to the CODE_TO_CLASS and CLASS_TO_CODE maps in HbaseObjectWritable.

Is this a bug we need to address, or do we state somewhere that enums cannot be used?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482106#comment-13482106 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq.Should the above be enclosed in if (deleteType == DeleteType.VERSION) block?
Fine, I can add the if check; it would avoid an unnecessary getAttribute() call.
bq.* Copyright 2011 The Apache Software Foundation
Oops, sorry.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476804#comment-13476804 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

API-signature-wise I am OK with this, Lars. About passing things in as Scan attributes, I was in two minds all along. :)
BulkDeleteResponse delete(Scan scan, DeleteType type, Long timestamp)
We need to pass rowBatchSize too; it is needed to accumulate rows for a batched delete.
Do we need a Request object that carries these attributes, something in line with protobufs? Just asking.

Yes, the above suggestions sound good to me.
bq.Documenting this will be tricky. I can have a shot at that (if you like, Anoop. If you prefer to do that, that's fine too).
Yes Lars, you can do that if you like. :)

bq.Eventually, since we made it so general now, I can see this as an official API in HTable... But let's do that in another jira (if others agree).
+1
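rowBatchSize controls how many rows are accumulated before one batched delete is issued. A minimal sketch of that accumulation, assuming rows arrive one at a time from the region scanner (modeled here as an `Iterator<String>` of row keys) and a hypothetical `BatchDeleter` callback standing in for the region's batch mutation. The last, partial batch is flushed after the scan ends.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class RowBatchSketch {
  /** Hypothetical stand-in for issuing one batched delete; returns the rows deleted. */
  interface BatchDeleter {
    int deleteBatch(List<String> rowKeys);
  }

  static long bulkDelete(Iterator<String> scannedRows, int rowBatchSize, BatchDeleter deleter) {
    if (rowBatchSize <= 0) {
      throw new IllegalArgumentException("rowBatchSize must be positive");
    }
    long totalDeleted = 0;
    List<String> batch = new ArrayList<>(rowBatchSize);
    while (scannedRows.hasNext()) {
      batch.add(scannedRows.next());
      if (batch.size() == rowBatchSize) {
        totalDeleted += deleter.deleteBatch(batch); // flush a full batch
        batch = new ArrayList<>(rowBatchSize);      // fresh list in case the deleter keeps a reference
      }
    }
    if (!batch.isEmpty()) {
      totalDeleted += deleter.deleteBatch(batch); // flush the last, partial batch
    }
    return totalDeleted;
  }
}
```

Holding only one batch at a time also bounds how many scanned rows the region server keeps in memory at once.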

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475985#comment-13475985 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Yes Lars, for simplicity it is better to control everything via the Scan, and it would also be easy for the user to specify. Let me try it out in code. I will also try passing everything via Scan#setAttribute().
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477554#comment-13477554 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

When the user says he wants to delete families cf1 and cf2 (with or without passing a TS),
he needs to create the Scan object appropriately, i.e. include cf1 and cf2 in the Scan.
From the returned KVs we can then create the Delete object:
{code}
case FAMILY:
  Set<byte[]> families = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
  for (KeyValue kv : deleteRow) {
    if (families.add(kv.getFamily())) {
      delete.deleteFamily(kv.getFamily(), ts);
    }
  }
  break;
{code}
This adds the family of every KV to the Delete; the Set is used to avoid duplicate deleteFamily() calls.
Am I making myself clear, Ted?
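The `TreeSet` with `Bytes.BYTES_COMPARATOR` matters because Java arrays use identity-based `equals()`/`hashCode()`. A `HashSet<byte[]>` would keep every family array, even ones with equal contents. The self-contained sketch below uses a stand-in `Cell` class instead of `KeyValue`, and `Arrays.compareUnsigned` (Java 9+) in place of `Bytes.BYTES_COMPARATOR`. Both order bytes unsigned-lexicographically, but the substitution is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class FamilyDedupSketch {
  /** Hypothetical stand-in for a scanned KeyValue; only family and qualifier are modeled. */
  static final class Cell {
    final byte[] family;
    final byte[] qualifier;

    Cell(String family, String qualifier) {
      this.family = family.getBytes(StandardCharsets.UTF_8);
      this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
    }
  }

  // Content-based, unsigned lexicographic comparison of byte arrays.
  static final Comparator<byte[]> BYTES_COMPARATOR = Arrays::compareUnsigned;

  /** Families to pass to deleteFamily(), each once, in order of first appearance. */
  static List<byte[]> familiesToDelete(List<Cell> row) {
    Set<byte[]> seen = new TreeSet<>(BYTES_COMPARATOR);
    List<byte[]> result = new ArrayList<>();
    for (Cell cell : row) {
      if (seen.add(cell.family)) {
        result.add(cell.family); // the endpoint would call delete.deleteFamily(family, ts) here
      }
    }
    return result;
  }

  /** Counterexample: a HashSet compares arrays by identity, so nothing is deduplicated. */
  static int naiveDistinctCount(List<Cell> row) {
    Set<byte[]> seen = new HashSet<>();
    for (Cell cell : row) {
      seen.add(cell.family);
    }
    return seen.size();
  }
}
```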
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480668#comment-13480668 ] 

ramkrishna.s.vasudevan commented on HBASE-6942:
-----------------------------------------------

@Anoop
Nice work; it covers all the cases.
@Ted/Lars
Also, I feel that since this is an endpoint implementation, it mainly gives users an idea of what they can do.
The rowBatchSize can be optional too, right? Maybe users can decide what they want the rowBatchSize to be.

 
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475961#comment-13475961 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Hmm... Most of these parameters can be controlled with a scan.
For example to delete only some CFs, just configure the scan that way. I don't think we should make it more complicated/flexible than this.

What you are describing is another use case. What if I only want to delete the columns that I am describing with the scan? (Now I would have to include the superset of all possible columns in the passed Delete object.)

If folks need more complicated logic they should write their own endpoint (now they have an example)... That is the whole point of coprocessors, so that we do not have to anticipate every possible use case :)

The beauty of this approach is that we can just pass a Scan object (along with just a delete type maybe) and have the endpoint do its work.

Anyway, I do not feel strongly about this. If you think that we need more flexibility and passing a Delete is the best way to pursue this, then let's do that... As long as the simple case is still simple.
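The "pass a Scan plus a delete type" design can be sketched as a pure function from the scanned cells of one row to the deletes to issue. Everything here is a hypothetical model. `Cell` stands in for `KeyValue`, and deletes are rendered as strings named after the `Delete` methods (`deleteFamily`, `deleteColumns` for all versions, `deleteColumn` for one version) instead of real calls. The per-type behavior is an assumption about how such an endpoint could interpret the scan, not the committed code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DeletePlanSketch {
  enum DeleteType { ROW, FAMILY, COLUMN, VERSION }

  /** Hypothetical stand-in for a scanned KeyValue. */
  static final class Cell {
    final String family;
    final String qualifier;
    final long timestamp;

    Cell(String family, String qualifier, long timestamp) {
      this.family = family;
      this.qualifier = qualifier;
      this.timestamp = timestamp;
    }
  }

  /** Describes the deletes for one scanned row; the Scan alone decides what is covered. */
  static List<String> plan(String rowKey, List<Cell> scannedCells, DeleteType type) {
    List<String> ops = new ArrayList<>();
    switch (type) {
      case ROW:
        ops.add("deleteRow " + rowKey);
        break;
      case FAMILY: {
        Set<String> families = new LinkedHashSet<>(); // each scanned family once
        for (Cell c : scannedCells) {
          if (families.add(c.family)) {
            ops.add("deleteFamily " + c.family);
          }
        }
        break;
      }
      case COLUMN: {
        Set<String> columns = new LinkedHashSet<>(); // each scanned family:qualifier once
        for (Cell c : scannedCells) {
          String column = c.family + ":" + c.qualifier;
          if (columns.add(column)) {
            ops.add("deleteColumns " + column); // all versions of the column
          }
        }
        break;
      }
      case VERSION:
        for (Cell c : scannedCells) {
          // only the exact versions the scan returned, each at its own timestamp
          ops.add("deleteColumn " + c.family + ":" + c.qualifier + "@" + c.timestamp);
        }
        break;
    }
    return ops;
  }
}
```

Narrowing what gets deleted is then only a matter of configuring the Scan (families, columns, max versions), which keeps the simple case simple.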

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476720#comment-13476720 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

@Anoop: Maybe I am not understanding how you would pass the Delete objects. Would you pass multiple ones? If not, how could you use it as a template for different types of deletes?

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478021#comment-13478021 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

@Jean and @Ted
My mistake with the typo; I will correct it.

With deleteTemplate, one thing that is not directly possible is the delete-N-versions use case. It seems Lars has such a use case, but I am not sure.
Also, with this approach the user needs to be careful to build both the Scan and the Delete object. Doesn't that feel like duplicate work?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475517#comment-13475517 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

V3 for your review
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471408#comment-13471408 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

This is a surprisingly (to me, anyway) common use case.
+1 to that idea.

We can ship a sample EndPoint for this with HBase.

                
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>
> We can provide an end point implementation for doing a bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471668#comment-13471668 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

For now I have just added one client class that can be used to invoke this endpoint (just like AggregationClient).
Do we need to expose a new delete() API in HTable itself, or just leave the Endpoint as it is, so that users have to write their own code to call the Endpoint?
                
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>         Attachments: HBASE-6942.patch
>
>
> We can provide an end point implementation for doing a bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480610#comment-13480610 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

@Lars Your comments on V6....

Also, regarding Ted's doubt about finding the number of processed Mutations: as you can see above, the javadoc of HRegionInterface#put() clearly says that it returns the number of processed Puts. Suppose I have 100 Puts in the input list and the 50th one fails (say, its sanity check failed). batchMutate will still try to process all the other 99 Puts, but as per the logic in HRS it will return 49 as the processed Puts count. Is that correct? Please correct me if my understanding is wrong.
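The arithmetic in the example above can be checked with a small model. It uses a hypothetical one-boolean-per-operation array rather than HBase's `OperationStatus[]`. If the reported count is the index of the first failure, 100 Puts with the 50th failing yield 49, even though 99 of them succeeded.

```java
import java.util.Arrays;

class ProcessedCountSketch {
  /** Index of the first failed operation, i.e. how many succeeded before it (length if none failed). */
  static int leadingSuccesses(boolean[] succeeded) {
    for (int i = 0; i < succeeded.length; i++) {
      if (!succeeded[i]) {
        return i;
      }
    }
    return succeeded.length;
  }

  /** Total number of successful operations, regardless of position. */
  static int totalSuccesses(boolean[] succeeded) {
    int count = 0;
    for (boolean ok : succeeded) {
      if (ok) {
        count++;
      }
    }
    return count;
  }

  /** n operations where the one at 1-based position failedPosition fails. */
  static boolean[] scenario(int n, int failedPosition) {
    boolean[] ops = new boolean[n];
    Arrays.fill(ops, true);
    ops[failedPosition - 1] = false;
    return ops;
  }
}
```

The gap between the two numbers means a caller cannot read the returned value as "number of successful mutations" once a failure occurs mid-batch.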
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476727#comment-13476727 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

In patch V5 I am trying not to rely on a Delete object at all. As Lars suggested, it relies on the scan output, and we pass a DeleteType.
I also feel this approach may be OK.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475763#comment-13475763 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

Scan extends OperationWithAttributes which has the following method:
{code}
  public void setAttribute(String name, byte[] value) {
{code}
This allows users to pass various flags indicating what type of deletion they want to perform server-side.
Actually parameter rowBatchSize can be passed this way to BulkDeleteEndpoint.
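A sketch of passing rowBatchSize as a scan attribute. `Scan#setAttribute(String, byte[])` is shown above. Here a plain `Map<String, byte[]>` stands in for the scan's attributes so the example runs without HBase. The attribute name `"rowBatchSize"` and the default of 1000 are hypothetical. The 4-byte big-endian encoding matches what HBase's `Bytes.toBytes(int)` produces. Falling back to a default when the attribute is absent is one way to make the parameter optional.

```java
import java.nio.ByteBuffer;
import java.util.Map;

class BatchSizeAttributeSketch {
  static final String ROW_BATCH_SIZE = "rowBatchSize"; // hypothetical attribute name
  static final int DEFAULT_ROW_BATCH_SIZE = 1000;       // hypothetical default

  /** Client side: encode the int as 4 big-endian bytes, as Bytes.toBytes(int) would. */
  static void setRowBatchSize(Map<String, byte[]> attributes, int rowBatchSize) {
    attributes.put(ROW_BATCH_SIZE, ByteBuffer.allocate(Integer.BYTES).putInt(rowBatchSize).array());
  }

  /** Endpoint side: decode the attribute, or fall back to the default when it is absent. */
  static int getRowBatchSize(Map<String, byte[]> attributes) {
    byte[] raw = attributes.get(ROW_BATCH_SIZE);
    if (raw == null) {
      return DEFAULT_ROW_BATCH_SIZE;
    }
    if (raw.length != Integer.BYTES) {
      throw new IllegalArgumentException("rowBatchSize attribute must be 4 bytes");
    }
    return ByteBuffer.wrap(raw).getInt();
  }
}
```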
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477156#comment-13477156 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

bq. as part of the scan result
scanner.next() populates a List<KeyValue>. It would be easier to understand the user's intention through Scan.getFamilyMap() than by analyzing the scan result.
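Ted's alternative reads the user's intention from the Scan itself. In HBase, `Scan#getFamilyMap()` maps each requested family to its requested qualifiers, and a family added with `addFamily()` maps to a null qualifier set. The sketch below derives deletes from such a map. It uses `Map<String, Set<String>>` as a stand-in so it runs without HBase, and it renders deletes as strings. Treating an empty map (no families specified) as "the whole row" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FamilyMapSketch {
  /**
   * Derives delete operations from a scan's family map (stand-in types).
   * A null or empty qualifier set means the whole family was requested.
   */
  static List<String> deletesFromFamilyMap(Map<String, Set<String>> familyMap) {
    List<String> ops = new ArrayList<>();
    if (familyMap.isEmpty()) {
      ops.add("deleteRow"); // assumption: no families in the scan means every family
      return ops;
    }
    for (Map.Entry<String, Set<String>> entry : familyMap.entrySet()) {
      Set<String> qualifiers = entry.getValue();
      if (qualifiers == null || qualifiers.isEmpty()) {
        ops.add("deleteFamily " + entry.getKey());
      } else {
        for (String qualifier : qualifiers) {
          ops.add("deleteColumns " + entry.getKey() + ":" + qualifier);
        }
      }
    }
    return ops;
  }
}
```

Unlike the scan-result approach, this model can mix a family delete and a column delete in one request.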
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480635#comment-13480635 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

@Lars:
What's your opinion of HBASE-6942_DeleteTemplate.patch?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476134#comment-13476134 ] 

ramkrishna.s.vasudevan commented on HBASE-6942:
-----------------------------------------------

One thing we should add to the javadoc is how many KVs the scan may return. This should not be very large, as it could impact the RS heap.
Users should be careful; I remember there was a mail thread about limiting the memory used by Endpoints.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477568#comment-13477568 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq. Do we support deleting family cf1 and column qualifier cq2 of family cf2 in one request?
No, Ted. This implementation cannot do that in one request.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477566#comment-13477566 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

You can send the enum's ordinal number across.
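A minimal sketch of sending the enum's ordinal, assuming a hypothetical `DeleteType` enum holding the ROW, FAMILY and VERSION types mentioned in this thread plus an assumed COLUMN; the patch's real constants and their order may differ. The client writes the ordinal as a 4-byte int and the server maps it back through `values()`:

```java
import java.nio.ByteBuffer;

class DeleteTypeWire {
  // Hypothetical stand-in: COLUMN is assumed, and the declaration order
  // must be identical on client and server for ordinals to line up.
  enum DeleteType { ROW, FAMILY, COLUMN, VERSION }

  /** Client side: encode the enum as its ordinal in a 4-byte big-endian int. */
  static byte[] encode(DeleteType type) {
    return ByteBuffer.allocate(Integer.BYTES).putInt(type.ordinal()).array();
  }

  /** Server side: decode the ordinal, rejecting values this build does not know. */
  static DeleteType decode(byte[] data) {
    int ordinal = ByteBuffer.wrap(data).getInt();
    DeleteType[] all = DeleteType.values();
    if (ordinal < 0 || ordinal >= all.length) {
      throw new IllegalArgumentException("Unknown DeleteType ordinal: " + ordinal);
    }
    return all[ordinal];
  }
}
```

The ordinal is compact, but it silently changes meaning if constants are reordered or inserted, so client and server must share the same enum definition; sending the constant's name would be more robust at the cost of a few bytes.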
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480769#comment-13480769 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

This is an improvement which should go to trunk first, right?
@Anoop:
please provide a patch for trunk.

Thanks
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476361#comment-13476361 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

In patch v5, rowBatchSize isn't used in the following call:
{code}
+          hasMore = scanner.next(results);
{code}
In createDeleteMutation(), a List<KeyValue> is passed and the row key is retrieved from the first KeyValue.
{code}
+    byte[] row = deleteRow.get(0).getRow();
{code}
Is this intended ?
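A minimal sketch of how a row batch size could be honoured, assuming (hypothetically) a scanner that returns all cells of one row per `next()` call, modelled here as an `Iterator<List<Cell>>` with a stand-in `Cell` record rather than HBase's KeyValue. Under that assumption every cell in one call shares the row key, so reading it from the first cell is safe; the sketch only records batch sizes where the endpoint would issue its batch mutation:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class RowBatchSketch {
  // Hypothetical stand-in for a KeyValue; only the row key matters here.
  record Cell(String row, String column) {}

  /**
   * Pulls one row's cells per next() call and flushes a batch every
   * rowBatchSize rows. Returns the size of each batch that would be issued.
   */
  static List<Integer> batchRows(Iterator<List<Cell>> scanner, int rowBatchSize) {
    if (rowBatchSize <= 0) {
      throw new IllegalArgumentException("rowBatchSize must be > 0");
    }
    List<Integer> batchSizes = new ArrayList<>();
    List<String> pendingRows = new ArrayList<>();
    while (scanner.hasNext()) {
      List<Cell> rowCells = scanner.next();
      if (rowCells.isEmpty()) {
        continue;
      }
      // All cells from one next() call belong to the same row,
      // so the row key can be read from the first cell.
      pendingRows.add(rowCells.get(0).row());
      if (pendingRows.size() == rowBatchSize) {
        batchSizes.add(pendingRows.size()); // stand-in for issuing the batch mutation
        pendingRows.clear();
      }
    }
    if (!pendingRows.isEmpty()) {
      batchSizes.add(pendingRows.size()); // flush the final partial batch
    }
    return batchSizes;
  }
}
```

Keeping only `rowBatchSize` rows pending at a time also bounds how much scan output the endpoint holds in memory at once.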
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Comment Edited] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475963#comment-13475963 ] 

Lars Hofhansl edited comment on HBASE-6942 at 10/15/12 4:54 AM:
----------------------------------------------------------------

For the V4 patch. You do not need to check all the OperationStatus values; you can break out of the loop as soon as you find the first status that is not SUCCESS.

Can do that on commit. +1 on the rest of the patch.

Edit: Forgot a "not" in the 2nd sentence.
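The early exit Lars describes can be sketched as follows; `StatusCode` is a hypothetical stand-in loosely modelled on HBase's operation status codes, not the real class:

```java
class FirstFailureSketch {
  // Hypothetical stand-in, loosely modelled on HBase's operation status codes.
  enum StatusCode { NOT_RUN, SUCCESS, BAD_FAMILY, SANITY_CHECK_FAILURE, FAILURE }

  /** Returns the index of the first non-SUCCESS status, or -1 if all succeeded. */
  static int firstFailure(StatusCode[] codes) {
    for (int i = 0; i < codes.length; i++) {
      if (codes[i] != StatusCode.SUCCESS) {
        return i; // stop here; the remaining statuses are never examined
      }
    }
    return -1;
  }
}
```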
                
      was (Author: lhofhansl):
    For the V4 patch. You do need to check all the operationStatus', you can break out of the loop as soon as you find the first status that is not success.

Can do that on commit. +1 on the rest of the patch.
                  
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475632#comment-13475632 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

This filter returns only the first KV corresponding to any of the given columns (cf:qualifier).
I think we won't be able to use even that in our case.
For a delete request like "delete from tab where c2=? AND c3=?", once the filter finds the KV for c2 it makes the scan jump to the next row, so we never get a chance to compare the value of c3.

I will post a patch for trunk on Monday.
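To illustrate why a "first matching column, then next row" filter cannot evaluate a conjunction, here is a hypothetical model using plain maps (column name to value, iterated in column order) instead of the HBase Filter API:

```java
import java.util.HashMap;
import java.util.Map;

class ConjunctionSketch {
  /**
   * Models a filter that keeps only the first cell whose column is named in
   * the predicate and then jumps to the next row. The row map must iterate
   * in column order.
   */
  static boolean matchesWithFirstMatchFilter(Map<String, String> row,
                                             Map<String, String> predicate) {
    Map<String, String> seen = new HashMap<>();
    for (Map.Entry<String, String> cell : row.entrySet()) {
      if (predicate.containsKey(cell.getKey())) {
        seen.put(cell.getKey(), cell.getValue());
        break; // the scan moves on to the next row here
      }
    }
    return allMatch(seen, predicate);
  }

  /** Evaluates the same predicate against every cell of the row. */
  static boolean matchesWithFullRow(Map<String, String> row, Map<String, String> predicate) {
    return allMatch(row, predicate);
  }

  private static boolean allMatch(Map<String, String> cells, Map<String, String> predicate) {
    for (Map.Entry<String, String> p : predicate.entrySet()) {
      if (!p.getValue().equals(cells.get(p.getKey()))) {
        return false; // column missing or value differs
      }
    }
    return true;
  }
}
```

With a row of c1=a, c2=x, c3=y and the predicate c2=x AND c3=y, the full-row check matches, while the first-match model stops at c2 and never sees c3.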
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469151#comment-13469151 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

I have an implementation based on 0.94, since that is the version we are using. I will upload it for review. Later I will create a patch for 0.96 (or maybe later versions); I guess some changes based on PB will be needed there.

                
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>
> We can provide an end point implementation for doing a bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475513#comment-13475513 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

@Anoop: True, but I think that is good enough. If folks add coprocessors they can mess up all kinds of things.

@Jerry: So what you are saying is that this does not help you?

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477143#comment-13477143 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq. What if the user wants to delete more than one column family? How would he/she formulate that in one request?
When the type is FAMILY, we delete all the families that come back as part of the scan result, so add the N families to the Scan.

Yes, I will add that test case too.
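A minimal sketch of the FAMILY behaviour described above, using a hypothetical `Cell` record: the families to delete for a row are simply the distinct families present in that row's scan result. With the 0.94 client API each of them would correspond to a `Delete.deleteFamily(byte[])` call (`addFamily` in later versions); the sketch itself does not touch HBase:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FamilyDeleteSketch {
  // Hypothetical stand-in for a KeyValue.
  record Cell(String row, String family, String qualifier) {}

  /** Distinct families present in one row's scan result, in scan order. */
  static Set<String> familiesToDelete(List<Cell> rowCells) {
    Set<String> families = new LinkedHashSet<>();
    for (Cell cell : rowCells) {
      families.add(cell.family()); // one family delete per distinct family
    }
    return families;
  }
}
```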

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480643#comment-13480643 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

If you look at createDeleteMutation, you see that only the row key is taken from the scanned KVs; the rest is taken from the DeleteTemplate. That does not seem very useful to me.
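The difference Lars points out can be sketched with hypothetical stand-ins (`Cell` for a KeyValue, `RowDelete` for a Delete's family map): in the template style the scanned cells contribute only the row key, while a scan-driven delete derives its columns from the cells themselves:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DeleteTemplateSketch {
  // Hypothetical stand-ins for a KeyValue and for a Delete's family map.
  record Cell(String row, String family, String qualifier) {}
  record RowDelete(String row, Map<String, List<String>> columns) {}

  /** Template style: only the row key is taken from the scanned cells. */
  static RowDelete fromTemplate(List<Cell> rowCells, Map<String, List<String>> template) {
    return new RowDelete(rowCells.get(0).row(), template);
  }

  /** Scan-driven style: the columns to delete come from the scanned cells. */
  static RowDelete fromScan(List<Cell> rowCells) {
    Map<String, List<String>> columns = new LinkedHashMap<>();
    for (Cell c : rowCells) {
      columns.computeIfAbsent(c.family(), f -> new ArrayList<>()).add(c.qualifier());
    }
    return new RowDelete(rowCells.get(0).row(), columns);
  }
}
```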
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477174#comment-13477174 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

I don't necessarily agree here, Ted. Analyzing the scan result is the whole point of this jira. Passing a template family map will not make this easier for a user (IMHO).
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480766#comment-13480766 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Let's commit something very close to V6 soon.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472516#comment-13472516 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

@Anoop:
Can you add a test case where deletion fails in one of the regions of the table ?

Once you add that test, you would discover a small flaw in determining row count.

Thanks
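One way to keep an aggregated row count honest when a region fails is to report failed regions separately instead of summing whatever they returned. This is a hypothetical sketch with made-up `RegionResult`/`Summary` records, not the patch's client code. Note also that a region that fails partway may already have deleted some rows, so the total is a lower bound unless the endpoint reports partial progress:

```java
import java.util.ArrayList;
import java.util.List;

class RegionCountSketch {
  // Hypothetical per-region response: rows deleted, or an error if the region failed.
  record RegionResult(String region, long rowsDeleted, Exception error) {}

  record Summary(long rowsDeleted, List<String> failedRegions) {}

  static Summary summarize(List<RegionResult> results) {
    long total = 0;
    List<String> failed = new ArrayList<>();
    for (RegionResult r : results) {
      if (r.error() != null) {
        failed.add(r.region()); // do not add a count from a failed region
      } else {
        total += r.rowsDeleted();
      }
    }
    return new Summary(total, failed);
  }
}
```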
                
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>         Attachments: HBASE-6942.patch
>
>
> We can provide an end point implementation for doing a bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Jean-Marc Spaggiari (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476127#comment-13476127 ] 

Jean-Marc Spaggiari commented on HBASE-6942:
--------------------------------------------

I will be very happy to give it a try when it is ready.

I need to delete all rows older than a given timestamp, and this fix seems to be a perfect fit. Thanks!
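One caveat worth checking for this use case: a time-range scan (`Scan.setTimeRange(minStamp, maxStamp)`, with `maxStamp` exclusive) returns a row if any of its cells falls in the range, so a ROW-type delete driven by such a scan could also remove newer cells of that row, depending on how the endpoint timestamps its deletes. The hypothetical sketch below, using plain maps of row key to cell timestamps, contrasts "any cell older than the cutoff" with "every cell older than the cutoff":

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class OlderThanSketch {
  /** Rows a time-range scan [0, cutoff) would return: any cell older than cutoff. */
  static List<String> anyCellOlder(Map<String, List<Long>> rowTimestamps, long cutoff) {
    List<String> rows = new ArrayList<>();
    rowTimestamps.forEach((row, ts) -> {
      if (ts.stream().anyMatch(t -> t < cutoff)) {
        rows.add(row);
      }
    });
    return rows;
  }

  /** Rows whose newest cell is older than cutoff, i.e. every cell is older. */
  static List<String> allCellsOlder(Map<String, List<Long>> rowTimestamps, long cutoff) {
    List<String> rows = new ArrayList<>();
    rowTimestamps.forEach((row, ts) -> {
      if (!ts.isEmpty() && Collections.max(ts) < cutoff) {
        rows.add(row);
      }
    });
    return rows;
  }
}
```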
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Comment Edited] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475846#comment-13475846 ] 

Ted Yu edited comment on HBASE-6942 at 10/14/12 3:15 PM:
---------------------------------------------------------

I like the suggestion of passing Delete object to the endpoint.
                
      was (Author: yuzhihong@gmail.com):
    I like the suggestion of passing Delete object to the endpoint.
If the Delete object has empty byte[] as row key, we make use of the Scan object as you have done. Otherwise, step of scanning the region can be skipped.
                  
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Jerry Lam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475382#comment-13475382 ] 

Jerry Lam commented on HBASE-6942:
----------------------------------

In my use case, I use the version delete marker extensively because I need to delete a specific version of a cell (row key, CF, qualifier, timestamp). I have a MapReduce job that runs across some regions and, based on some business rules, deletes some of the cells in the table using the version delete marker. The business rules for deletion are scoped to one column family at a time, so there is no logical dependency between deletions in different column families.
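The difference between a version delete marker and a column delete marker can be modelled without HBase. In the 0.94 client API these correspond to `Delete.deleteColumn(family, qualifier, timestamp)` (one exact version) and `Delete.deleteColumns(family, qualifier, timestamp)` (all versions at or below the timestamp); the sketch below only simulates their masking effect on one cell's versions:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class VersionMarkerSketch {
  /**
   * Versions of one cell (row, CF, qualifier) still visible after applying
   * version delete markers, each of which masks exactly one timestamp.
   */
  static List<Long> visibleVersions(List<Long> versions, Set<Long> versionMarkers) {
    return versions.stream()
        .filter(ts -> !versionMarkers.contains(ts))
        .collect(Collectors.toList());
  }

  /** Contrast: a column delete marker at markerTs masks every version <= markerTs. */
  static List<Long> visibleAfterColumnMarker(List<Long> versions, long markerTs) {
    return versions.stream()
        .filter(ts -> ts > markerTs)
        .collect(Collectors.toList());
  }
}
```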

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480667#comment-13480667 ] 

ramkrishna.s.vasudevan commented on HBASE-6942:
-----------------------------------------------

{code}
      OperationStatus codes[] = region.batchMutate(putsWithLocks);
      for (int i = 0; i < codes.length; i++) {
        if (codes[i].getOperationStatusCode() != OperationStatusCode.SUCCESS) {
          return i;
        }
      }
{code}
This will need a change anyway.
But currently no code calls it; all clients call only batchMutate. The above problem lies in put(). In trunk it has been removed.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481491#comment-13481491 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

Please remove the trailing CRs (^M) from the patch.
Trunk patch looks good overall.
{code}
+public class BulkDeleteEndpoint extends BulkDeleteService implements CoprocessorService,^M
+    Coprocessor {^M
{code}
Please add javadoc to above class.
{code}
+          scanner.close();^M
+        } catch (IOException ioe) {^M
+          LOG.debug(ioe);^M
{code}
Should LOG.error() be used above ?
{code}
+            byte[] versionsDeleted = deleteWithLockArr[i].getFirst().getAttribute(^M
+                NO_OF_VERSIONS_TO_DELETE);^M
+            if (versionsDeleted != null) {^M
+              totalVersionsDeleted += Bytes.toInt(versionsDeleted);^M
+            }^M
{code}
Should the above be enclosed in an if (deleteType == DeleteType.VERSION) block?
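A minimal sketch of the guard Ted suggests, with a hypothetical `DeleteType` stand-in and the per-delete attribute modelled as a nullable 4-byte big-endian int (the layout HBase's `Bytes.toInt` reads):

```java
import java.nio.ByteBuffer;
import java.util.List;

class VersionCountSketch {
  // Hypothetical stand-in for the patch's delete types.
  enum DeleteType { ROW, FAMILY, COLUMN, VERSION }

  /** Sums per-delete "versions deleted" attributes, but only for VERSION deletes. */
  static long totalVersionsDeleted(DeleteType type, List<byte[]> attributes) {
    if (type != DeleteType.VERSION) {
      return 0; // the attribute is only meaningful for version deletes
    }
    long total = 0;
    for (byte[] attr : attributes) {
      if (attr != null) {
        total += ByteBuffer.wrap(attr).getInt(); // big-endian 4-byte int
      }
    }
    return total;
  }
}
```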
{code}
+      h = h + 13 * Bytes.hashCode(this.family);^M
+      h = h + 13 * Bytes.hashCode(this.qualifier);^M
{code}
Would multiplication by 13 result in overflow?
Take a look at the following code from HColumnDescriptor:
{code}
    result ^= Byte.valueOf(COLUMN_DESCRIPTOR_VERSION).hashCode();
    result ^= values.hashCode();
{code}
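On the overflow question: Java `int` arithmetic wraps around silently (two's complement) and never throws, so overflow in `hashCode()` is harmless as long as equal objects produce equal hashes. A sketch using the conventional `31 * h + ...` combination (the same scheme `Arrays.hashCode` uses internally), on a hypothetical `ColumnKey` class holding a family and qualifier:

```java
import java.util.Arrays;

final class ColumnKey {
  private final byte[] family;
  private final byte[] qualifier;

  ColumnKey(byte[] family, byte[] qualifier) {
    this.family = family;
    this.qualifier = qualifier;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ColumnKey)) {
      return false;
    }
    ColumnKey other = (ColumnKey) o;
    // Compare array contents, not references.
    return Arrays.equals(family, other.family) && Arrays.equals(qualifier, other.qualifier);
  }

  @Override
  public int hashCode() {
    // Any int overflow here simply wraps around; it never throws.
    int h = Arrays.hashCode(family);
    h = 31 * h + Arrays.hashCode(qualifier);
    return h;
  }
}
```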
{code}
+ * Copyright 2011 The Apache Software Foundation^M
{code}
The above line is not needed.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474876#comment-13474876 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

A small update on the above comment: if a delete is bypassed by a CP (coprocessor), I won't be able to tell that from the OperationStatus. In that case it will still be marked as SUCCESS.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474758#comment-13474758 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Thanks for the comments. I will check.
bq.Do you have the bandwidth to make the endpoint conform to the new PB approach?
Yes, I can. I have already done it partially.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477578#comment-13477578 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Ted,
Some drawbacks of not taking a Delete object:
1. For a timestamp-based delete, the same TS has to be used for all the columns, whereas a normal Delete can use a different TS per column.
2. Delete types cannot be mixed. A normal Delete can combine a family delete for one CF, a column delete for another, and version deletes for yet others.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478621#comment-13478621 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Yes, the 0.94 code base.
Please see the javadoc in HRegionInterface.java:
{code}
  /**
   * Put an array of puts into the specified region
   *
   * @param regionName region name
   * @param puts List of puts to execute
   * @return The number of processed put's.  Returns -1 if all Puts
   * processed successfully.
   * @throws IOException e
   */
  public int put(final byte[] regionName, final List<Put> puts)
  throws IOException;
{code}
*return The number of processed put's*
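A tiny hypothetical helper (not part of HBase) that spells out the return convention stated in the 0.94 HRegionInterface javadoc quoted above: -1 means all Puts were processed, and any other value is the number of Puts that were processed.

{code:java}
// Hypothetical helper for illustration; not an HBase class.
final class PutResult {
  private PutResult() {}

  // Translates the raw return value of put(regionName, puts) into a count.
  static int processedCount(int returned, int submitted) {
    if (returned == -1) {
      return submitted; // -1: every Put in the list was processed
    }
    return returned; // otherwise: the number of processed Puts
  }

  static boolean allProcessed(int returned) {
    return returned == -1;
  }
}
{code}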

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472509#comment-13472509 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

The patch looks good overall (though I have not looked at it in detail).
Since this is a specific use case (we always issue family deletes), I'd say we should only provide the endpoint and not add a public API to HTableInterface.

                
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>         Attachments: HBASE-6942.patch
>
>
> We can provide an endpoint implementation for doing a bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Attachment: HBASE-6942_V3.patch
    
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476723#comment-13476723 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

In the Delete template approach, only one Delete object is passed. Its rowkey is just a dummy; what we take from it is which family/column/version deletes are needed. The user can create the Delete object and call deleteColumn(s) etc. on it.
On the endpoint side, we take the family map from that Delete object and use it to build the family map of each new Delete object we create.
Here is the sample code for that:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.KeyValue.Type;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.util.Bytes;

// In the endpoint, for each row returned by the scan:
Delete delete = new Delete(deleteRowKey);
if (deleteTemplate != null) {
  // When a template Delete is passed, copy all of its properties except the
  // rowkey onto the new Delete being created.
  // No need to check for row locks: the client cannot hold these locks
  // because it does not know the rows themselves.
  delete.setTimestamp(deleteTemplate.getTimeStamp());
  delete.setWriteToWAL(deleteTemplate.getWriteToWAL());
  setFamilyMap(deleteRowKey, deleteTemplate, delete);
}

// Copies the template's delete markers onto deleteRowKey, keeping each
// marker's family, qualifier, timestamp and type.
private void setFamilyMap(byte[] deleteRowKey, Delete deleteTemplate, Delete delete) {
  Map<byte[], List<KeyValue>> deleteTemplateFamilyMap = deleteTemplate.getFamilyMap();
  // byte[] keys need a content comparator (as Delete's own map uses);
  // a HashMap would compare array identities.
  Map<byte[], List<KeyValue>> deleteFamilyMap =
      new TreeMap<byte[], List<KeyValue>>(Bytes.BYTES_COMPARATOR);
  for (Map.Entry<byte[], List<KeyValue>> entry : deleteTemplateFamilyMap.entrySet()) {
    List<KeyValue> deleteTemplateKVs = entry.getValue();
    List<KeyValue> deleteKVs = new ArrayList<KeyValue>(deleteTemplateKVs.size());
    for (KeyValue kv : deleteTemplateKVs) {
      deleteKVs.add(new KeyValue(deleteRowKey, entry.getKey(), kv.getQualifier(),
          kv.getTimestamp(), Type.codeToType(kv.getType())));
    }
    deleteFamilyMap.put(entry.getKey(), deleteKVs);
  }
  delete.setFamilyMap(deleteFamilyMap);
}
{code}
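Because the sample above depends on HBase 0.94 classes, here is a dependency-free sketch of the same template idea, under stated assumptions: Marker and MarkerType are hypothetical stand-ins for KeyValue and KeyValue.Type, and Arrays.compareUnsigned (Java 9+) plays the role of Bytes.BYTES_COMPARATOR. Only the row key is replaced; each marker keeps its own qualifier, timestamp and type, which is what lets one template carry per-column timestamps and mixed delete types.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for KeyValue.Type and KeyValue; not HBase classes.
enum MarkerType { DELETE_FAMILY, DELETE_COLUMN, DELETE_VERSION }

final class Marker {
  final byte[] row;
  final byte[] family;
  final byte[] qualifier; // null for a family delete
  final long timestamp;
  final MarkerType type;

  Marker(byte[] row, byte[] family, byte[] qualifier, long timestamp, MarkerType type) {
    this.row = row;
    this.family = family;
    this.qualifier = qualifier;
    this.timestamp = timestamp;
    this.type = type;
  }
}

final class DeleteTemplates {
  private DeleteTemplates() {}

  // Keys compared by content (unsigned, lexicographic), so a lookup with a
  // fresh byte[] finds the entry.
  static Map<byte[], List<Marker>> newFamilyMap() {
    return new TreeMap<byte[], List<Marker>>(Arrays::compareUnsigned);
  }

  // Re-targets every marker of the template to a real row, keeping the
  // qualifier, timestamp and type of each marker unchanged.
  static Map<byte[], List<Marker>> applyTo(byte[] row, Map<byte[], List<Marker>> template) {
    Map<byte[], List<Marker>> result = newFamilyMap();
    for (Map.Entry<byte[], List<Marker>> entry : template.entrySet()) {
      List<Marker> copies = new ArrayList<>(entry.getValue().size());
      for (Marker m : entry.getValue()) {
        copies.add(new Marker(row, entry.getKey(), m.qualifier, m.timestamp, m.type));
      }
      result.put(entry.getKey(), copies);
    }
    return result;
  }
}
{code}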
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Status: Patch Available  (was: Open)
    
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475583#comment-13475583 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq.In trunk, there is a new Filter class that allows chaining and preserving the correct qualifier.
Which one are you referring to, Ted? I know of FirstKeyValueMatchingQualifiersFilter.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482534#comment-13482534 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

woohoo :)
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482492#comment-13482492 ] 

Hudson commented on HBASE-6942:
-------------------------------

Integrated in HBase-0.94 #548 (See [https://builds.apache.org/job/HBase-0.94/548/])
    HBASE-6942 Endpoint implementation for bulk delete rows (Anoop) (Revision 1401332)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteEndpoint.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteProtocol.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteResponse.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestBulkDeleteProtocol.java

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Attachment: HBASE-6942_V7.patch

Patch addressing the comments.
Ted, I will submit the patch for trunk tomorrow.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480828#comment-13480828 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

No. It's an example endpoint; it absolutely does not have to go into trunk first. (Of course, it should also go into trunk.)
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476709#comment-13476709 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq.In patch v5, rowBatchSize isn't used in the following call:
bq.hasMore = scanner.next(results);

Ted, this code is the same in the older versions as well. We use rowBatchSize to accumulate rows so that they can all be deleted as one unit; it is not the scan batch. We call scanner.next(results) rowBatchSize times, just like HRS (HRegionServer) does.

bq.In createDeleteMutation(), List<KeyValue> is passed and row key is retrieved from the first KeyValue. Is this intended?
Yes. All the KVs in this list belong to the same row, so any of them can be used to get the rowkey; the rowkeys are all the same.
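A plain-Java sketch of the accumulation loop described above, under stated assumptions: RowSource is a hypothetical stand-in for the region scanner (next() appends one row's cells, modelled as row key strings, and returns false once the last row has been returned), and deleteBatch stands for issuing the accumulated deletes as one unit. It also shows the case Ted asked to test, where rowBatchSize is smaller than the number of rows.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical scanner stand-in; each cell is modelled as its row key string.
interface RowSource {
  boolean next(List<String> results);
}

final class RowBatcher {
  private RowBatcher() {}

  static long deleteInBatches(RowSource scanner, int rowBatchSize,
      Consumer<List<String>> deleteBatch) {
    if (rowBatchSize <= 0) {
      throw new IllegalArgumentException("rowBatchSize must be positive");
    }
    long deleted = 0;
    boolean hasMore = true;
    while (hasMore) {
      List<String> batch = new ArrayList<>(rowBatchSize);
      // rowBatchSize calls to next(), one row per call. This groups rows for
      // the delete; it is not the Scan's own batch setting.
      for (int i = 0; i < rowBatchSize && hasMore; i++) {
        List<String> results = new ArrayList<>();
        hasMore = scanner.next(results);
        if (!results.isEmpty()) {
          // All cells from one next() call share a row, so the first suffices.
          batch.add(results.get(0));
        }
      }
      if (!batch.isEmpty()) {
        deleteBatch.accept(batch); // delete the accumulated rows as one unit
        deleted += batch.size();
      }
    }
    return deleted;
  }
}
{code}

For 7 rows and a rowBatchSize of 3, this issues batches of 3, 3 and 1 rows; the final batch is simply partial.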
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475962#comment-13475962 ] 

ramkrishna.s.vasudevan commented on HBASE-6942:
-----------------------------------------------

If we are supporting Delete as well, then it's better to provide a sample POJO class. It is going to be listed in the examples section anyway, so clear documentation and a proper example will at least help users use it. Generally, as Ted suggested, users will first just copy and paste the examples and see how they behave.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476126#comment-13476126 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Some drawbacks of not taking a Delete object:
1. For a timestamp-based delete, the same TS has to be used for all the columns, whereas a normal Delete can use a different TS per column.
2. Delete types cannot be mixed. A normal Delete can combine a family delete for one CF, a column delete for another, and version deletes for yet others.

I am just mentioning the differences. I don't think these are common use cases, so I am fine with our current approach.

Requesting thoughts from other users who have similar use cases.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477916#comment-13477916 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Patch V6 with the Scan-based approach.
Regarding returning the number of KVs deleted:
I think this is meaningful only when the delete type is VERSION. For the other types we don't know exactly how many KVs were deleted.
In case of a VERSION delete, we return the number of versions deleted.
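A minimal sketch of counting deleted versions only for VERSION deletes, which also reflects the review suggestion to guard the attribute read with if (deleteType == DeleteType.VERSION). The DeleteType enum is assumed to mirror the patch's, each list element stands for one Delete's NO_OF_VERSIONS_TO_DELETE attribute value (or null if unset), and returning OptionalLong.empty() for the other types is a hypothetical choice to make "unknown" explicit.

{code:java}
import java.nio.ByteBuffer;
import java.util.List;
import java.util.OptionalLong;

// Assumed to mirror the delete types used by the endpoint.
enum DeleteType { ROW, FAMILY, COLUMN, VERSION }

final class VersionCounter {
  private VersionCounter() {}

  static OptionalLong totalVersionsDeleted(DeleteType deleteType, List<byte[]> versionAttributes) {
    if (deleteType != DeleteType.VERSION) {
      // For ROW, FAMILY and COLUMN deletes the exact number of KVs removed is unknown.
      return OptionalLong.empty();
    }
    long total = 0;
    for (byte[] attr : versionAttributes) {
      if (attr != null) {
        // ByteBuffer reads big-endian by default, the same layout Bytes.toInt uses.
        total += ByteBuffer.wrap(attr).getInt();
      }
    }
    return OptionalLong.of(total);
  }
}
{code}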
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, since right now the rows have to be scanned back to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...


[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Attachment: HBASE-6942_V2.patch

Changed as per the comments from Lars and Ted.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Jean-Marc Spaggiari (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477978#comment-13477978 ] 

Jean-Marc Spaggiari commented on HBASE-6942:
--------------------------------------------

Should:
NO_OF_VERSIOS_TO_DELETE = "noOfVersiosToDelete"
be
NO_OF_VERSIONS_TO_DELETE = "noOfVersionsToDelete"
?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477919#comment-13477919 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

HBASE-6942_DeleteTemplate patch, with the Delete template approach.
Here one thing we cannot directly implement is the deletion of N versions. With the Scan-based approach, since everything is governed by the scan result, it is easy.

Having tried both, I feel that for a user the Scan-based approach will be easier. Only the mixed kind of delete will not be possible in that case.

Requesting opinions from others.
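For the N-versions case, the Scan-based approach reduces to: scan with the version limit set to N (Scan.setMaxVersions(int) in the 0.94 client API), then issue one exact-timestamp delete per returned KV. Below is a sketch of that logic for a single column, using a hypothetical in-memory TreeSet of timestamps rather than real HBase storage; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model for one column (not HBase code): "delete the newest N
// versions" with the Scan-based approach is scan, then delete what was returned.
class DeleteNewestVersionsSketch {

  // What a scan limited to maxVersions returns for one column:
  // the newest maxVersions timestamps, newest first.
  static List<Long> scanNewest(TreeSet<Long> stored, int maxVersions) {
    List<Long> out = new ArrayList<>();
    for (Long ts : stored.descendingSet()) {
      if (out.size() == maxVersions) {
        break;
      }
      out.add(ts);
    }
    return out;
  }

  // VERSION delete: one exact-timestamp delete per scanned KV.
  static void deleteScannedVersions(TreeSet<Long> stored, List<Long> scanned) {
    for (Long ts : scanned) {
      stored.remove(ts);
    }
  }

  public static void main(String[] args) {
    TreeSet<Long> stored = new TreeSet<>(Arrays.asList(10L, 20L, 30L, 40L));
    List<Long> scanned = scanNewest(stored, 2);
    System.out.println(scanned);  // [40, 30]
    deleteScannedVersions(stored, scanned);
    System.out.println(stored);   // [10, 20]
  }
}
```

Because the scan alone decides which versions are hit, no separate "number of versions" parameter has to be carried to the server.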
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475590#comment-13475590 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

That is the one. 
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476744#comment-13476744 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Lars,
The version deletes are somewhat of a grey area for me. We do not have a specific use case; I was just trying to allow version deletes the same way as from the normal client side. Yes, I clearly understand what you have in mind now: for version deletes we can use the TS taken from the scanned KVs.
Let me think more about the version delete part. So Lars, for the other types, is the approach in V5 okay?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473109#comment-13473109 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq. Maybe you could add a client-side code snippet to the Javadoc of the endpoint?
Will add the snippet to the Javadoc.
bq. Also, I wonder whether there is a way to allow this to delete specific families, columns, or versions. Especially the latter might be useful... Maybe for another jira.
We can have other JIRA issues for these.
bq. Should this be in the examples package?
The same doubt came to my mind too. What do you say, Lars? I am fine with the examples package.
                
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch
>
>
> We can provide an endpoint implementation for doing a bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476775#comment-13476775 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Sounds good. Just worried that N will necessarily be (much?) larger than M.

One last thing. Let's not pass things as Scan attributes; it just makes the API more obtuse (IMHO :) ). We can have an API:
{{BulkDeleteResponse delete(Scan scan, DeleteType type, Long timestamp)}}
timestamp could be null (hence a Long). If it is null, we use the latest timestamp for ROW/FAMILY/COLUMN and the KV's timestamp for VERSION; otherwise we use the passed timestamp. That should account for all use cases.
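The null-timestamp rule proposed here could be sketched as follows. This is a hypothetical helper (DeleteTimestampSketch.resolve is not part of any patch on this issue); it only models which timestamp each delete would carry, using Long.MAX_VALUE as the "latest" timestamp, which is the value of HBase's HConstants.LATEST_TIMESTAMP.

```java
// Hypothetical sketch of the timestamp rule proposed for
// delete(Scan scan, DeleteType type, Long timestamp); not HBase code.
class DeleteTimestampSketch {

  enum DeleteType { ROW, FAMILY, COLUMN, VERSION }

  // "Latest timestamp"; HBase's HConstants.LATEST_TIMESTAMP is Long.MAX_VALUE.
  static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

  // A non-null timestamp from the caller always wins. With null, ROW/FAMILY/
  // COLUMN deletes use the latest timestamp and VERSION deletes use the
  // timestamp of the scanned KV being deleted.
  static long resolve(DeleteType type, Long timestamp, long scannedKvTs) {
    if (timestamp != null) {
      return timestamp;
    }
    return type == DeleteType.VERSION ? scannedKvTs : LATEST_TIMESTAMP;
  }

  public static void main(String[] args) {
    System.out.println(resolve(DeleteType.ROW, null, 42L));      // 9223372036854775807
    System.out.println(resolve(DeleteType.VERSION, null, 42L));  // 42
    System.out.println(resolve(DeleteType.COLUMN, 100L, 42L));   // 100
  }
}
```

Using a boxed Long keeps "not specified" distinct from any real timestamp value, which is the reason given above for the parameter type.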

(Sorry, I think it was my comment about using only a Scan that made you change it...? That was not what I meant. I just think it's better to be explicit about extra parameters for an operation.)

Does that sound OK?

Documenting this will be tricky. I can have a shot at that if you like, Anoop. If you prefer to do it yourself, that's fine too.

Thanks for all the work on this and the patience, this is an awesome feature!

Eventually, since we made it so general now, I *can* see this as an official API in HTable... But let's do that in another jira (if others agree).

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477570#comment-13477570 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

See my previous comment
https://issues.apache.org/jira/browse/HBASE-6942?focusedCommentId=13476126&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13476126
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480639#comment-13480639 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

bq. Passing the DeleteTemplate is very inflexible by comparison.
That patch uses the Scan parameter as well. Can you elaborate on why using DeleteTemplate is very inflexible?
See Anoop's comment @ 17/Oct/12 15:40 about the limitation of patch v6:
bq. The mixed kind of delete only will be not possible in that case
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Attachment: HBASE-6942_DeleteTemplate.patch
    
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482385#comment-13482385 ] 

Hadoop QA commented on HBASE-6942:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12550464/HBASE-6942_Trunk-V2.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 3 new or modified tests.

    {color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 2.0 profile.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 82 warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3125//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3125//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3125//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3125//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3125//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3125//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3125//console

This message is automatically generated.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Attachment: HBASE-6942_94-V8.patch
    
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473034#comment-13473034 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Maybe you could add a client-side code snippet to the Javadoc of the endpoint? If not, the test is documentation enough about how to use it, I think.

                
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch
>
>
> We can provide an endpoint implementation for doing a bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Description: 
We can provide an endpoint implementation for doing a bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.

A query like: delete from table1 where...

  was:We can provide an endpoint implementation for doing a bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.

    
> Endpoint implementation for bulk delete
> ---------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>
> We can provide an endpoint implementation for doing a bulk delete (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476758#comment-13476758 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Lars, I think that, other than the VERSION type, the patch is in line with your thinking. With the VERSION type, the main difference is that patch V5 deletes only one version of a column, whereas in your comment it is N versions depending on the scan (how many KVs it returned). Yes, I agree 100% with you on this.

Regarding the FAMILY delete type, I think we can handle it. Say the scan returns N KVs for a row. If those N KVs correspond to M families, we delete all M families. The TS can be used here as well.
What do you say, Lars?
Thanks
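The FAMILY handling described above (N KVs in a row spread over M families, one family delete per family) might look like this as a pure in-memory sketch. The Kv class and the familiesToDelete method are hypothetical names for illustration only; a real endpoint would build HBase Delete objects from the resulting map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the FAMILY delete type: for each row, the scan may
// return N KVs spread over M families; one family delete is issued per family.
class FamilyDeleteSketch {

  static final class Kv {
    final String row;
    final String family;
    final String qualifier;

    Kv(String row, String family, String qualifier) {
      this.row = row;
      this.family = family;
      this.qualifier = qualifier;
    }
  }

  // Row key -> distinct families seen in the scan result, in scan order.
  static Map<String, Set<String>> familiesToDelete(List<Kv> scanned) {
    Map<String, Set<String>> out = new LinkedHashMap<>();
    for (Kv kv : scanned) {
      out.computeIfAbsent(kv.row, r -> new LinkedHashSet<>()).add(kv.family);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Kv> scanned = new ArrayList<>();
    scanned.add(new Kv("r1", "cf1", "a"));
    scanned.add(new Kv("r1", "cf1", "b"));
    scanned.add(new Kv("r1", "cf2", "a"));
    scanned.add(new Kv("r2", "cf2", "c"));
    // N = 3 KVs for r1 collapse to M = 2 family deletes.
    System.out.println(familiesToDelete(scanned)); // {r1=[cf1, cf2], r2=[cf2]}
  }
}
```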

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Attachment: HBASE-6942_V4.patch
    
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474295#comment-13474295 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

@Anoop:
Can you point me to the change in patch v2 that would account for partially deleted rows in a region?

Also, please provide a patch for trunk.

Do you have bandwidth to make the endpoint conform to the new PB (protobuf) approach?

Thanks
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475637#comment-13475637 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

It looks like FilterList.filterKeyValue() should be enhanced (in another JIRA) so that ReturnCode.NEXT_ROW is returned for the second occurrence of the same row key whose first ReturnCode was ReturnCode.INCLUDE.
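A sketch of the suggested behaviour, assuming a simplified filter interface. KvFilter, OnePerRow and this ReturnCode enum are hypothetical stand-ins, not HBase's Filter/FilterList classes. Once a row has had a KV included, later KVs from that same row get NEXT_ROW, so a row-level bulk delete would read at most one matching KV per row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the suggested FilterList behaviour (not HBase code):
// once a KV of a row has been INCLUDEd, any later KV of the same row gets
// NEXT_ROW instead of being evaluated again.
class OnePerRowFilterSketch {

  enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

  interface KvFilter {
    ReturnCode filter(String row, String value);
  }

  static final class OnePerRow implements KvFilter {
    private final KvFilter inner;
    private String includedRow; // row whose KV was last INCLUDEd, or null

    OnePerRow(KvFilter inner) {
      this.inner = inner;
    }

    @Override
    public ReturnCode filter(String row, String value) {
      if (row.equals(includedRow)) {
        return ReturnCode.NEXT_ROW; // second hit in an already-included row
      }
      ReturnCode rc = inner.filter(row, value);
      if (rc == ReturnCode.INCLUDE) {
        includedRow = row;
      }
      return rc;
    }
  }

  public static void main(String[] args) {
    // Inner filter: include KVs whose value starts with "x".
    KvFilter f = new OnePerRow(
        (row, v) -> v.startsWith("x") ? ReturnCode.INCLUDE : ReturnCode.SKIP);
    String[][] kvs = {{"r1", "a"}, {"r1", "x1"}, {"r1", "x2"}, {"r2", "x3"}};
    List<ReturnCode> codes = new ArrayList<>();
    for (String[] kv : kvs) {
      codes.add(f.filter(kv[0], kv[1]));
    }
    System.out.println(codes); // [SKIP, INCLUDE, NEXT_ROW, INCLUDE]
  }
}
```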
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of rows (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the rows have to be scanned to the client, which then issues delete(s) using the rowkeys.
> A query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476733#comment-13476733 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

For COLUMN, FAMILY, and ROW deletes it does make sense to pass a timestamp, I guess: in that case only KVs older than the TS would be deleted.
For VERSION deletes it makes no sense though.
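A minimal sketch of these semantics on a single column's versions. It assumes, as HBase's client Delete does for its timestamp-taking methods, that a timestamped COLUMN/FAMILY/ROW marker covers every version with a timestamp less than or equal to the given one, while a VERSION delete names exactly one timestamp. TimestampedDelete and its methods are hypothetical and only model the outcome; they do not touch HBase.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulation only: which versions of one column survive a delete marker. */
final class TimestampedDelete {
  /** COLUMN-style delete bounded by markerTs: versions at or below the bound are covered. */
  static List<Long> survivingAfterColumnDelete(List<Long> versionTs, long markerTs) {
    List<Long> survivors = new ArrayList<>();
    for (long ts : versionTs) {
      if (ts > markerTs) {
        survivors.add(ts); // newer than the marker, so not covered by it
      }
    }
    return survivors;
  }

  /** VERSION-style delete: it already names one exact timestamp, so an extra bound adds nothing. */
  static List<Long> survivingAfterVersionDelete(List<Long> versionTs, long exactTs) {
    List<Long> survivors = new ArrayList<>(versionTs);
    survivors.remove(Long.valueOf(exactTs)); // removes only that single version
    return survivors;
  }
}
```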
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476750#comment-13476750 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Hey Anoop, we crossed comments. I think the rest of V5 is along the lines of my last comment here, right?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482502#comment-13482502 ] 

Hudson commented on HBASE-6942:
-------------------------------

Integrated in HBase-TRUNK #3474 (See [https://builds.apache.org/job/HBase-TRUNK/3474/])
    HBASE-6942 Endpoint implementation for bulk delete rows (Anoop) (Revision 1401330)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteEndpoint.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/example/generated/BulkDeleteProtos.java
* /hbase/trunk/hbase-server/src/main/protobuf/BulkDelete.proto
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestBulkDeleteProtocol.java

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480636#comment-13480636 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Personally I prefer V6. The power of this code is that the KVs/columns/families to be deleted can be described by a scan. Passing the DeleteTemplate is very inflexible by comparison.
Potentially we could increase flexibility here too. For example, when we find the template KVs: if it is a COLUMN marker, we take the row and column from the scanned row; if it is a VERSION marker, we take the row, column, and ts from the scanned row; and so on.
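The mapping Lars describes can be sketched as a switch over the delete type. DeleteType, Cell and DeleteSpec here are hypothetical stand-ins, not the types in the patch; FAMILY (row plus family) is filled in by analogy, since the comment only implies it.

```java
/** Sketch only: derives what a delete should cover from one scanned cell (hypothetical types). */
final class DeleteFromScannedCell {
  enum DeleteType { ROW, FAMILY, COLUMN, VERSION }

  /** Hypothetical stand-in for a scanned KeyValue. */
  static final class Cell {
    final String row;
    final String family;
    final String qualifier;
    final long ts;

    Cell(String row, String family, String qualifier, long ts) {
      this.row = row;
      this.family = family;
      this.qualifier = qualifier;
      this.ts = ts;
    }
  }

  /** Coordinates of a delete; a null field means that level is not constrained. */
  static final class DeleteSpec {
    final String row;
    final String family;
    final String qualifier;
    final Long ts; // only set for VERSION deletes, which target one exact version

    DeleteSpec(String row, String family, String qualifier, Long ts) {
      this.row = row;
      this.family = family;
      this.qualifier = qualifier;
      this.ts = ts;
    }

    @Override
    public String toString() {
      StringBuilder sb = new StringBuilder(row);
      if (family != null) sb.append('/').append(family);
      if (qualifier != null) sb.append(':').append(qualifier);
      if (ts != null) sb.append('@').append(ts);
      return sb.toString();
    }
  }

  /** Takes from the scanned cell exactly the coordinates the delete type needs. */
  static DeleteSpec fromCell(DeleteType type, Cell c) {
    switch (type) {
      case ROW:
        return new DeleteSpec(c.row, null, null, null);
      case FAMILY:
        return new DeleteSpec(c.row, c.family, null, null);
      case COLUMN:
        return new DeleteSpec(c.row, c.family, c.qualifier, null);
      case VERSION:
        return new DeleteSpec(c.row, c.family, c.qualifier, c.ts);
      default:
        throw new IllegalArgumentException("Unknown delete type " + type);
    }
  }
}
```

The point of the design is that the scan alone decides which cells are found, while the delete type only decides how much of each found cell's coordinates the delete keeps.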

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478038#comment-13478038 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

bq. was following the code in HRS
Are you referring to the following code in doBatchOp()?
{code}
          case SANITY_CHECK_FAILURE:
            result = ResponseConverter.buildActionResult(
                new FailedSanityCheckException(codes[i].getExceptionMsg()));
            builder.setResult(i, result);
            break;
{code}
The above is within a switch statement nested inside a for loop.
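The distinction matters because in Java a break inside a switch leaves only the switch, not the loop around it; stopping the loop needs a labeled break (or a return). The Code enum below is a local stand-in for the per-action status codes, not HBase's own type, and BreakInSwitch is a hypothetical name.

```java
/** Illustrates Java semantics: a break inside a switch exits the switch, not the enclosing loop. */
final class BreakInSwitch {
  /** Local stand-in for per-action status codes. */
  enum Code { SUCCESS, SANITY_CHECK_FAILURE, OTHER_FAILURE }

  /** Counts how many codes are visited when the break sits inside the switch. */
  static int visitedWithSwitchBreak(Code[] codes) {
    int visited = 0;
    for (Code code : codes) {
      visited++;
      switch (code) {
        case SANITY_CHECK_FAILURE:
          break; // exits only the switch; the for loop moves on to the next code
        default:
          break;
      }
    }
    return visited;
  }

  /** Same loop, but a labeled break stops the loop at the first failure. */
  static int visitedWithLabeledBreak(Code[] codes) {
    int visited = 0;
    loop:
    for (Code code : codes) {
      visited++;
      switch (code) {
        case SANITY_CHECK_FAILURE:
        case OTHER_FAILURE:
          break loop; // leaves the for loop as well
        default:
          break;
      }
    }
    return visited;
  }
}
```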
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Status: Patch Available  (was: Open)
    
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477567#comment-13477567 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

Do we support deleting family cf1 and column qualifier cq2 of family cf2 in one request?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475522#comment-13475522 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

{code}
+public class BulkDeleteEndpoint extends BaseEndpointCoprocessor implements BulkDeleteProtocol {
{code}
Please add annotations for audience and stability.
{code}
+        List<byte[]> deleteRowKeys = new ArrayList<byte[]>(rowBatchSize);
{code}
Should a Set be used? The use case is deleting entire rows, so we just need to collect unique row keys.
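One caveat if a Set is used: byte[] inherits identity-based equals/hashCode, so a HashSet<byte[]> keeps two arrays holding the same row key as separate entries. The sketch below shows the difference, using the JDK's Arrays.compareUnsigned (Java 9+) for unsigned lexicographic order; in HBase code a TreeSet built with Bytes.BYTES_COMPARATOR would be the usual equivalent. UniqueRowKeys is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: collecting unique row keys held as byte[]. */
final class UniqueRowKeys {
  /** byte[] uses identity equals/hashCode, so equal-content arrays stay separate in a HashSet. */
  static int distinctWithHashSet(List<byte[]> keys) {
    Set<byte[]> set = new HashSet<>(keys);
    return set.size();
  }

  /** A TreeSet with an unsigned lexicographic comparator dedupes by content and keeps row order. */
  static List<byte[]> distinctSorted(List<byte[]> keys) {
    Comparator<byte[]> unsignedLex = Arrays::compareUnsigned; // bytes compared as 0..255
    Set<byte[]> set = new TreeSet<>(unsignedLex);
    set.addAll(keys);
    return new ArrayList<>(set);
  }
}
```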
{code}
+    if (scan.getFilter() == null) {
+      // What we need is just the rowkeys. So only 1st KV from any row is enough.
+      scan.setFilter(new FirstKeyOnlyFilter());
+    }
{code}
If the user has already specified a Filter that would return many KeyValues from one row, should we add FirstKeyOnlyFilter through a FilterList?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481633#comment-13481633 ] 

Andrew Purtell commented on HBASE-6942:
---------------------------------------

Sorry for being late to the party. Really glad to see your interest and effort in submitting a coprocessor example. We need more of these! I took a look at the trunk patch after reading through this issue. +1 after addressing Ted's comments, except for the hashcode multiplication thing.
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476292#comment-13476292 ] 

Ted Yu commented on HBASE-6942:
-------------------------------

An alternative, assuming Delete is passed as a parameter to BulkDeleteEndpoint, would be to reduce DeleteType to two enum values: ROW and NON_ROW.
For the NON_ROW case, the user can specify any combination of deletions on a template Delete whose non-null row key is just a placeholder:
{code}
    Delete del = new Delete(row);
    del.deleteFamily(FAMILY1);
{code}
In createDeleteMutation(), we call Delete.getFamilyMap() and plug in the row key obtained from the scan for all KeyValues contained in the family map.
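A rough simulation of that expansion step, which would also cover a mixed request such as family cf1 plus column cq2 of family cf2. The template is modelled as a hypothetical map from family to qualifiers, with an empty list meaning the whole family; DeleteTemplateExpansion and its string output are illustrative only, not the Delete or getFamilyMap() API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Simulation of the NON_ROW idea: re-key a template's family map onto each scanned row. */
final class DeleteTemplateExpansion {
  /**
   * Template: family -> qualifiers to delete; an empty list stands for "the whole family".
   * Returns one readable entry per (row, family[, qualifier]) that would be deleted.
   */
  static List<String> expand(Map<String, List<String>> template, List<String> scannedRows) {
    List<String> deletes = new ArrayList<>();
    for (String row : scannedRows) { // row keys come from the scan, not from the template
      for (Map.Entry<String, List<String>> e : template.entrySet()) {
        if (e.getValue().isEmpty()) {
          deletes.add(row + "/" + e.getKey()); // family-level delete
        } else {
          for (String q : e.getValue()) {
            deletes.add(row + "/" + e.getKey() + ":" + q); // column-level delete
          }
        }
      }
    }
    return deletes;
  }
}
```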
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk deletion of data

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482941#comment-13482941 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

Thanks Ted, Lars, Andrew and Ram for the review. Happy to see this in.
                
> Endpoint implementation for bulk deletion of data
> -------------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of data(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477574#comment-13477574 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Alternatively we could use four integer constants, or do what you suggested earlier and pass a Delete template object (although I still think that would be confusing).

Since these are endpoints, it is also possible to just have four different endpoints that share some methods between them.

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Comment Edited] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475637#comment-13475637 ] 

Ted Yu edited comment on HBASE-6942 at 10/13/12 4:24 PM:
---------------------------------------------------------

Looks like FilterList.filterKeyValue() should be enhanced (with a new setting, in another JIRA) where ReturnCode.NEXT_ROW would be returned for the second occurrence of the same row key whose first ReturnCode was ReturnCode.INCLUDE.
                
      was (Author: yuzhihong@gmail.com):
    Looks like FilterList.filterKeyValue() should be enhanced (in another JIRA) where ReturnCode.NEXT_ROW would be returned for the second occurrence of the same row key whose first ReturnCode was ReturnCode.INCLUDE.
                  
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476747#comment-13476747 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

Maybe let's step back and list all the use cases. Here are the ones I have been thinking about:
* Delete a set of exact versions of some keyvalues: VERSION delete type and a scan that via setMaxVersions/setTimeStamp/setTimeRange/setFilter selects a set of KVs. Delete those KVs exactly.
* Delete a certain set of rows (that's how we started): ROW delete type and a scan; we'll use FirstKeyOnlyFilter and delete all rows found.
* Delete a set of columns. COLUMN delete type with a scan that returns exactly one version of each KV. Take the column of that KV and delete it.
* Delete some column families. This one is a bit more tricky since we cannot create a scan that only returns a single KV for each family. Here it would be necessary to pass either a Delete template or a set of families to delete... I'd say we can table this for later.

Now for the timestamp use cases:
* Delete all ROWS or COLUMNS older than some TS. Pass the corresponding delete type, a TS, and a scan selecting the right rows or columns.

So except for the family delete, we can cover all cases by passing an appropriately created scan object, a delete type, and a TS.

Does this make any sense? Am I missing important use cases?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480620#comment-13480620 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

This:
{code}
+          if (!hasMore) {
+            // There are no more rows.
+            break;
+          }
{code}
Is at the wrong spot. Per the RegionScanner contract, a return of false indicates that there is no *next* batch of values, but that does not mean that there are no KVs in the *current* batch. This check should be at the end of the loop.
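A sketch of the loop shape Lars asks for, with the hasMore check moved after the current batch has been consumed. BatchScanner is a hypothetical stand-in modelled on the next(List) contract described here (fill the list, return whether another batch follows), and drainCheckingTooEarly reproduces the quoted placement for comparison; neither is the patch's code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch of a RegionScanner-style loop: consume the current batch before honouring hasMore. */
final class ScanLoop {
  /** Hypothetical scanner: fills out with one batch, returns false once the last batch is given. */
  interface BatchScanner {
    boolean next(List<String> out);
  }

  /** Fake scanner over fixed batches; the final batch is delivered together with false. */
  static BatchScanner of(List<List<String>> batches) {
    Iterator<List<String>> it = batches.iterator();
    return out -> {
      if (it.hasNext()) out.addAll(it.next());
      return it.hasNext();
    };
  }

  /** Correct shape: every batch is used, including the one returned with hasMore == false. */
  static List<String> drain(BatchScanner scanner) {
    List<String> processed = new ArrayList<>();
    boolean hasMore = true;
    while (hasMore) {
      List<String> batch = new ArrayList<>();
      hasMore = scanner.next(batch);
      processed.addAll(batch); // e.g. build Delete mutations for these rows
    }
    return processed;
  }

  /** The quoted placement: breaking before using the batch drops the final one. */
  static List<String> drainCheckingTooEarly(BatchScanner scanner) {
    List<String> processed = new ArrayList<>();
    while (true) {
      List<String> batch = new ArrayList<>();
      boolean hasMore = scanner.next(batch);
      if (!hasMore) break; // wrong spot: the current batch is discarded
      processed.addAll(batch);
    }
    return processed;
  }
}
```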

Passing the number of versions deleted through an attribute is hokey. I wonder whether there is a better approach.

For passing the delete type as bytes, what I meant was to use an enum backed by bytes. See KeyValue.Type and the getCode and codeToType methods.
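A sketch of such a byte-backed enum, following the getCode/codeToType pattern of KeyValue.Type. The constant names match the delete types discussed in this issue, but the byte values are arbitrary assumptions, not the codes used in the committed patch.

```java
/** Sketch of a byte-coded enum in the style of KeyValue.Type; the code values are hypothetical. */
enum DeleteType {
  ROW((byte) 0), FAMILY((byte) 1), COLUMN((byte) 2), VERSION((byte) 3);

  private final byte code;

  DeleteType(byte code) {
    this.code = code;
  }

  /** The single byte sent over the wire, independent of the constant's name or ordinal. */
  byte getCode() {
    return code;
  }

  /** Maps a received byte back to its constant; unknown codes are rejected rather than guessed. */
  static DeleteType codeToType(byte b) {
    for (DeleteType t : values()) {
      if (t.code == b) return t;
    }
    throw new IllegalArgumentException("Unknown DeleteType code " + b);
  }
}
```

Keeping an explicit code per constant means reordering or inserting constants later does not silently change what existing clients send.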

I'm not sure I see a use case for the VERSION marker together with a passed timestamp.
This will only delete the exact version of all the KVs scanned.
 
Otherwise looks good.

@Ted: The batchMutate call will not produce gaps. The first index not successful is equal to the number of successful operations. So this is correct in this patch.
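Under the no-gaps property Lars describes, the success count can be read off as the index of the first non-SUCCESS status. The Status enum and BatchOutcome helper are hypothetical stand-ins for the per-operation statuses a batch mutate returns, not HBase's own types.

```java
/** Sketch: counting successes in a batch whose statuses form a success prefix (no gaps). */
final class BatchOutcome {
  /** Hypothetical stand-in for per-operation status codes. */
  enum Status { SUCCESS, FAILURE, NOT_RUN }

  /** Index of the first non-SUCCESS entry, which equals the number of successful operations. */
  static int successCount(Status[] statuses) {
    int i = 0;
    while (i < statuses.length && statuses[i] == Status.SUCCESS) {
      i++;
    }
    return i;
  }

  /** Checks the assumed no-gaps property: nothing succeeds after the first non-success. */
  static boolean hasNoGaps(Status[] statuses) {
    for (int j = successCount(statuses); j < statuses.length; j++) {
      if (statuses[j] == Status.SUCCESS) return false;
    }
    return true;
  }
}
```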

Lastly, are we making this too complicated? The first patch was nice and simple: it only deleted rows. Now we have a pretty complicated piece of code... Just asking, I still think it's a nice patch.

                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Attachment: HBASE-6942_V6.patch
    
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482104#comment-13482104 ] 

Anoop Sam John commented on HBASE-6942:
---------------------------------------

bq. Would multiplication by 13 result in overflow?
It won't be an issue, I guess, Ted. This might create negative numbers, but that is fine. You can see TablePermission.java.
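Java int arithmetic wraps around silently on overflow instead of throwing, so a multiply-by-13 hashCode can turn negative without breaking anything; what matters is that equal objects produce equal hashes. A minimal sketch with hypothetical fields (row, family, timestamp), not the class from the patch:

```java
import java.util.Arrays;

/** Sketch of a multiply-and-add hashCode; the class and its fields are hypothetical. */
final class DeleteKey {
  private final byte[] row;
  private final byte[] family;
  private final long timestamp;

  DeleteKey(byte[] row, byte[] family, long timestamp) {
    this.row = row;
    this.family = family;
    this.timestamp = timestamp;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof DeleteKey)) return false;
    DeleteKey other = (DeleteKey) o;
    return timestamp == other.timestamp
        && Arrays.equals(row, other.row)
        && Arrays.equals(family, other.family);
  }

  @Override
  public int hashCode() {
    // int multiplication wraps modulo 2^32 without an exception; a negative result is a valid hash.
    int h = Arrays.hashCode(row);
    h = h * 13 + Arrays.hashCode(family);
    h = h * 13 + Long.hashCode(timestamp);
    return h;
  }
}
```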
bq. Should the above be enclosed in an if (deleteType == DeleteType.VERSION) block?
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an end point implementation for doing a bulk deletion of rows(based on a scan) at the server side. This can reduce the time taken for such an operation as right now it need to do a scan to client and issue delete(s) using rowkeys.
> Query like  delete from table1 where...

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Anoop Sam John (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-6942:
----------------------------------

    Attachment: HBASE-6942_Trunk-V2.patch
    
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch

[jira] [Updated] (HBASE-6942) Endpoint implementation for bulk deletion of data

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-6942:
--------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)
    
> Endpoint implementation for bulk deletion of data
> -------------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch, HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch, HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of data (based on a scan) at the server side. This can reduce the time taken for such an operation, as right now the client has to scan the data back and then issue delete(s) using the row keys.
> Comparable to a query like: delete from table1 where...

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474321#comment-13474321 ] 

Lars Hofhansl commented on HBASE-6942:
--------------------------------------

HRegionServer.put(..., List<Put>) does this to determine the number of successful operations:
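{code}
      // Stop at the first operation that did not succeed; its index is the
      // number of operations that succeeded before it.
      OperationStatus[] codes = region.batchMutate(putsWithLocks);
      for (int i = 0; i < codes.length; i++) {
        if (codes[i].getOperationStatusCode() != OperationStatusCode.SUCCESS) {
          return i;
        }
      }
{code}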
Should add this here as well. Otherwise v2 looks good.
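The same check can be isolated into a small helper. This is a self-contained sketch, not HRegionServer code. StatusCode stands in for OperationStatusCode, and SuccessCountSketch and countLeadingSuccesses are hypothetical names.

```java
// Hypothetical stand-in for the per-operation status that a batch mutation
// returns; only SUCCESS matters here and the FAILURE constant is an assumption.
public class SuccessCountSketch {
    enum StatusCode { SUCCESS, FAILURE }

    // Returns how many operations succeeded before the first non-SUCCESS
    // status; if every operation succeeded, returns codes.length.
    static int countLeadingSuccesses(StatusCode[] codes) {
        for (int i = 0; i < codes.length; i++) {
            if (codes[i] != StatusCode.SUCCESS) {
                return i;
            }
        }
        return codes.length;
    }

    public static void main(String[] args) {
        StatusCode[] codes = {
            StatusCode.SUCCESS, StatusCode.SUCCESS, StatusCode.FAILURE, StatusCode.SUCCESS
        };
        System.out.println(countLeadingSuccesses(codes)); // 2
    }
}
```

Like the loop quoted above, the helper stops at the first failure, so a later success (index 3 in the example) is not counted. The sketch returns codes.length when nothing failed, so its result is always a count.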
                
> Endpoint implementation for bulk delete rows
> --------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942.patch, HBASE-6942_V2.patch
