Posted to issues@hbase.apache.org by "Ted Tuttle (JIRA)" <ji...@apache.org> on 2012/06/21 20:12:42 UTC

[jira] [Created] (HBASE-6254) deletes w/ many column qualifiers overwhelm Region Server

Ted Tuttle created HBASE-6254:
---------------------------------

             Summary: deletes w/ many column qualifiers overwhelm Region Server
                 Key: HBASE-6254
                 URL: https://issues.apache.org/jira/browse/HBASE-6254
             Project: HBase
          Issue Type: Bug
          Components: performance, regionserver
    Affects Versions: 0.94.0
         Environment: 5 node CentOS + 1 master, v0.94 on cdh3u3
            Reporter: Ted Tuttle


Execution of Deletes constructed with thousands of calls to Delete.deleteColumn(family, qualifier) is very expensive and slow.

On our (quiet) cluster, a Delete w/ 20k qualifiers took about 13s to complete (as measured by client).

When 10 such Deletes were sent to the cluster via HTable.delete(List<Delete>), one of the RegionServers ended up w/ 5 of the requests and sat at 100% CPU utilization for about 1 hour.

This led to the client timing out after 20min (2min x 10 retries).  In one case, the client was able to fill the RPC callqueue and received the following error:

  Failed all from region=<region>,hostname=<host>, port=<port> java.util.concurrent.ExecutionException: java.io.IOException: Call queue is full, is ipc.server.max.callqueue.size too small?

Based on feedback (http://search-hadoop.com/m/yITsc1WcDWP), I switched to Delete.deleteColumn(family, qual, timestamp), where the timestamp came from the KeyValue retrieved by a scan driven by our domain objects.  This version of the delete ran in about 500ms.
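
For illustration, here is a minimal client-side sketch of that workaround (the table, row and family names are placeholders): read the cells back first, then build the Delete with explicit timestamps.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TimestampedDeleteSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");   // placeholder table name
    byte[] row = Bytes.toBytes("row-1");           // placeholder row key
    byte[] family = Bytes.toBytes("f");            // placeholder column family
    try {
      // Read the row once so we know the exact timestamp of every cell.
      Get get = new Get(row);
      get.addFamily(family);
      Result result = table.get(get);

      // Build the Delete with explicit timestamps; this avoids the
      // per-qualifier lookup the RegionServer does for LATEST_TIMESTAMP.
      Delete delete = new Delete(row);
      for (KeyValue kv : result.raw()) {
        delete.deleteColumn(family, kv.getQualifier(), kv.getTimestamp());
      }
      table.delete(delete);
    } finally {
      table.close();
    }
  }
}
{code}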

User group thread titled "RS unresponsive after series of deletes" has related logs and stacktraces.  

Link to thread: http://search-hadoop.com/m/RmIyr1WcDWP

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6254) deletes w/ many column qualifiers overwhelm Region Server

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6254:
----------------------------------

    Description: 
Execution of Deletes constructed with thousands of calls to Delete.deleteColumn(family, qualifier) is very expensive and slow.

On our (quiet) cluster, a Delete w/ 20k qualifiers took about 13s to complete (as measured by client).

When 10 such Deletes were sent to the cluster via HTable.delete(List<Delete>), one of the RegionServers ended up w/ 5 of the requests and sat at 100% CPU utilization for about 1 hour.

This led to the client timing out after 20min (2min x 10 retries).  In one case, the client was able to fill the RPC callqueue and received the following error:
{code}
  Failed all from region=<region>,hostname=<host>, port=<port> java.util.concurrent.ExecutionException: java.io.IOException: Call queue is full, is ipc.server.max.callqueue.size too small?
{code}
Based on feedback (http://search-hadoop.com/m/yITsc1WcDWP), I switched to Delete.deleteColumn(family, qual, timestamp), where the timestamp came from the KeyValue retrieved by a scan driven by our domain objects.  This version of the delete ran in about 500ms.

User group thread titled "RS unresponsive after series of deletes" has related logs and stacktraces.  

Link to thread: http://search-hadoop.com/m/RmIyr1WcDWP

Here is the stack dump of region server: http://pastebin.com/8y5x4xU7

  was:
Execution of Deletes constructed with thousands of calls to Delete.deleteColumn(family, qualifier) is very expensive and slow.

On our (quiet) cluster, a Delete w/ 20k qualifiers took about 13s to complete (as measured by client).

When 10 such Deletes were sent to the cluster via HTable.delete(List<Delete>), one of the RegionServers ended up w/ 5 of the requests and sat at 100% CPU utilization for about 1 hour.

This led to the client timing out after 20min (2min x 10 retries).  In one case, the client was able to fill the RPC callqueue and received the following error:

  Failed all from region=<region>,hostname=<host>, port=<port> java.util.concurrent.ExecutionException: java.io.IOException: Call queue is full, is ipc.server.max.callqueue.size too small?

Based on feedback (http://search-hadoop.com/m/yITsc1WcDWP), I switched to Delete.deleteColumn(family, qual, timestamp), where the timestamp came from the KeyValue retrieved by a scan driven by our domain objects.  This version of the delete ran in about 500ms.

User group thread titled "RS unresponsive after series of deletes" has related logs and stacktraces.  

Link to thread: http://search-hadoop.com/m/RmIyr1WcDWP

    
> deletes w/ many column qualifiers overwhelm Region Server
> ---------------------------------------------------------
>
>                 Key: HBASE-6254
>                 URL: https://issues.apache.org/jira/browse/HBASE-6254
>             Project: HBase
>          Issue Type: Bug
>          Components: performance, regionserver
>    Affects Versions: 0.94.0
>         Environment: 5 node CentOS + 1 master, v0.94 on cdh3u3
>            Reporter: Ted Tuttle
>
> Execution of Deletes constructed with thousands of calls to Delete.deleteColumn(family, qualifier) is very expensive and slow.
> On our (quiet) cluster, a Delete w/ 20k qualifiers took about 13s to complete (as measured by client).
> When 10 such Deletes were sent to the cluster via HTable.delete(List<Delete>), one of the RegionServers ended up w/ 5 of the requests and sat at 100% CPU utilization for about 1 hour.
> This led to the client timing out after 20min (2min x 10 retries).  In one case, the client was able to fill the RPC callqueue and received the following error:
> {code}
>   Failed all from region=<region>,hostname=<host>, port=<port> java.util.concurrent.ExecutionException: java.io.IOException: Call queue is full, is ipc.server.max.callqueue.size too small?
> {code}
> Based on feedback (http://search-hadoop.com/m/yITsc1WcDWP), I switched to Delete.deleteColumn(family, qual, timestamp), where the timestamp came from the KeyValue retrieved by a scan driven by our domain objects.  This version of the delete ran in about 500ms.
> User group thread titled "RS unresponsive after series of deletes" has related logs and stacktraces.  
> Link to thread: http://search-hadoop.com/m/RmIyr1WcDWP
> Here is the stack dump of region server: http://pastebin.com/8y5x4xU7


        

[jira] [Commented] (HBASE-6254) deletes w/ many column qualifiers overwhelm Region Server

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398719#comment-13398719 ] 

Zhihong Ted Yu commented on HBASE-6254:
---------------------------------------

From HRegion, prepareDeleteTimestamps() performs one get operation per column qualifier:
{code}
      for (KeyValue kv: kvs) {
        //  Check if time is LATEST, change to time of most recent addition if so
        //  This is expensive.
        if (kv.isLatestTimestamp() && kv.isDeleteType()) {
...
          List<KeyValue> result = get(get, false);
{code}
We perform get() for each kv whose time is LATEST.
This explains the unresponsiveness.

I think we can group a configurable number of qualifiers into each get and perform classification on the result.
This way we can reduce the number of times HRegion$RegionScannerImpl.next() is called.
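
Here is a rough client-side analogue of that grouping idea (the helper name is made up and the configurable chunking is omitted for brevity; this is only a sketch, not a patch):
{code}
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedTimestampLookup {
  // Resolve the latest timestamp of many qualifiers with a single get,
  // instead of issuing one get per qualifier.
  static Map<String, Long> latestTimestamps(HTable table, byte[] row,
      byte[] family, List<byte[]> qualifiers) throws IOException {
    Get get = new Get(row);
    for (byte[] qualifier : qualifiers) {
      get.addColumn(family, qualifier);   // all qualifiers share one lookup
    }
    Result result = table.get(get);

    // Classify the returned KeyValues back to their qualifiers.
    Map<String, Long> timestamps = new HashMap<String, Long>();
    for (KeyValue kv : result.raw()) {
      timestamps.put(Bytes.toString(kv.getQualifier()), kv.getTimestamp());
    }
    return timestamps;
  }
}
{code}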
                
> deletes w/ many column qualifiers overwhelm Region Server
> ---------------------------------------------------------
>
>                 Key: HBASE-6254
>                 URL: https://issues.apache.org/jira/browse/HBASE-6254
>             Project: HBase
>          Issue Type: Bug
>          Components: performance, regionserver
>    Affects Versions: 0.94.0
>         Environment: 5 node CentOS + 1 master, v0.94 on cdh3u3
>            Reporter: Ted Tuttle
>
> Execution of Deletes constructed with thousands of calls to Delete.deleteColumn(family, qualifier) is very expensive and slow.
> On our (quiet) cluster, a Delete w/ 20k qualifiers took about 13s to complete (as measured by client).
> When 10 such Deletes were sent to the cluster via HTable.delete(List<Delete>), one of the RegionServers ended up w/ 5 of the requests and sat at 100% CPU utilization for about 1 hour.
> This led to the client timing out after 20min (2min x 10 retries).  In one case, the client was able to fill the RPC callqueue and received the following error:
> {code}
>   Failed all from region=<region>,hostname=<host>, port=<port> java.util.concurrent.ExecutionException: java.io.IOException: Call queue is full, is ipc.server.max.callqueue.size too small?
> {code}
> Based on feedback (http://search-hadoop.com/m/yITsc1WcDWP), I switched to Delete.deleteColumn(family, qual, timestamp), where the timestamp came from the KeyValue retrieved by a scan driven by our domain objects.  This version of the delete ran in about 500ms.
> User group thread titled "RS unresponsive after series of deletes" has related logs and stacktraces.  
> Link to thread: http://search-hadoop.com/m/RmIyr1WcDWP
> Here is the stack dump of region server: http://pastebin.com/8y5x4xU7


        

[jira] [Commented] (HBASE-6254) deletes w/ many column qualifiers overwhelm Region Server

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398735#comment-13398735 ] 

Zhihong Ted Yu commented on HBASE-6254:
---------------------------------------

Since KeyValue implements HeapSize, we can keep adding column qualifiers until we reach a configurable threshold.
After get(get, false) returns, we can parse the column qualifiers out of the result.
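
A small illustrative sketch of the HeapSize-based grouping (the helper name and threshold are hypothetical):
{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;

public class HeapSizeChunker {
  // Split KeyValues into batches whose cumulative heapSize() stays under
  // a configurable threshold; each batch would then become one get.
  static List<List<KeyValue>> chunkByHeapSize(List<KeyValue> kvs, long maxHeapBytes) {
    List<List<KeyValue>> batches = new ArrayList<List<KeyValue>>();
    List<KeyValue> current = new ArrayList<KeyValue>();
    long currentBytes = 0;
    for (KeyValue kv : kvs) {
      if (!current.isEmpty() && currentBytes + kv.heapSize() > maxHeapBytes) {
        batches.add(current);
        current = new ArrayList<KeyValue>();
        currentBytes = 0;
      }
      current.add(kv);
      currentBytes += kv.heapSize();
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
{code}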
                
> deletes w/ many column qualifiers overwhelm Region Server
> ---------------------------------------------------------
>
>                 Key: HBASE-6254
>                 URL: https://issues.apache.org/jira/browse/HBASE-6254
>             Project: HBase
>          Issue Type: Bug
>          Components: performance, regionserver
>    Affects Versions: 0.94.0
>         Environment: 5 node CentOS + 1 master, v0.94 on cdh3u3
>            Reporter: Ted Tuttle
>
> Execution of Deletes constructed with thousands of calls to Delete.deleteColumn(family, qualifier) is very expensive and slow.
> On our (quiet) cluster, a Delete w/ 20k qualifiers took about 13s to complete (as measured by client).
> When 10 such Deletes were sent to the cluster via HTable.delete(List<Delete>), one of the RegionServers ended up w/ 5 of the requests and sat at 100% CPU utilization for about 1 hour.
> This led to the client timing out after 20min (2min x 10 retries).  In one case, the client was able to fill the RPC callqueue and received the following error:
> {code}
>   Failed all from region=<region>,hostname=<host>, port=<port> java.util.concurrent.ExecutionException: java.io.IOException: Call queue is full, is ipc.server.max.callqueue.size too small?
> {code}
> Based on feedback (http://search-hadoop.com/m/yITsc1WcDWP), I switched to Delete.deleteColumn(family, qual, timestamp), where the timestamp came from the KeyValue retrieved by a scan driven by our domain objects.  This version of the delete ran in about 500ms.
> User group thread titled "RS unresponsive after series of deletes" has related logs and stacktraces.  
> Link to thread: http://search-hadoop.com/m/RmIyr1WcDWP
> Here is the stack dump of region server: http://pastebin.com/8y5x4xU7
