Posted to commits@cassandra.apache.org by "Yang Yang (JIRA)" <ji...@apache.org> on 2011/02/12 07:11:57 UTC

[jira] Created: (CASSANDRA-2157) Hector concurrentHClient pool gives out more connections than its quota

Hector concurrentHClient pool gives out more connections than its quota
-----------------------------------------------------------------------

                 Key: CASSANDRA-2157
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2157
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.7.0
            Reporter: Yang Yang


Hector's ConcurrentHClient.java can give up when grabbing a connection from the pool, at line 85 (everything below refers to the latest 0.7.0 head):


     } else {

        try {
          cassandraClient = availableClientQueue.poll(maxWaitTimeWhenExhausted, TimeUnit.MILLISECONDS);
          if ( cassandraClient == null ) {
            numBlocked.decrementAndGet();
            throw new PoolExhaustedException(String.format("maxWaitTimeWhenExhausted exceeded for thread %s on host %s",
                new Object[]{
                Thread.currentThread().getName(),
                cassandraHost.getName()}
            ));
          }
        } catch (InterruptedException ie) {
          //monitor.incCounter(Counter.POOL_EXHAUSTED);
          numActive.decrementAndGet();
        }

So if we specify a maxWaitTime, the pool grab can give up, and in doing so it does a numActive.decrementAndGet().


But in HConnectionManager.java, in the method

  public void operateWithFailover(Operation<?> op) throws HectorException {

the call in the main loop

        client =  getClientFromLBPolicy(excludeHosts);

can throw an exception. In the catch block there is a clause for:

        } else if ( he instanceof PoolExhaustedException ) {
          retryable = true;
          --retries;
          if ( hostPools.size() == 1 ) {
            throw he;
          }
          monitor.incCounter(Counter.POOL_EXHAUSTED);
          excludeHosts.add(client.cassandraHost);
        }

I guess this clause was written for the timeout scenario above, so it is supposed to catch that exception.
But getClientFromLBPolicy() constructs a general HectorException from the PoolExhaustedException thrown by borrowClient(), so the instanceof check never matches.
This makes every pool-grab timeout propagate immediately to the client, which I guess is not the original intention.

So I guess getClientFromLBPolicy() needs to rethrow the original exception directly, so as to trigger the logic in the catch block.
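
The effect of wrapping versus rethrowing can be illustrated with a minimal sketch. The class and method names here (BaseException, ExhaustedException, borrowWrapped, borrowRethrown, failoverBranchTaken) are hypothetical stand-ins, not Hector's real code; the point is only that an instanceof check on a subtype stops matching once the exception is rewrapped in the general type.

```java
public class RethrowSketch {
    // Hypothetical stand-ins for a general exception and its pool-exhausted subtype.
    static class BaseException extends RuntimeException {
        BaseException(String msg) { super(msg); }
        BaseException(Throwable cause) { super(cause); }
    }

    static class ExhaustedException extends BaseException {
        ExhaustedException(String msg) { super(msg); }
    }

    // Simulates a pool grab that times out.
    static void borrow() {
        throw new ExhaustedException("pool exhausted");
    }

    // Wraps whatever borrow() throws in the general type; the subtype is lost.
    static void borrowWrapped() {
        try {
            borrow();
        } catch (BaseException e) {
            throw new BaseException(e);
        }
    }

    // Lets the original exception propagate unchanged.
    static void borrowRethrown() {
        borrow();
    }

    // Returns true if a caller's "instanceof ExhaustedException" branch would run.
    static boolean failoverBranchTaken(Runnable grab) {
        try {
            grab.run();
            return false;
        } catch (BaseException he) {
            return he instanceof ExhaustedException;
        }
    }

    public static void main(String[] args) {
        System.out.println(failoverBranchTaken(RethrowSketch::borrowWrapped));  // prints false
        System.out.println(failoverBranchTaken(RethrowSketch::borrowRethrown)); // prints true
    }
}
```

An alternative to rethrowing would be to inspect getCause() in the catch block, but rethrowing keeps the existing instanceof clause usable as written.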

But after I made those changes, I found that ActiveNum() from the pool is often negative and TillExhausted is higher than the quota, which does not make sense.
This is because every code path goes through the releaseClient() call in the finally {} clause. On a failed pool grab, numActive.decrementAndGet() has already been executed, and it is then executed again in the finally clause.
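
The double decrement can be reproduced in isolation with a minimal sketch. Everything here is an assumption made for illustration: the borrow() below always fails and rolls back its own reservation, and the names (operateUnguarded, operateGuarded, activeAfter) are hypothetical rather than taken from Hector. The guarded variant shows one possible fix, releasing only when a client was actually obtained.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DoubleDecrementSketch {
    static final AtomicInteger numActive = new AtomicInteger();

    // Hypothetical pool grab that always times out.
    // Assumption: it reserves a slot, then rolls the reservation back before failing.
    static Object borrow() {
        numActive.incrementAndGet();
        numActive.decrementAndGet();
        throw new IllegalStateException("pool exhausted");
    }

    static void release() {
        numActive.decrementAndGet();
    }

    // Releases in finally whether or not a client was obtained: decrements twice on failure.
    static void operateUnguarded() {
        try {
            borrow();
        } catch (IllegalStateException e) {
            // a real caller would retry on another host here
        } finally {
            release();
        }
    }

    // Releases only if the grab succeeded, so a failed grab is counted once.
    static void operateGuarded() {
        Object client = null;
        try {
            client = borrow();
        } catch (IllegalStateException e) {
            // a real caller would retry on another host here
        } finally {
            if (client != null) {
                release();
            }
        }
    }

    // Resets the counter, runs the operation the given number of times, returns the counter.
    static int activeAfter(Runnable op, int attempts) {
        numActive.set(0);
        for (int i = 0; i < attempts; i++) {
            op.run();
        }
        return numActive.get();
    }

    public static void main(String[] args) {
        System.out.println(activeAfter(DoubleDecrementSketch::operateUnguarded, 3)); // prints -3
        System.out.println(activeAfter(DoubleDecrementSketch::operateGuarded, 3));   // prints 0
    }
}
```

With a negative active count, a pool that compares the count against its quota would consider itself under quota and keep handing out connections, which matches the symptom described here.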



This ends up creating many connections to the server, which bogs the server down; we have seen it create a huge CPU load.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Resolved: (CASSANDRA-2157) Hector concurrentHClient pool gives out more connections than its quota

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood resolved CASSANDRA-2157.
---------------------------------

    Resolution: Invalid

This is an awesome bug report, but the Cassandra project itself does not maintain Hector: you should probably re-file this bug with the developers on GitHub: https://github.com/rantav/hector
