Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2011/02/12 08:51:57 UTC

[jira] Resolved: (CASSANDRA-2157) Hector concurrentHClient pool gives out more connections than its quota

     [ https://issues.apache.org/jira/browse/CASSANDRA-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood resolved CASSANDRA-2157.
---------------------------------

    Resolution: Invalid

This is an awesome bug report, but the Cassandra project itself does not maintain Hector: you should probably re-file this bug with the developers on GitHub: https://github.com/rantav/hector

> Hector concurrentHClient pool gives out more connections than its quota
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-2157
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2157
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Yang Yang
>
> Hector's ConcurrentHClient.java can give up while waiting for a pooled connection, at line 85 (all references below are to the latest 0.7.0 head):
>      } else {
>         try {
>           cassandraClient = availableClientQueue.poll(maxWaitTimeWhenExhausted, TimeUnit.MILLISECONDS);
>           if ( cassandraClient == null ) {
>             numBlocked.decrementAndGet();
>             throw new PoolExhaustedException(String.format("maxWaitTimeWhenExhausted exceeded for thread %s on host %s",
>                 new Object[]{
>                 Thread.currentThread().getName(),
>                 cassandraHost.getName()}
>             ));
>           }
>         } catch (InterruptedException ie) {
>           //monitor.incCounter(Counter.POOL_EXHAUSTED);
>           numActive.decrementAndGet();
>         }
> So if we specify a max wait time, it could give up and do a numActive.decrementAndGet().
> But in HConnectionManager.java:
>   public void operateWithFailover(Operation<?> op) throws HectorException {
> in the main loop of this method, the call
>         client =  getClientFromLBPolicy(excludeHosts);
> could throw an exception.
> In the catch part, there is a clause for it:
>         } else if ( he instanceof PoolExhaustedException ) {
>           retryable = true;
>           --retries;
>           if ( hostPools.size() == 1 ) {
>             throw he;
>           }
>           monitor.incCounter(Counter.POOL_EXHAUSTED);
>           excludeHosts.add(client.cassandraHost);
>         }
> I guess this was written for the timeout scenario above, so it is supposed to catch that.
> But getClientFromLBPolicy() wraps the PoolExhaustedException thrown by borrowClient() in a general HectorException.
> As a result, every pool-grab timeout propagates immediately to the caller, which I guess is not the original intention.
> So I guess getClientFromLBPolicy() needs to rethrow the original exception directly, so as to trigger the logic in the catch part.
> But after I made that change, I often found ActiveNum() from the pool to be negative and TillExhausted to be higher than the quota, which does not make sense.
> This was because every code path goes through the "releaseClient()" line in the finally {} clause: when the pool grab fails, numActive.decrementAndGet() has already been executed, and it is executed again in the finally clause.
> This ends up creating many connections to the server, which bogs the server down; we have seen it cause huge CPU load.
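
To illustrate the double-decrement described in the report, here is a minimal sketch of one way a pool could keep its active counter consistent when a borrow times out. It is not Hector's code: PoolCounterSketch, PoolExhausted, borrow(), release(), active() and operate() are hypothetical names, and the sketch assumes the counter is incremented only after a client has actually been obtained, so the failure path has nothing to undo and a null-safe release in the finally clause cannot push the count below zero.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a bounded client pool; not Hector's actual class. */
class PoolCounterSketch {

    /** Hypothetical exception signalling that no client became available in time. */
    static class PoolExhausted extends RuntimeException {
        PoolExhausted(String msg) { super(msg); }
    }

    private final BlockingQueue<Object> available;
    private final AtomicInteger numActive = new AtomicInteger();
    private final long maxWaitMillis;

    PoolCounterSketch(int size, long maxWaitMillis) {
        this.available = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            available.add(new Object()); // plain Objects stand in for clients
        }
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Borrows a client; the counter changes only once a client is in hand. */
    Object borrow() {
        try {
            Object client = available.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
            if (client == null) {
                // Timed out: numActive was never incremented, so nothing to undo.
                throw new PoolExhausted("max wait time exceeded");
            }
            numActive.incrementAndGet();
            return client;
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new PoolExhausted("interrupted while waiting for a client");
        }
    }

    /** Returns a client to the pool; a null client means nothing was borrowed. */
    void release(Object client) {
        if (client == null) {
            return; // guard against decrementing for a borrow that failed
        }
        numActive.decrementAndGet();
        available.offer(client);
    }

    int active() {
        return numActive.get();
    }

    /** Caller pattern from the report: release always runs in finally. */
    static boolean operate(PoolCounterSketch pool) {
        Object client = null;
        try {
            client = pool.borrow();
            return true;
        } catch (PoolExhausted e) {
            return false; // a real caller might retry on another host here
        } finally {
            pool.release(client); // safe even when borrow() failed
        }
    }
}
```

With a pool of size 1 whose only client is held elsewhere, operate() fails after the wait time and active() stays at 1 rather than dropping to 0 or below. The same idea applies to exception handling: the catch clause quoted in the report can only see a PoolExhaustedException if the original exception type is rethrown rather than wrapped.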
