Posted to issues@geode.apache.org by "Gregory Chase (JIRA)" <ji...@apache.org> on 2017/09/27 18:47:00 UTC
[jira] [Created] (GEODE-3709) Geode Version: 1.1.1 In one of the project we a...

Gregory Chase created GEODE-3709:
------------------------------------

             Summary: Geode Version: 1.1.1    In one of the project we a...
                 Key: GEODE-3709
                 URL: https://issues.apache.org/jira/browse/GEODE-3709
             Project: Geode
          Issue Type: Improvement
            Reporter: Gregory Chase


Geode Version: 1.1.1

In one of our projects we are using Geode. Here is a summary of how we use it:
- Geode servers have multiple regions. 
- Clients subscribe to the data from these regions.
- Clients register interest in all entries, so they receive every update for each entry: creation, modification, and deletion.
- One of the regions usually holds 5-10 million entries with a TTL of 24 hours. Most entries are added one after another within a span of about an hour, so when the TTL kicks in, they are typically all destroyed within an hour as well.
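As a rough, back-of-the-envelope illustration (our arithmetic, not part of the original report): if 5-10 million entries expire evenly over about an hour, the servers generate roughly 1,400 to 2,800 destroy events per second, and each of those events is queued for every client that has registered interest. The sketch below turns that into a time-to-fill estimate. The queue capacity of 230,000 events and the client drain rate of 1,000 events per second are hypothetical placeholders; 230,000 is, as far as we know, Geode's default maximum-message-count for client subscription queues, but check your own cache server configuration.

```java
// Back-of-the-envelope estimate of the destroy-event burst when entries expire.
// All rates and capacities below are illustrative assumptions.
final class ExpiryBurstEstimate {

    // Average destroy events per second if `entries` expire evenly over `windowSeconds`.
    static double destroysPerSecond(long entries, long windowSeconds) {
        return (double) entries / windowSeconds;
    }

    // Seconds until an initially empty queue of `capacity` events fills, given a
    // constant produce rate and drain rate. Returns +Infinity if the client keeps up.
    static double secondsToFill(long capacity, double produceRate, double drainRate) {
        double backlogGrowth = produceRate - drainRate;
        return backlogGrowth <= 0 ? Double.POSITIVE_INFINITY : capacity / backlogGrowth;
    }

    public static void main(String[] args) {
        long window = 3600; // entries were created within ~1 hour, so they expire within ~1 hour
        long capacity = 230_000; // hypothetical queue capacity (see lead-in)
        double drainRate = 1_000; // hypothetical client drain rate, events per second
        for (long entries : new long[] {5_000_000L, 10_000_000L}) {
            double rate = destroysPerSecond(entries, window);
            double fill = secondsToFill(capacity, rate, drainRate);
            System.out.printf("%,d entries: ~%.0f destroys/s, queue full after ~%.0f s%n",
                    entries, rate, fill);
        }
    }
}
```

Under these assumptions, any client that drains more slowly than the expiry rate builds a backlog that grows linearly, so a bounded queue fills within minutes rather than hours.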

Problem:
Every now and then we observe the following message:
	Client queue for _gfe_non_durable_client_with_id_x.x.x.x(14229:loner):42754:e4266fc4_2_queue client is full.
This seems to happen when the TTL kicks in on the region with 5-10 million entries. Entries start expiring (being destroyed), and the resulting destroy events must be sent to the clients. The updates flow for a while, but then they suddenly stop and the queue size starts growing. This is becoming a major issue for the smooth functioning of our production setup. Any help will be much appreciated.

I did some groundwork by downloading the source and looking at the code. It references two issues, #37581 and #51400, but I am unable to view the actual tickets (they require login credentials). Hopefully this helps whoever looks into the issue.
Here is the pertinent code:

    @Override
    @edu.umd.cs.findbugs.annotations.SuppressWarnings("TLW_TWO_LOCK_WAIT")
    void checkQueueSizeConstraint() throws InterruptedException {
      if (this.haContainer instanceof HAContainerMap && isPrimary()) { // Fix for bug 39413
        if (Thread.interrupted())
          throw new InterruptedException();
        synchronized (this.putGuard) {
          if (putPermits <= 0) {
            synchronized (this.permitMon) {
              if (reconcilePutPermits() <= 0) {
                if (region.getSystem().getConfig().getRemoveUnresponsiveClient()) {
                  isClientSlowReciever = true;
                } else {
                  try {
                    long logFrequency = CacheClientNotifier.DEFAULT_LOG_FREQUENCY;
                    CacheClientNotifier ccn = CacheClientNotifier.getInstance();
                    if (ccn != null) { // check needed for junit tests
                      logFrequency = ccn.getLogFrequency();
                    }
                    if ((this.maxQueueSizeHitCount % logFrequency) == 0) {
                      logger.warn(LocalizedMessage.create(
                          LocalizedStrings.HARegionQueue_CLIENT_QUEUE_FOR_0_IS_FULL,
                          new Object[] {region.getName()}));
                      this.maxQueueSizeHitCount = 0;
                    }
                    ++this.maxQueueSizeHitCount;
                    this.region.checkReadiness(); // fix for bug 37581
                    // TODO: wait called while holding two locks
                    this.permitMon.wait(CacheClientNotifier.eventEnqueueWaitTime);
                    this.region.checkReadiness(); // fix for bug 37581
                    // Fix for #51400. Allow the queue to grow beyond its
                    // capacity/maxQueueSize, if it is taking a long time to
                    // drain the queue, either due to a slower client or the
                    // deadlock scenario mentioned in the ticket.
                    reconcilePutPermits();
                    if ((this.maxQueueSizeHitCount % logFrequency) == 1) {
                      logger.info(LocalizedMessage
                          .create(LocalizedStrings.HARegionQueue_RESUMING_WITH_PROCESSING_PUTS));
                    }
                  } catch (InterruptedException ex) {
                    // TODO: The line below is meaningless. Comment it out later
                    this.permitMon.notifyAll();
                    throw ex;
                  }
                }
              }
            } // synchronized (this.permitMon)
          } // if (putPermits <= 0)
          --putPermits;
        } // synchronized (this.putGuard)
      }
    }
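To make the locking in the excerpt easier to follow, here is a simplified, single-file model of the same pattern. It is a sketch, not Geode code: only putGuard, permitMon, putPermits and reconcilePutPermits() mirror names from the excerpt, while PermitThrottledQueue, takePermits, take(), fullHits and the event deque are hypothetical. It shows two properties visible in the excerpt: the timed wait releases only permitMon, so putGuard stays held and every other producer blocks for the whole wait; and after the wait the put proceeds whether or not a permit came back (the behaviour the #51400 comment describes), so the queue can grow past its nominal capacity.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of the put-permit throttling in
// checkQueueSizeConstraint(). Not Geode code.
final class PermitThrottledQueue {
    private final Object putGuard = new Object();   // serializes producers
    private final Object permitMon = new Object();  // guards events and takePermits
    private final Deque<String> events = new ArrayDeque<>();
    private final long waitMillis;                  // stands in for eventEnqueueWaitTime
    private int putPermits;   // permits left for producers (guarded by putGuard)
    private int takePermits;  // permits freed by take(), not yet reclaimed (guarded by permitMon)
    private int fullHits;     // times a producer found the queue full (guarded by putGuard)

    PermitThrottledQueue(int capacity, long waitMillis) {
        this.putPermits = capacity;
        this.waitMillis = waitMillis;
    }

    // Moves permits freed by the consumer back to producers.
    // Caller must hold both putGuard and permitMon.
    private int reconcilePutPermits() {
        putPermits += takePermits;
        takePermits = 0;
        return putPermits;
    }

    void put(String event) throws InterruptedException {
        synchronized (putGuard) {
            if (putPermits <= 0) {
                synchronized (permitMon) {
                    if (reconcilePutPermits() <= 0) {
                        fullHits++;
                        // wait() releases permitMon only; putGuard stays held,
                        // so other producers block for the whole wait.
                        permitMon.wait(waitMillis);
                        // Proceed even if no permit was freed: the queue may
                        // now grow past its nominal capacity.
                        reconcilePutPermits();
                    }
                }
            }
            --putPermits;
            synchronized (permitMon) {
                events.addLast(event);
            }
        }
    }

    String take() {
        synchronized (permitMon) {
            String event = events.pollFirst();
            if (event != null) {
                takePermits++;
                permitMon.notifyAll(); // wake a producer waiting for a permit
            }
            return event;
        }
    }

    int size() {
        synchronized (permitMon) {
            return events.size();
        }
    }

    int fullHitCount() {
        synchronized (putGuard) {
            return fullHits;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PermitThrottledQueue q = new PermitThrottledQueue(2, 50);
        q.put("destroy-1");
        q.put("destroy-2");
        q.put("destroy-3"); // no consumer: waits ~50 ms, then enqueues anyway
        System.out.println(q.size() + " events, full hits = " + q.fullHitCount());
    }
}
```

With a capacity of 2 and no consumer, the third put() waits about 50 ms and then enqueues anyway, so main prints "3 events, full hits = 1". Because putPermits is now negative, later puts keep hitting the timed wait until the consumer has drained the backlog below capacity, which in this model throttles every producer to roughly one event per wait interval.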


*Reporter*: Mangesh Deshmukh
*E-mail*: [mailto:mdeshmukh@quotient.com]


