Posted to user@geode.apache.org by Rob Shepherd <rg...@gmail.com> on 2018/11/02 15:37:52 UTC
High (Max) CPU
Hi,
I’m using Geode (1.7.0) locally (OSX) and on a server (Linux Arm64).
On both I’m seeing maxed out CPUs.
I’ve profiled it locally on a dormant server instance (no application activity) and the Async Queue routines are the highest contributor to CPU activity by a long stretch.
Back Traces - Method Total Time [%] Total Time [µs] Total Time (CPU) Samples
org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getHeadKey()
100.00% 2829377292 2829377292 8639
.org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getCurrentKey()
0.00% 0 0 8639
..org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peekAhead()
0.00% 0 0 8263
...org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peek()
1.21% 9906300 9906300 8180
....org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue()
100.00% 1144915695 1144915695 5
.....org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.run()
0.00% 0 0 5
How can I determine if this is a problem with my setup or if it is a bug?
A supposition: I notice that there are multiple instances of a thread named after my Async Event queue ID
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.1
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.2
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.3
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.4
Are there supposed to be 4? Are they interfering with each other (wait/notify) on an empty queue?
Thanks for any insight
Rob
Re: High (Max) CPU
Posted by Rob Shepherd <rg...@gmail.com>.
Thank you for the detailed information on these values and for the use case
analysis.
As a newcomer to Geode, I’d suggest copying the information in these
responses into the docs and changing the example timeout to something other
than 0.
Thanks again
Rob
Rob Shepherd BEng PhD
Re: High (Max) CPU
Posted by Udo Kohlmeyer <ud...@apache.org>.
Hi there Rob,
Great to see that you found one of the problems. So essentially you told
the async-queue to try and send whatever it had in its queue every
0ms... which means it will just spin, wanting to send.
`batch-time-interval` and `batch-size` are there to determine WHEN
message batches get sent. So a `batch-size=1` would also cause
this queue to always fire when there is 1 message in the queue. I
usually treat the settings like, "send a batch of xxx messages when the
limit is reached, otherwise wait yy-millis to send what is available".
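To make that rule of thumb concrete, here is a minimal Java sketch of a batch loop. It is a toy model written for this discussion, not Geode's actual dispatcher code: the class `BatchLoopSketch` and the methods `nextBatch` and `idleWakeUps` are hypothetical names, and only the two parameters mirror `batch-size` and `batch-time-interval`. It suggests why an interval of 0 on an empty queue may turn into a busy loop, while a non-zero interval lets the thread sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model only: hypothetical names, not Geode's implementation.
class BatchLoopSketch {

    /** Collects up to batchSize events, waiting at most batchTimeIntervalMs for the batch to fill. */
    static List<String> nextBatch(BlockingQueue<String> queue, int batchSize, long batchTimeIntervalMs) {
        List<String> batch = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(batchTimeIntervalMs);
        try {
            while (batch.size() < batchSize) {
                long remaining = deadline - System.nanoTime();
                // Before the deadline: block for the remaining time. After it: take only what is already there.
                String event = remaining > 0
                        ? queue.poll(remaining, TimeUnit.NANOSECONDS)
                        : queue.poll();
                if (event == null) {
                    break; // nothing arrived in time, send what we have (possibly nothing)
                }
                batch.add(event);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return batch;
    }

    /** Counts how often a dispatcher on an empty queue wakes up within windowMs. */
    static long idleWakeUps(long batchTimeIntervalMs, long windowMs) {
        BlockingQueue<String> empty = new LinkedBlockingQueue<>();
        long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(windowMs);
        long wakeUps = 0;
        while (System.nanoTime() < end) {
            nextBatch(empty, 1, batchTimeIntervalMs);
            wakeUps++;
        }
        return wakeUps;
    }

    public static void main(String[] args) {
        System.out.println("interval=0ms   idle wake-ups: " + idleWakeUps(0, 200));
        System.out.println("interval=100ms idle wake-ups: " + idleWakeUps(100, 200));
    }
}
```

In this model the first count is typically several orders of magnitude larger than the second, which matches the symptom of a maxed-out CPU on an otherwise dormant server.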
Usually WAN Replication and Async-Queue (which share a common hierarchy)
are async, fault-tolerant, batch-oriented mechanisms. To use them at the
granular level of every 0ms or 1 entry in the batch is a lot of
overhead, in terms of effort to keep the primary and backup queues in sync.
Also, keep in mind, sending every message you receive might not be
beneficial either (think financial rates). If one receives the data at
a high frequency, it is always best to ask oneself: can the
downstream process/system process and potentially respond to a change
BEFORE the next rate is delivered? So in some cases it makes sense to
even have batch-conflation turned on, to avoid sending messages that
could essentially be ignored.
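The idea behind conflation can be sketched in a few lines of Java. This is only an illustration of the concept (later updates to a key supersede earlier ones that are still waiting), under the assumption that events are simple key/value pairs; `ConflationSketch` and `conflate` are hypothetical names, and Geode's own conflation logic may differ in its details.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Concept illustration only: hypothetical names, not Geode's implementation.
class ConflationSketch {

    /** Keeps only the most recent pending value per key, ordered by when that value arrived. */
    static List<Map.Entry<String, Double>> conflate(List<Map.Entry<String, Double>> pending) {
        Map<String, Double> latest = new LinkedHashMap<>();
        for (Map.Entry<String, Double> update : pending) {
            // Remove the superseded update first so the newest one takes its own arrival position.
            latest.remove(update.getKey());
            latest.put(update.getKey(), update.getValue());
        }
        List<Map.Entry<String, Double>> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : latest.entrySet()) {
            result.add(Map.entry(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three rate updates for one currency pair arrive before the batch is sent.
        List<Map.Entry<String, Double>> pending = List.of(
                Map.entry("EUR/USD", 1.10),
                Map.entry("GBP/USD", 1.30),
                Map.entry("EUR/USD", 1.11),
                Map.entry("EUR/USD", 1.12));
        // Only one EUR/USD entry (the newest) remains in the conflated batch.
        System.out.println(conflate(pending));
    }
}
```

With the sample input, four pending updates shrink to two, so the downstream system never sees the two rates that were already stale.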
But if the requirement is to send (and react to) every message, then
these two parameters are something I would test with to find optimal
send sizes and timeout. Also note that network send buffer sizes can
play a role in performance, so take them into account when
sizing batches and configuring buffer sizes.
--Udo
Re: High (Max) CPU
Posted by Rob Shepherd <rg...@gmail.com>.
> Only partitioned regions support parallel queues.
Thanks, yes my trivial example is not partitioned.
I might suggest some sort of option validation to detect this and prevent
the hang at startup, which I guess is a side effect of the
misconfiguration.
Thanks again
Rob
Rob Shepherd BEng PhD
Re: High (Max) CPU
Posted by Anthony Baker <ab...@pivotal.io>.
Also, it matters if your region is partitioned or not. Only partitioned regions support parallel queues.
I’m not sure about the gfsh behavior you’re seeing when you set parallel=true.
Anthony
Re: High (Max) CPU
Posted by John Blum <jb...@pivotal.io>.
Hi Rob-
Sorry, just noticed your follow up where you figured out the hot CPU issue
caused by your batch-time-interval of 0.
As for setting parallel to *true*, it may be you have other servers
currently running that do not have parallel currently configured to *true*,
which I think (don't remember for sure) might be considered incompatible
Region configuration by the cluster. All Regions with the same name hosted
by members in the cluster must have compatible configuration.
Check whether you may already have existing members with the same Region
(by name) that may not currently have the associated AEQ set with
parallel=true.
Cheers,
-j
--
-John
john.blum10101 (skype)
Re: High (Max) CPU
Posted by John Blum <jb...@pivotal.io>.
Hi Rob-
Please see here [1]. Default is 5 Threads. Also see [2].
Alternatively, look at [3] (<async-event-queue dispatcher-threads=".." ..>)
or [4] (create async-event-queue --dispatcher-threads).
The threads should not be having an impact, but you can experiment with the
number to find out.
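When experimenting with the number, one way to check how many dispatcher threads actually exist is to list the live threads by name. The Java sketch below is an assumption-laden illustration: it relies on the thread name prefix shown in the original post, it only sees threads of the JVM it runs in (so it would have to run inside the server process, for example from a deployed function), and the class `DispatcherThreadCount` with its helpers is hypothetical. Stand-in threads are started so that it runs without a Geode server.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Geode.
class DispatcherThreadCount {
    // Name prefix taken from the thread listing in the original post.
    static final String PREFIX = "Event Processor for GatewaySender_AsyncEventQueue_";

    /** Returns the sorted names of live threads in this JVM that belong to the given queue id. */
    static List<String> threadsFor(String queueId) {
        String wanted = PREFIX + queueId + ".";
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .filter(name -> name.startsWith(wanted))
                .sorted()
                .collect(Collectors.toList());
    }

    /** Starts stand-in daemon threads so the sketch can run without a Geode server. */
    static void startStandIns(String queueId, int count, CountDownLatch stop) {
        for (int i = 1; i <= count; i++) {
            Thread t = new Thread(() -> {
                try {
                    stop.await(); // park until the demo is finished
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, PREFIX + queueId + "." + i);
            t.setDaemon(true);
            t.start();
        }
    }

    public static void main(String[] args) {
        CountDownLatch stop = new CountDownLatch(1);
        startStandIns("expiry-event-queue", 4, stop);
        threadsFor("expiry-event-queue").forEach(System.out::println);
        stop.countDown();
    }
}
```

Comparing the count before and after changing `dispatcher-threads` is a quick sanity check that the setting took effect.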
-j
[1]
http://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/asyncqueue/AsyncEventQueue.html#getDispatcherThreads--
[2]
http://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/asyncqueue/AsyncEventQueueFactory.html#setDispatcherThreads-int-
[3]
http://geode.apache.org/docs/guide/17/reference/topics/cache_xml.html#async-event-queue
[4]
http://geode.apache.org/docs/guide/17/tools_modules/gfsh/command-pages/create.html#topic_ryz_pb1_dk
On Fri, Nov 2, 2018 at 9:56 AM, Rob Shepherd <rg...@gmail.com> wrote:
> Thank you Nabarun,
>
> Having started to pull out my config to send to you, I noticed the
> following in my cache.xml:
>
> <async-event-queue
> id="expiry-event-queue"
> parallel="false"
> enable-batch-conflation="false"
> disk-synchronous="true"
> forward-expiration-destroy="true"
> —-> batch-time-interval=“0"
> batch-size="1"
> >...
>
> …Which is a copy from the example here:
> https://github.com/apache/geode-examples/blob/master/async/scripts/start.gfsh
>
> create async-event-queue --id=example-async-event-queue \
> --parallel=false \
> --enable-batch-conflation=false \
> --batch-size=1 \
> --batch-time-interval=0 \
> --listener=org.apache.geode_examples.async.ExampleAsyncEventListener
>
>
>
> I pondered on “batch-time-interval” and set it to 1000 and it has fixed
> the issue.
>
> I think I understand what this parameter is for and so a delay here would
> be tolerable.
>
>
>
> I have another question, if I set parallel=“true” the gfsh start server
> command hangs and I have to kill the new server process and the gfsh
> launcher.
>
> it is not important to me now, but i would like to evaluate this at some
> point and so i’ll happily try and debug the cause.
>
> Thanks
>
> Rob
--
-John
john.blum10101 (skype)
Re: High (Max) CPU
Posted by Rob Shepherd <rg...@gmail.com>.
Thank you Nabarun,
Having started to pull out my config to send to you, I noticed the following in my cache.xml:
<async-event-queue
id="expiry-event-queue"
parallel="false"
enable-batch-conflation="false"
disk-synchronous="true"
forward-expiration-destroy="true"
--> batch-time-interval="0"
batch-size="1"
>...
…Which is a copy from the example here: https://github.com/apache/geode-examples/blob/master/async/scripts/start.gfsh
create async-event-queue --id=example-async-event-queue \
--parallel=false \
--enable-batch-conflation=false \
--batch-size=1 \
--batch-time-interval=0 \
--listener=org.apache.geode_examples.async.ExampleAsyncEventListener
I pondered on “batch-time-interval” and set it to 1000 and it has fixed the issue.
I think I understand what this parameter is for and so a delay here would be tolerable.
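For reference, a sketch of the example's gfsh command with only the interval changed. It assumes the queue id and listener class from the geode-examples script shown above. The value 1000 (milliseconds) is simply the one that stopped the CPU spinning here, not a tuned recommendation:

```gfsh
create async-event-queue --id=example-async-event-queue \
  --parallel=false \
  --enable-batch-conflation=false \
  --batch-size=1 \
  --batch-time-interval=1000 \
  --listener=org.apache.geode_examples.async.ExampleAsyncEventListener
```

As far as I understand the batching settings, with batch-size=1 a single event already fills a batch, so the interval should mainly bound how long the dispatcher waits while the queue is empty.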
I have another question: if I set parallel="true", the gfsh start server command hangs, and I have to kill both the new server process and the gfsh launcher.
It is not important to me now, but I would like to evaluate this at some point, so I'll happily try to debug the cause.
Thanks
Rob
> On 2 Nov 2018, at 16:22, Nabarun Nag <nn...@pivotal.io> wrote:
>
> Hi Rob,
>
> We will look into this. Meanwhile, could you please elaborate on the configuration Apache Geode is running: how many servers, how many AEQs, regions, etc., and what workload it is running?
>
> Thank you
> Nabarun Nag
Re: High (Max) CPU
Posted by Nabarun Nag <nn...@pivotal.io>.
Hi Rob,
We will look into this. Meanwhile, could you please elaborate on the configuration Apache Geode is running: how many servers, how many AEQs, regions, etc., and what workload it is running?
Thank you
Nabarun Nag