
Posted to user@geode.apache.org by Rob Shepherd <rg...@gmail.com> on 2018/11/02 15:37:52 UTC

High (Max) CPU

Hi,

I’m using Geode 1.7.0 both locally (OS X) and on a server (Linux ARM64).

On both, I’m seeing maxed-out CPUs.

I’ve profiled it locally on an idle server instance (no application activity), and the Async Queue routines are by far the largest contributor to CPU activity.



Back Traces - Method
    Total Time [%]  Total Time [µs]  Total Time (CPU)  Samples

org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getHeadKey()
    100.00%         2829377292       2829377292        8639
.org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getCurrentKey()
    0.00%           0                0                 8639
..org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peekAhead()
    0.00%           0                0                 8263
...org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peek()
    1.21%           9906300          9906300           8180
....org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue()
    100.00%         1144915695       1144915695        5
.....org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.run()
    0.00%           0                0                 5


How can I determine if this is a problem with my setup or if it is a bug?

A supposition: I notice that there are multiple threads named after my async event queue ID:

Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.1
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.2
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.3
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.4

Are there supposed to be 4? Are they interfering with each other (wait/notify) on an empty queue?

Thanks for any insight

Rob

Re: High (Max) CPU

Posted by Rob Shepherd <rg...@gmail.com>.

Thank you for the detailed information on these values and for the use-case
analysis.

As a newcomer to Geode, I'd suggest copying the information in these
responses into the docs and changing the example's batch-time-interval to
something other than 0.

Thanks again

Rob



On Fri, 2 Nov 2018 at 18:32, Udo Kohlmeyer <ud...@apache.org> wrote:

> Hi there Rob,
>
> Great to see that you found one of the problems. Essentially, you told
> the async queue to try to send whatever was in its queue every 0 ms,
> which means it will just spin, wanting to send.
>
> `batch-time-interval` and `batch-size` determine WHEN message
> batches get sent. Likewise, `batch-size=1` would also cause this queue
> to fire as soon as there is 1 message in it. I usually read the
> settings as: "send a batch of xxx messages when the limit is reached,
> otherwise wait yy millis and send what is available".
>
> Usually WAN Replication and Async-Queue (which share a common hierarchy)
> are async, fault-tolerant, batch-oriented mechanisms. Using them at the
> granularity of every 0 ms or 1 entry per batch adds a lot of overhead
> in terms of the effort to keep the primary and backup queues in sync.
>
> Also, keep in mind that sending every message you receive might not be
> beneficial either (think financial rates). If you receive data at a
> high frequency, it is always best to ask whether the downstream
> process or system can process, and potentially respond to, a change
> BEFORE the next rate is delivered. So in some cases it even makes sense
> to turn batch conflation on, to avoid sending messages that would
> essentially be ignored.
>
> But if the requirement is to send (and react to) every message, then
> these two parameters are what I would test with to find the optimal
> batch sizes and timeouts. ALSO, network send buffer sizes can affect
> performance, so take them into account when sizing batches and
> configuring buffers.
>
> --Udo
>
> On 11/2/18 09:56, Rob Shepherd wrote:
>
> Thank you Nabarun,
>
> Having started to pull out my config to send to you, I noticed the
> following in my cache.xml:
>
> <async-event-queue
>     id="expiry-event-queue"
>     parallel="false"
>     enable-batch-conflation="false"
>     disk-synchronous="true"
>     forward-expiration-destroy="true"
> ----> batch-time-interval="0"
>     batch-size="1"
>     >...
>
> …Which is a copy from the example here:
> https://github.com/apache/geode-examples/blob/master/async/scripts/start.gfsh
>
> create async-event-queue --id=example-async-event-queue \
>   --parallel=false \
>   --enable-batch-conflation=false \
>   --batch-size=1 \
>   --batch-time-interval=0 \
>   --listener=org.apache.geode_examples.async.ExampleAsyncEventListener
>
>
>
> I pondered on “batch-time-interval” and set it to 1000 and it has fixed
> the issue.
>
> I think I understand what this parameter is for and so a delay here would
> be tolerable.
>
>
>
> I have another question: if I set parallel="true", the gfsh start server
> command hangs and I have to kill the new server process and the gfsh
> launcher.
>
> It is not important to me now, but I would like to evaluate this at some
> point, so I'll happily try to debug the cause.
>
> Thanks
>
> Rob
>
>
>
>
> On 2 Nov 2018, at 16:22, Nabarun Nag <nn...@pivotal.io> wrote:
>
> Hi Rob,
>
> We will look into this. Meanwhile, could you please elaborate on the
> configuration Apache Geode is running: how many servers, how many AEQs and
> regions, etc., and what workload is it running?
>
> Thank you
> Nabarun Nag
>
> --
Rob Shepherd BEng PhD

Re: High (Max) CPU

Posted by Udo Kohlmeyer <ud...@apache.org>.

Hi there Rob,

Great to see that you found one of the problems. Essentially, you told
the async queue to try to send whatever was in its queue every
0 ms, which means it will just spin, wanting to send.

`batch-time-interval` and `batch-size` determine WHEN message
batches get sent. Likewise, `batch-size=1` would also cause this
queue to fire as soon as there is 1 message in it. I usually read
the settings as: "send a batch of xxx messages when the limit is
reached, otherwise wait yy millis and send what is available".
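To make that rule of thumb concrete, here is a minimal Java sketch. It is a simplified model, not Geode's dispatcher code: the class and method names (`BatchPolicySketch`, `shouldDispatch`, `idleWakeupsPerSecond`, `batchesNeeded`) are hypothetical, and it assumes an idle dispatcher that blocks for at most `batch-time-interval` milliseconds per poll. Under that assumption an interval of 0 never blocks, which matches the spinning described here, while `batch-size=1` turns every event into its own batch.

```java
/**
 * Simplified, hypothetical model of the batching rule of thumb; not Geode's dispatcher code.
 */
public class BatchPolicySketch {

  /** Dispatch when the size limit is hit, or when something is queued and the interval has elapsed. */
  static boolean shouldDispatch(int queued, int batchSize,
                                long msSinceLastDispatch, long batchTimeIntervalMs) {
    if (queued >= batchSize) {
      return true; // size limit reached
    }
    // time limit reached with at least one event waiting
    return queued > 0 && msSinceLastDispatch >= batchTimeIntervalMs;
  }

  /**
   * Wake-ups per second of an idle dispatcher that blocks for at most
   * batchTimeIntervalMs per poll (an assumption of this model).
   * An interval of 0 never blocks, i.e. the loop busy-spins.
   */
  static double idleWakeupsPerSecond(long batchTimeIntervalMs) {
    if (batchTimeIntervalMs <= 0) {
      return Double.POSITIVE_INFINITY;
    }
    return 1000.0 / batchTimeIntervalMs;
  }

  /** Number of batches needed to ship totalEvents in batches of at most batchSize. */
  static long batchesNeeded(long totalEvents, int batchSize) {
    return (totalEvents + batchSize - 1) / batchSize; // ceiling division
  }

  public static void main(String[] args) {
    System.out.println("idle wake-ups/s at interval 0 ms:    " + idleWakeupsPerSecond(0));
    System.out.println("idle wake-ups/s at interval 1000 ms: " + idleWakeupsPerSecond(1000));
    System.out.println("batches for 10000 events, size 1:    " + batchesNeeded(10_000, 1));
    System.out.println("batches for 10000 events, size 100:  " + batchesNeeded(10_000, 100));
  }
}
```

In this model, 10,000 events at `batch-size=1` become 10,000 separate batches versus 100 at `batch-size=100`, and each batch carries the fixed cost of keeping the primary and backup queues in sync.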

Usually WAN Replication and Async-Queue (which share a common hierarchy)
are async, fault-tolerant, batch-oriented mechanisms. Using them at the
granularity of every 0 ms or 1 entry per batch adds a lot of overhead
in terms of the effort to keep the primary and backup queues in sync.

Also, keep in mind that sending every message you receive might not be
beneficial either (think financial rates). If you receive data at a
high frequency, it is always best to ask whether the downstream
process or system can process, and potentially respond to, a change
BEFORE the next rate is delivered. So in some cases it even makes sense
to turn batch conflation on, to avoid sending messages that would
essentially be ignored.
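A minimal illustration of what conflation means for a stream of rate updates, assuming the simplest semantics: within one batch, only the latest update per key is kept. The `Update` type, keys, and values are hypothetical, and Geode's own conflation (enabled on a queue with the `enable-batch-conflation` attribute) may differ in details such as ordering; this is a sketch of the idea, not its implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative batch conflation: keep only the latest update per key within one batch. */
public class ConflationSketch {

  /** Hypothetical update event, e.g. a currency pair and its latest rate. */
  static final class Update {
    final String key;
    final double value;

    Update(String key, double value) {
      this.key = key;
      this.value = value;
    }

    @Override
    public String toString() {
      return key + "=" + value;
    }
  }

  static List<Update> conflate(List<Update> batch) {
    // LinkedHashMap keeps keys in first-seen order; put() replaces an older update for the same key.
    Map<String, Update> latest = new LinkedHashMap<>();
    for (Update u : batch) {
      latest.put(u.key, u);
    }
    return new ArrayList<>(latest.values());
  }

  public static void main(String[] args) {
    List<Update> batch = List.of(
        new Update("EUR/USD", 1.1401),
        new Update("GBP/USD", 1.2990),
        new Update("EUR/USD", 1.1403),
        new Update("EUR/USD", 1.1398));
    // Four updates in, two out: the two stale EUR/USD rates are dropped.
    System.out.println(conflate(batch));
  }
}
```

The trade-off is the one described above: the downstream system only ever sees the latest EUR/USD rate in the batch, which is fine if it could not have acted on the intermediate ones anyway.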

But if the requirement is to send (and react to) every message, then
these two parameters are what I would test with to find the optimal
batch sizes and timeouts. ALSO, network send buffer sizes can affect
performance, so take them into account when sizing batches and
configuring buffers.

--Udo


Re: High (Max) CPU

Posted by Rob Shepherd <rg...@gmail.com>.

> Only partitioned regions support parallel queues.

Thanks. Yes, my trivial example is not partitioned.

I'd suggest some sort of option validation to detect this and prevent
the hang at startup, which I guess is a side effect of the
misconfiguration.
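A sketch of the kind of pre-flight validation suggested here. This is not a Geode or gfsh feature; `AeqConfigCheck`, `RegionKind`, and the messages are hypothetical. It encodes only the rules discussed in this thread: parallel queues require a partitioned region, and a batch-time-interval of 0 lets an idle dispatcher spin. `batch-size=1` is flagged only as a warning, since it is valid when every event really must be sent on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for async-event-queue settings; not part of Geode or gfsh. */
public class AeqConfigCheck {

  enum RegionKind { REPLICATE, PARTITION }

  static List<String> check(boolean parallel, RegionKind regionKind,
                            int batchSize, long batchTimeIntervalMs) {
    List<String> findings = new ArrayList<>();
    if (parallel && regionKind != RegionKind.PARTITION) {
      findings.add("error: parallel=true requires a partitioned region");
    }
    if (batchTimeIntervalMs <= 0) {
      findings.add("error: batch-time-interval=0 lets an idle dispatcher spin");
    }
    if (batchSize <= 1) {
      findings.add("warning: batch-size=1 sends every event as its own batch");
    }
    return findings;
  }

  public static void main(String[] args) {
    // Settings copied from the example (interval 0, size 1), plus parallel=true on a replicated region.
    check(true, RegionKind.REPLICATE, 1, 0).forEach(System.out::println);
    // After the change described in this thread (batch-time-interval=1000), only the warning remains.
    System.out.println(check(false, RegionKind.REPLICATE, 1, 1000));
  }
}
```

A real check would also need the region type, which the queue definition alone does not carry; here it is simply passed in.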

Thanks again

Rob

On Fri, 2 Nov 2018 at 18:23, Anthony Baker <ab...@pivotal.io> wrote:

> Also, it matters if your region is partitioned or not.  Only partitioned
> regions support parallel queues.
>
> I’m not sure about the gfsh behavior you’re seeing when you set
> parallel=true.
>
> Anthony
>
>
> On Nov 2, 2018, at 11:12 AM, John Blum <jb...@pivotal.io> wrote:
>
> Hi Rob-
>
> Sorry, just noticed your follow up where you figured out the hot CPU issue
> caused by your batch-time-interval of 0.
>
> As for setting parallel to *true*, it may be that you have other servers
> currently running that do not have parallel configured to *true*,
> which I think (don't remember for sure) might be considered an incompatible
> Region configuration by the cluster.  All Regions with the same name hosted
> by members in the cluster must have compatible configurations.
>
> Check whether you already have existing members hosting the same Region
> (by name) whose associated AEQ is not set with parallel=true.
>
> Cheers,
> -j
>
>
> On Fri, Nov 2, 2018 at 11:07 AM, John Blum <jb...@pivotal.io> wrote:
>
>> Hi Rob-
>>
>> Please see here [1].  Default is 5 Threads.  Also see [2].
>>
>> Alternatively, look at  [3] (<async-event-queue dispatcher-threads=".."
>> ..>) or [4] (create async-event-queue --dispatcher-threads).
>>
>> The threads should not be having an impact, but you can experiment with
>> the number to find out.
>>
>> -j
>>
>>
>> [1]
>> http://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/asyncqueue/AsyncEventQueue.html#getDispatcherThreads--
>> [2]
>> http://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/asyncqueue/AsyncEventQueueFactory.html#setDispatcherThreads-int-
>> [3]
>> http://geode.apache.org/docs/guide/17/reference/topics/cache_xml.html#async-event-queue
>> [4]
>> http://geode.apache.org/docs/guide/17/tools_modules/gfsh/command-pages/create.html#topic_ryz_pb1_dk
>>
>>
>>
>>
>> --
>> -John
>> john.blum10101 (skype)
>>
>
>
>
> --
> -John
> john.blum10101 (skype)
>
>
> --
Rob Shepherd BEng PhD

Re: High (Max) CPU

Posted by Anthony Baker <ab...@pivotal.io>.

Also, it matters if your region is partitioned or not.  Only partitioned regions support parallel queues.

I’m not sure about the gfsh behavior you’re seeing when you set parallel=true.

Anthony


> On Nov 2, 2018, at 11:12 AM, John Blum <jb...@pivotal.io> wrote:
> 
> Hi Rob-
> 
> Sorry, just noticed your follow up where you figured out the hot CPU issue caused by your batch-time-interval of 0.
> 
> As for setting parallel to true, it may be that you have other servers currently running that do not have parallel configured to true, which I think (don't remember for sure) might be considered an incompatible Region configuration by the cluster.  All Regions with the same name hosted by members in the cluster must have compatible configurations.
> 
> Check whether you already have existing members hosting the same Region (by name) whose associated AEQ is not set with parallel=true.
> 
> Cheers,
> -j
> 
> 
> On Fri, Nov 2, 2018 at 11:07 AM, John Blum <jblum@pivotal.io> wrote:
> Hi Rob-
> 
> Please see here [1].  Default is 5 Threads.  Also see [2].
> 
> Alternatively, look at  [3] (<async-event-queue dispatcher-threads=".." ..>) or [4] (create async-event-queue --dispatcher-threads).
> 
> The threads should not be having an impact, but you can experiment with the number to find out.
> 
> -j
> 
> 
> [1] http://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/asyncqueue/AsyncEventQueue.html#getDispatcherThreads--
> [2] http://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/asyncqueue/AsyncEventQueueFactory.html#setDispatcherThreads-int-
> [3] http://geode.apache.org/docs/guide/17/reference/topics/cache_xml.html#async-event-queue
> [4] http://geode.apache.org/docs/guide/17/tools_modules/gfsh/command-pages/create.html#topic_ryz_pb1_dk
> 
> 
> 
> 
> 
> -- 
> -John
> john.blum10101 (skype)
> 
> 
> 
> -- 
> -John
> john.blum10101 (skype)

Re: High (Max) CPU

Posted by John Blum <jb...@pivotal.io>.

Hi Rob-

Sorry, just noticed your follow up where you figured out the hot CPU issue
caused by your batch-time-interval of 0.

As for setting parallel to *true*, it may be that you have other servers
currently running that do not have parallel configured to *true*,
which I think (don't remember for sure) might be considered an incompatible
Region configuration by the cluster.  All Regions with the same name hosted
by members in the cluster must have compatible configurations.

Check whether you already have existing members hosting the same Region
(by name) whose associated AEQ is not set with
parallel=true.
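If you can collect the `parallel` setting of the AEQ attached to a region from each member (how you gather it is left open here), the compatibility check described above reduces to grouping members by that flag. The class, method, and member names below are hypothetical; this is a sketch of the check, not a Geode API.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: detect members that disagree on an AEQ's parallel setting for the same region. */
public class ParallelFlagConsistency {

  /** Groups member names by their parallel flag; more than one group means a mismatch. */
  static Map<Boolean, TreeSet<String>> groupByParallel(Map<String, Boolean> parallelByMember) {
    Map<Boolean, TreeSet<String>> groups = new TreeMap<>();
    parallelByMember.forEach((member, parallel) ->
        groups.computeIfAbsent(parallel, k -> new TreeSet<>()).add(member));
    return groups;
  }

  static boolean isConsistent(Map<String, Boolean> parallelByMember) {
    return groupByParallel(parallelByMember).size() <= 1;
  }

  public static void main(String[] args) {
    // Hypothetical cluster: server1 was started earlier with parallel=false,
    // server2 is the new member started with parallel=true.
    Map<String, Boolean> observed = Map.of("server1", false, "server2", true);
    System.out.println("consistent: " + isConsistent(observed));
    System.out.println(groupByParallel(observed));
  }
}
```

Grouping, rather than returning a single boolean, makes it easy to see which members would have to be restarted with the other setting.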

Cheers,
-j


On Fri, Nov 2, 2018 at 11:07 AM, John Blum <jb...@pivotal.io> wrote:

> Hi Rob-
>
> Please see here [1].  Default is 5 Threads.  Also see [2].
>
> Alternatively, look at  [3] (<async-event-queue dispatcher-threads=".."
> ..>) or [4] (create async-event-queue --dispatcher-threads).
>
> The threads should not be having an impact, but you can experiment with
> the number to find out.
>
> -j
>
>
> [1] http://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/asyncqueue/AsyncEventQueue.html#getDispatcherThreads--
> [2] http://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/asyncqueue/AsyncEventQueueFactory.html#setDispatcherThreads-int-
> [3] http://geode.apache.org/docs/guide/17/reference/topics/cache_xml.html#async-event-queue
> [4] http://geode.apache.org/docs/guide/17/tools_modules/gfsh/command-pages/create.html#topic_ryz_pb1_dk
>
>
>
>
> --
> -John
> john.blum10101 (skype)
>



-- 
-John
john.blum10101 (skype)

Re: High (Max) CPU

Posted by John Blum <jb...@pivotal.io>.

Hi Rob-

Please see [1]. The default is 5 threads. Also see [2].

Alternatively, look at [3] (<async-event-queue dispatcher-threads=".." ..>)
or [4] (create async-event-queue --dispatcher-threads).

The threads should not have an impact, but you can experiment with the
number to find out.
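To check how many event-processor threads a queue actually has, a small Java helper can list live threads by name prefix. `Thread.getAllStackTraces()` only sees the JVM it runs in, so this would have to run inside the server process; from outside, `jstack <pid>` shows the same thread names. The helper name and the demo thread names are hypothetical, and the demo starts its own stand-in threads rather than a Geode server.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;

/** Lists live threads in the current JVM whose names start with a prefix. */
public class ThreadNameCount {

  static List<String> threadsWithPrefix(String prefix) {
    return Thread.getAllStackTraces().keySet().stream()
        .map(Thread::getName)
        .filter(name -> name.startsWith(prefix))
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Stand-in threads named like the ones reported in this thread; a Geode server creates its own.
    String prefix = "Event Processor for GatewaySender_AsyncEventQueue_demo-queue";
    CountDownLatch release = new CountDownLatch(1);
    for (int i = 1; i <= 3; i++) {
      Thread t = new Thread(() -> {
        try {
          release.await(); // park until main is done inspecting
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, prefix + "." + i);
      t.setDaemon(true);
      t.start();
    }
    List<String> found = threadsWithPrefix(prefix);
    System.out.println(found.size() + " threads: " + found);
    release.countDown();
  }
}
```

Comparing that count before and after changing `dispatcher-threads` is one way to run the experiment suggested above.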

-j


[1] http://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/asyncqueue/AsyncEventQueue.html#getDispatcherThreads--
[2] http://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/asyncqueue/AsyncEventQueueFactory.html#setDispatcherThreads-int-
[3] http://geode.apache.org/docs/guide/17/reference/topics/cache_xml.html#async-event-queue
[4] http://geode.apache.org/docs/guide/17/tools_modules/gfsh/command-pages/create.html#topic_ryz_pb1_dk


On Fri, Nov 2, 2018 at 9:56 AM, Rob Shepherd <rg...@gmail.com> wrote:

> Thank you Nabarun,
>
> Having started to pull out my config to send to you, I noticed the
> following in my cache.xml:
>
> <async-event-queue
>     id="expiry-event-queue"
>     parallel="false"
>     enable-batch-conflation="false"
>     disk-synchronous="true"
>     forward-expiration-destroy="true"
>         —-> batch-time-interval=“0"
>     batch-size="1"
>     >...
>
> …Which is a copy from the example here: https://github.com/apache/
> geode-examples/blob/master/async/scripts/start.gfsh
>
> create async-event-queue --id=example-async-event-queue \
>   --parallel=false \
>   --enable-batch-conflation=false \
>   --batch-size=1 \
>   --batch-time-interval=0 \
>   --listener=org.apache.geode_examples.async.ExampleAsyncEventListener
>
>
>
> I pondered on “batch-time-interval” and set it to 1000 and it has fixed
> the issue.
>
> I think I understand what this parameter is for and so a delay here would
> be tolerable.
>
>
>
> I have another question, if I set parallel=“true” the gfsh start server
> command hangs and I have to kill the new server process and the gfsh
> launcher.
>
> it is not important to me now, but i would like to evaluate this at some
> point and so i’ll happily try and debug the cause.
>
> Thanks
>
> Rob


-- 
-John
john.blum10101 (skype)

Re: High (Max) CPU

Posted by Rob Shepherd <rg...@gmail.com>.

Thank you Nabarun,

Having started to pull out my config to send to you, I noticed the following in my cache.xml:

<async-event-queue
    id="expiry-event-queue"
    parallel="false"
    enable-batch-conflation="false"
    disk-synchronous="true"
    forward-expiration-destroy="true"
--> batch-time-interval="0"
    batch-size="1"
    >...

…which is copied from this example: https://github.com/apache/geode-examples/blob/master/async/scripts/start.gfsh

create async-event-queue --id=example-async-event-queue \
  --parallel=false \
  --enable-batch-conflation=false \
  --batch-size=1 \
  --batch-time-interval=0 \
  --listener=org.apache.geode_examples.async.ExampleAsyncEventListener


I pondered “batch-time-interval”, set it to 1000 (milliseconds), and that fixed the issue.

I think I understand what this parameter is for, so a delay here would be tolerable.
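A toy model may help explain why raising batch-time-interval helped. The Java sketch below is not Geode's dispatcher code; it is a hypothetical loop that collects up to batchSize events, waiting at most batchTimeIntervalMs for the batch to fill. With an interval of 0 and an empty queue, every poll returns immediately and the loop spins; with a positive interval, each pass blocks for the whole interval. That is consistent with the original profile (time in SerialGatewaySenderQueue.peek() and getHeadKey() on an idle server), but whether Geode's real loop behaves exactly like this is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BatchIntervalModel {

    // Toy dispatcher loop (not Geode's implementation): each pass collects up to batchSize
    // events, waiting at most batchTimeIntervalMs for the batch to fill, then would dispatch it.
    // Returns how many passes ran before runForMs elapsed.
    static long countLoopIterations(BlockingQueue<String> queue, int batchSize,
                                    long batchTimeIntervalMs, long runForMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(runForMs);
        long iterations = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try {
            while (System.nanoTime() < deadline) {
                iterations++;
                batch.clear();
                long batchDeadline =
                    System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(batchTimeIntervalMs);
                while (batch.size() < batchSize) {
                    long remaining = Math.max(batchDeadline - System.nanoTime(), 0);
                    // With an interval of 0, remaining is 0 and poll returns at once on an empty queue.
                    String event = queue.poll(remaining, TimeUnit.NANOSECONDS);
                    if (event == null) {
                        break; // interval expired before the batch filled
                    }
                    batch.add(event);
                }
                // A real dispatcher would hand a non-empty batch to the listener here.
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return iterations;
    }

    public static void main(String[] args) {
        BlockingQueue<String> idleQueue = new LinkedBlockingQueue<>();
        // batch-size=1 as in the cache.xml above; compare interval 0 with interval 1000 ms.
        System.out.println("batch-time-interval=0:    "
            + countLoopIterations(idleQueue, 1, 0, 1000) + " passes in 1 s");
        System.out.println("batch-time-interval=1000: "
            + countLoopIterations(idleQueue, 1, 1000, 1000) + " passes in 1 s");
    }
}
```

On an idle queue, the 0 ms run should report a very large pass count and the 1000 ms run only one or two. The trade-off in this model is that an event may wait up to the interval before its batch is dispatched, which is the delay considered tolerable here.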



I have another question: if I set parallel="true", the gfsh start server command hangs, and I have to kill both the new server process and the gfsh launcher.

It is not important to me now, but I would like to evaluate this at some point, so I’ll happily try to debug the cause.

Thanks

Rob




> On 2 Nov 2018, at 16:22, Nabarun Nag <nn...@pivotal.io> wrote:
> 
> Hi Rob, 
> 
> We will look into this, meanwhile could you please elaborate on what configuration is Apache Geode running, like how many servers, how many AEQs regions etc, what workload is it running.
> 
> Thank you
> Nabarun Nag

Re: High (Max) CPU

Posted by Nabarun Nag <nn...@pivotal.io>.

Hi Rob, 

We will look into this. Meanwhile, could you please elaborate on the configuration Apache Geode is running: how many servers, how many AEQs, regions, etc., and what workload it is running?

Thank you
Nabarun Nag
