Posted to dev@activemq.apache.org by jim_b_o <ja...@inaseq.com> on 2016/02/04 17:29:25 UTC

Artemis use of PhantomReference impacting GC performance

I have observed a significant degradation in GC performance after upgrading
from JBoss 4.2.3.GA to WildFly 10.0.0.Final.  I'm 99% sure the difference is
due to Artemis ActiveMQ making heavy use of PhantomReference objects
(directly or indirectly).  I'm raising this discussion to see if anyone else
can observe the same behaviour and determine whether to raise an issue
against Artemis or one of its dependencies.

What's changed?


I have JBoss 4.2.3.GA (with jboss-messaging-1.4.5.GA) and WildFly
10.0.0.Final running on identical hardware with identical JVMs (8u71) and GC
settings (using G1GC).  The application is only minimally changed as
required by the API differences between the old and new servers.  The
application load is identical and results in near identical memory
allocation rate and young GC frequency.

*The difference is that on JBoss each young GC takes around 10-12ms whereas
on WildFly they take 16-20ms.*  The additional time is taken in the Object
Copy and Ref Proc phases.  This indicates that there are more Survivor
objects on each GC and more References being processed.  I'm pretty sure the
additional Survivor objects are also the Reference objects.  Running a Java
Mission Control (JMC) Flight Recording on these two systems under identical
production load, WildFly is showing around 11,000 PhantomReferences
processed each GC versus a handful on JBoss.  The number of
PhantomReferences seems to approximately correspond to the number of JMS
messages transmitted between GCs.
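
(If you want to see the same thing without JMC, the stock JDK 8 HotSpot logging flags -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintReferenceGC will print the per-reference-type counts and times, including PhantomReference, for each collection.)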

Some evidence


In a separate test environment I have WildFly running under a standard test
load.  It is emitting JMS messages at a consistent rate (lower than
production) onto a non-durable Topic.  The following image shows a 2 minute
JMC recording when there is no subscriber on the Topic.  We can see that
there are very few PhantomReferences and the GCs take around 4ms.

<http://activemq.2283324.n4.nabble.com/file/n4706961/WithoutConsumer.png> 

However once I start up an external JMS client subscribed to the Topic the
PhantomReferences appear and the GCs slow down as shown in the next image. 
The messaging is non-XA with auto-ack.

<http://activemq.2283324.n4.nabble.com/file/n4706961/WithConsumer.png> 

Where do the PhantomReferences come from?


I used the debugger to capture where the PhantomReferences are being
created.  This seems to indicate they are generated by Artemis and/or the
associated XNIO remoting...

Thread [Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@788d198e-832298282)] (Suspended (breakpoint at line 80 in PhantomReference))
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 80
        Cleaner.<init>(Object, Runnable) line: 115
        Cleaner.create(Object, Runnable) line: 133
        DirectByteBuffer.<init>(int) line: 139
        ByteBuffer.allocateDirect(int) line: 311
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int) line: 155
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line: 146
        NettyServerConnection.createTransportBuffer(int) line: 38
        RemotingConnectionImpl(AbstractRemotingConnection).createTransportBuffer(int) line: 162
        SessionReceiveMessage.encode(RemotingConnection) line: 59
        ChannelImpl.send(Packet, boolean, boolean) line: 225
        ChannelImpl.sendBatched(Packet) line: 205
        CoreSessionCallback.sendMessage(ServerMessage, ServerConsumer, int) line: 84
        ServerConsumerImpl.deliverStandardMessage(MessageReference, ServerMessage) line: 883
        ServerConsumerImpl.proceedDeliver(MessageReference) line: 366
        QueueImpl.proceedDeliver(Consumer, MessageReference) line: 2358
        QueueImpl.deliver() line: 1873
        QueueImpl.access$1400(QueueImpl) line: 97
        QueueImpl$DeliverRunner.run() line: 2581
        OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run() line: 100
        ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1142
        ThreadPoolExecutor$Worker.run() line: 617
        Thread.run() line: 745
Thread [default I/O-5] (Suspended (breakpoint at line 80 in PhantomReference))
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 80
        Cleaner.<init>(Object, Runnable) line: 115
        Cleaner.create(Object, Runnable) line: 133
        DirectByteBuffer.<init>(int) line: 139
        ByteBuffer.allocateDirect(int) line: 311
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int) line: 155
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line: 146
        UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 107
        AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line: 104
        AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel) line: 426
        AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371
        ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) line: 92
        ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66
        NioSocketConduit.handleReady(int) line: 88
        WorkerThread.run() line: 559
Thread [default I/O-4] (Suspended (breakpoint at line 80 in PhantomReference))
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 80
        Cleaner.<init>(Object, Runnable) line: 115
        Cleaner.create(Object, Runnable) line: 133
        DirectByteBuffer.<init>(int) line: 139
        ByteBuffer.allocateDirect(int) line: 311
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int) line: 155
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line: 146
        UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 107
        AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line: 104
        AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel) line: 426
        AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371
        ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) line: 92
        ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66
        NioSocketConduit.handleReady(int) line: 88
        WorkerThread.run() line: 559
Thread [default I/O-2] (Suspended (breakpoint at line 80 in PhantomReference))
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 80
        Cleaner.<init>(Object, Runnable) line: 115
        Cleaner.create(Object, Runnable) line: 133
        DirectByteBuffer.<init>(int) line: 139
        ByteBuffer.allocateDirect(int) line: 311
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int) line: 155
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line: 146
        UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 107
        AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line: 104
        AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel) line: 426
        AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371
        ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) line: 92
        ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66
        NioSocketConduit.handleReady(int) line: 88
        WorkerThread.run() line: 559
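
For anyone following along, the JDK mechanism that these stacks land in is easy to demonstrate in isolation: every java.nio.DirectByteBuffer constructor registers a sun.misc.Cleaner, which is a PhantomReference subclass, so one direct allocation per message means one extra phantom reference per message for the collector to process.  A minimal standalone sketch (nothing Artemis-specific, just the JDK behaviour):

    import java.nio.ByteBuffer;

    public class DirectBufferCleanerDemo {
        public static void main(String[] args) {
            // Each allocateDirect() call constructs a DirectByteBuffer, whose constructor
            // registers a Cleaner (a PhantomReference subclass) to free the native memory.
            for (int i = 0; i < 10_000; i++) {
                ByteBuffer buf = ByteBuffer.allocateDirect(256);
                buf.putLong(0, i);
                // buf becomes unreachable after each iteration, leaving its Cleaner for
                // the reference-processing phase of a later GC.
            }
            System.gc(); // with -XX:+PrintReferenceGC this collection reports the phantom refs processed
        }
    }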
	
What's next?


So it seems pretty certain that this is where the change is coming from. 
The total GC load is still low but we're aiming for consistent low latency
and this is a backwards step.

I'd be interested in any comments or questions.  I can provide more info if
required.

Thanks.



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
That's great! Thank you for the update.  

And thanks for your highly technical discussion and details through this thread.  It helped a lot. 

> On Feb 9, 2016, at 5:38 AM, jim_b_o <ja...@inaseq.com> wrote:
> 
> Soak testing confirms consistent good GC behaviour.  Thanks for getting this
> done.
> 
> 
> 
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707281.html
> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by jim_b_o <ja...@inaseq.com>.
Soak testing confirms consistent good GC behaviour.  Thanks for getting this
done.



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707281.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by jim_b_o <ja...@inaseq.com>.
Looks like you've nailed it.

I checked out the /wildfly-011/ tag from a clone of
/rh-messaging/jboss-activemq-artemis/, applied your patch and updated the
/artemis-server/ jar in my test deployment.  The GC behaviour now looks more
like what I expect and the PhantomReferences have been eliminated.

As a final check I'll update my Prod-like environment and soak test for a
while.  I'll get back to you with those results.





--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707111.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
I would appreciate some feedback after you try either the fix or the
regular connection factory.

Let me know if you need help applying the fix (getting the right versions, etc)

On Sat, Feb 6, 2016 at 3:36 AM, jim_b_o <ja...@inaseq.com> wrote:
> Great.  I can easily validate a fix if it's an updated artemis jar that I can
> drop into the WildFly modules directories.
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707090.html
> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.



-- 
Clebert Suconic

Re: Artemis use of PhantomReference impacting GC performance

Posted by jim_b_o <ja...@inaseq.com>.
Great.  I can easily validate a fix if it's an updated artemis jar that I can
drop into the WildFly modules directories.



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707090.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
https://issues.apache.org/jira/browse/ARTEMIS-392


Will you be able to validate a fix?

you can always use the regular artemis way (no http upgrade)

On Fri, Feb 5, 2016 at 3:34 PM, Clebert Suconic
<cl...@gmail.com> wrote:
> just confirmed.. something weird happens when you use the HTTP upgrade path.
>
> I'm not sure whether the issue is in Artemis or in the application server
> integration yet.
>
>
> this should work as expected if you didn't use the http connector.
>
> On Fri, Feb 5, 2016 at 11:45 AM, Clebert Suconic
> <cl...@gmail.com> wrote:
>> Yep.
>>
>> That's the one.
>>
>>
>> Let me know. And I will keep working this afternoon.
>>
>> Sent from my iPhone
>>
>>> On Feb 5, 2016, at 11:06 AM, jim_b_o <ja...@inaseq.com> wrote:
>>>
>>> Thanks.  I think the 'direct' connection you're talking about must be the
>>> /netty-connector/ (instead of the /http-connector/) mentioned in a couple of
>>> places in the WildFly docs.  I'll certainly give that a go if it might avoid
>>> this issue.
>>>
>>>
>>>
>>> --
>>> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707063.html
>>> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
>
>
>
> --
> Clebert Suconic



-- 
Clebert Suconic

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
just confirmed.. something weird happens when you use the HTTP upgrade path.

I'm not sure whether the issue is in Artemis or in the application server
integration yet.


this should work as expected if you didn't use the http connector.

On Fri, Feb 5, 2016 at 11:45 AM, Clebert Suconic
<cl...@gmail.com> wrote:
> Yep.
>
> That's the one.
>
>
> Let me know. And I will keep working this afternoon.
>
> Sent from my iPhone
>
>> On Feb 5, 2016, at 11:06 AM, jim_b_o <ja...@inaseq.com> wrote:
>>
>> Thanks.  I think the 'direct' connection you're talking about must be the
>> /netty-connector/ (instead of the /http-connector/) mentioned in a couple of
>> places in the WildFly docs.  I'll certainly give that a go if it might avoid
>> this issue.
>>
>>
>>
>> --
>> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707063.html
>> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.



-- 
Clebert Suconic

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
Yep. 

That's the one.  


Let me know. And I will keep working this afternoon. 

Sent from my iPhone

> On Feb 5, 2016, at 11:06 AM, jim_b_o <ja...@inaseq.com> wrote:
> 
> Thanks.  I think the 'direct' connection you're talking about must be the
> /netty-connector/ (instead of the /http-connector/) mentioned in a couple of
> places in the WildFly docs.  I'll certainly give that a go if it might avoid
> this issue.
> 
> 
> 
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707063.html
> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by jim_b_o <ja...@inaseq.com>.
Thanks.  I think the 'direct' connection you're talking about must be the
/netty-connector/ (instead of the /http-connector/) mentioned in a couple of
places in the WildFly docs.  I'll certainly give that a go if it might avoid
this issue.



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707063.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
I will investigate it in the afternoon.


Regarding the direct connection, that would be a question for the WildFly
guys. Can you ask there? (Point them at this discussion with a link if you
prefer.)

On Fri, Feb 5, 2016 at 10:19 AM, jim_b_o <ja...@inaseq.com> wrote:
> Yes, it will be the HTTP upgrade.  The client is simply doing a JNDI lookup
> for the connection factory via WildFly http-remoting.
>
> I'm not familiar with the 'direct connections' approach but am happy to read
> on if you can point me at something.  Although the high performance / low
> latency I need is with the rest of my app, not the JMS in particular
> (although I do appreciate its good performance!).
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707061.html
> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.



-- 
Clebert Suconic

Re: Artemis use of PhantomReference impacting GC performance

Posted by jim_b_o <ja...@inaseq.com>.
Yes, it will be the HTTP upgrade.  The client is simply doing a JNDI lookup
for the connection factory via WildFly http-remoting.
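
In case it helps, the lookup is essentially the stock WildFly 10 remote JNDI pattern; a trimmed-down sketch of that pattern is below (host, credentials and JNDI name are placeholders rather than my real values):

    import java.util.Properties;
    import javax.jms.ConnectionFactory;
    import javax.naming.Context;
    import javax.naming.InitialContext;

    public class RemoteCfLookup {
        public static ConnectionFactory lookupConnectionFactory() throws Exception {
            Properties env = new Properties();
            // Remote JNDI over the HTTP port, i.e. the HTTP-upgrade path being discussed.
            env.put(Context.INITIAL_CONTEXT_FACTORY,
                    "org.jboss.naming.remote.client.InitialContextFactory");
            env.put(Context.PROVIDER_URL, "http-remoting://server-host:8080"); // placeholder host
            // env.put(Context.SECURITY_PRINCIPAL, "appuser");       // if security is enabled
            // env.put(Context.SECURITY_CREDENTIALS, "apppassword");
            Context ctx = new InitialContext(env);
            return (ConnectionFactory) ctx.lookup("jms/RemoteConnectionFactory");
        }
    }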

I'm not familiar with the 'direct connections' approach but am happy to read
on if you can point me at something.  Although the high performance / low
latency I need is with the rest of my app, not the JMS in particular
(although I do appreciate its good performance!).



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707061.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
... and I did more tests since my last email.. and I can't find
anywhere in our code where we use UnpooledByteBuffer through that
path. So, this is perhaps through the HTTP upgrade.


So, I will need some information from you on how you're creating the
connections. (I would suggest using direct connections anyway,
since you need high performance and low latency).


Once you give me some update I will resume my investigation.



And thanks for being so thorough on this thread.. it helped a lot.

On Fri, Feb 5, 2016 at 9:40 AM, Clebert Suconic
<cl...@gmail.com> wrote:
> I'm a bit confused, because I'm debugging this on master, and at the
> version you are using,
>
>
> And the Allocator there would have been PartialPooledByteBufAllocator,
> which will use POOLED on all DirectBuffers. I even ran a couple of
> tests that send over Netty and the buffer there was always the
> pooled one.
>
>
> Which port are you using for this? WildFly has one thing that will
> delegate through the HTTP server, upgrade the socket and defer it to
> Artemis as an embedded component. Perhaps there's an issue in that
> integration, which I'm debugging to find where this is breaking.
>
>
>
>
> On Fri, Feb 5, 2016 at 7:51 AM, jim_b_o <ja...@inaseq.com> wrote:
>> Thanks to you both for your suggestions and questions.
>>
>> Answers to questions
>>
>>
>> Regarding VM parameters and GC load, I'm confident I have this under
>> control.  I have a lot of experience with GC tuning.  I've been monitoring
>> this specific behaviour for a couple of months now in preparation for moving
>> WildFly into production.  Clebert - you may recall I'm the one who spotted
>> the memory leak in ARTEMIS-320 by observing changed GC behaviour and went to
>> the effort to identify it down to the exact line of code.  So you're right
>> to enquire about this but I know this is not a problem with GC settings.
>>
>> Regarding my testing, I do burn-in for all tests and the JBoss versus
>> WildFly comparisons are on identical hardware using the same JDK version, GC
>> settings and load pattern (and as noted earlier are really only exercising
>> the JMS portions of the container).  I'm not trying to argue that JBoss
>> messaging is better than Artemis but I do believe there is currently an
>> issue of some sort that is worth investigating.
>>
>> Regarding my specific use case and the possibility of making a small
>> reproducer I can provide that if you confirm you still need it after reading
>> the next section.
>>
>> Progress
>>
>>
>> As promised I ran some tests again whilst isolating the JMS and EJB clients.
>> This confirmed for certain that the issue is in the JMS/Netty not the
>> EJB/XNIO.
>>
>> So the high number of PhantomReferences are being allocated in this stack:
>>
>> Thread [Thread-1
>> (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@788d198e-832298282)]
>> (Suspended (breakpoint at line 80 in PhantomReference))
>>         Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>)
>> line: 80
>>         Cleaner.<init>(Object, Runnable) line: 115
>>         Cleaner.create(Object, Runnable) line: 133
>>         DirectByteBuffer.<init>(int) line: 139
>>         ByteBuffer.allocateDirect(int) line: 311
>>         UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108
>>         UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line:
>> 69
>>         UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50
>>         UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int,
>> int) line: 155
>>         UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int)
>> line: 146
>>         NettyServerConnection.createTransportBuffer(int) line: 38
>>
>> RemotingConnectionImpl(AbstractRemotingConnection).createTransportBuffer(int)
>> line: 162
>>         SessionReceiveMessage.encode(RemotingConnection) line: 59
>>         ChannelImpl.send(Packet, boolean, boolean) line: 225
>>         ChannelImpl.sendBatched(Packet) line: 205
>>         CoreSessionCallback.sendMessage(ServerMessage, ServerConsumer, int)
>> line: 84
>>         ServerConsumerImpl.deliverStandardMessage(MessageReference,
>> ServerMessage) line: 883
>>         ServerConsumerImpl.proceedDeliver(MessageReference) line: 366
>>         QueueImpl.proceedDeliver(Consumer, MessageReference) line: 2358
>>         QueueImpl.deliver() line: 1873
>>         QueueImpl.access$1400(QueueImpl) line: 97
>>         QueueImpl$DeliverRunner.run() line: 2581
>>         OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run() line: 100
>>         ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1142
>>         ThreadPoolExecutor$Worker.run() line: 617
>>         Thread.run() line: 745
>>
>> Every java.nio.DirectByteBuffer that is allocated is creating a
>> PhantomReference.
>>
>> Clebert - In your earlier responses you refer to "netty for instance would
>> still be bootstrapping the reusable pool area" and "This could be Netty
>> bootstrapping the PooledByteBuffer arena", however the stack above suggests
>> that these ByteBuffers are not being pooled.  *Note the
>> UnpooledByteBufAllocator*.  This appears to be the default case for Netty
>> 4.0 (http://netty.io/wiki/new-and-noteworthy-in-4.0.html#wiki-h2-3) but is
>> changed to default to pooled in 4.1
>> (http://netty.io/wiki/new-and-noteworthy-in-4.1.html#wiki-h3-7).  So I think
>> this is our underlying problem.  DirectByteBuffers are only recommended for
>> long-lived scenarios (per java.nio.ByteBuffer JavaDocs) which is not the
>> case if they are being used for only one message then discarded.  It looks
>> like the NettyServerConnection should either be using HeapByteBuffers or we
>> should upgrade to Netty 4.1 so that the DirectByteBuffers get pooled and
>> reused.
>>
>> Is there a way to set the Netty ALLOCATOR ChannelOption to
>> PooledByteBufAllocator via the command line so that I could test it with the
>> existing 4.0?
>>
>>
>>
>>
>> --
>> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707054.html
>> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
>
>
>
> --
> Clebert Suconic



-- 
Clebert Suconic

Re: Artemis use of PhantomReference impacting GC performance

Posted by jim_b_o <ja...@inaseq.com>.
I think you're looking at the client rather than the server.  The problem is
on the server side http acceptor (server in container publishing on Topic to
remote standalone consumer).  The relevant part is the
Netty*Server*Connection.createTransportBuffer.



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707060.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
I'm a bit confused, because I'm debugging this on master, and at the
version you are using,


And the Allocator there would have been PartialPooledByteBufAllocator,
which will use POOLED on all DirectBuffers. I even ran a couple of
tests that send over Netty and the buffer there was always the
pooled one.


Which port are you using for this? WildFly has one thing that will
delegate through the HTTP server, upgrade the socket and defer it to
Artemis as an embedded component. Perhaps there's an issue in that
integration, which I'm debugging to find where this is breaking.




On Fri, Feb 5, 2016 at 7:51 AM, jim_b_o <ja...@inaseq.com> wrote:
> Thanks to you both for your suggestions and questions.
>
> Answers to questions
>
>
> Regarding VM parameters and GC load, I'm confident I have this under
> control.  I have a lot of experience with GC tuning.  I've been monitoring
> this specific behaviour for a couple of months now in preparation for moving
> WildFly into production.  Clebert - you may recall I'm the one who spotted
> the memory leak in ARTEMIS-320 by observing changed GC behaviour and went to
> the effort to identify it down to the exact line of code.  So you're right
> to enquire about this but I know this is not a problem with GC settings.
>
> Regarding my testing, I do burn-in for all tests and the JBoss versus
> WildFly comparisons are on identical hardware using the same JDK version, GC
> settings and load pattern (and as noted earlier are really only exercising
> the JMS portions of the container).  I'm not trying to argue that JBoss
> messaging is better than Artemis but I do believe there is currently an
> issue of some sort that is worth investigating.
>
> Regarding my specific use case and the possibility of making a small
> reproducer I can provide that if you confirm you still need it after reading
> the next section.
>
> Progress
>
>
> As promised I ran some tests again whilst isolating the JMS and EJB clients.
> This confirmed for certain that the issue is in the JMS/Netty not the
> EJB/XNIO.
>
> So the high number of PhantomReferences are being allocated in this stack:
>
> Thread [Thread-1
> (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@788d198e-832298282)]
> (Suspended (breakpoint at line 80 in PhantomReference))
>         Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>)
> line: 80
>         Cleaner.<init>(Object, Runnable) line: 115
>         Cleaner.create(Object, Runnable) line: 133
>         DirectByteBuffer.<init>(int) line: 139
>         ByteBuffer.allocateDirect(int) line: 311
>         UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108
>         UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line:
> 69
>         UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50
>         UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int,
> int) line: 155
>         UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int)
> line: 146
>         NettyServerConnection.createTransportBuffer(int) line: 38
>
> RemotingConnectionImpl(AbstractRemotingConnection).createTransportBuffer(int)
> line: 162
>         SessionReceiveMessage.encode(RemotingConnection) line: 59
>         ChannelImpl.send(Packet, boolean, boolean) line: 225
>         ChannelImpl.sendBatched(Packet) line: 205
>         CoreSessionCallback.sendMessage(ServerMessage, ServerConsumer, int)
> line: 84
>         ServerConsumerImpl.deliverStandardMessage(MessageReference,
> ServerMessage) line: 883
>         ServerConsumerImpl.proceedDeliver(MessageReference) line: 366
>         QueueImpl.proceedDeliver(Consumer, MessageReference) line: 2358
>         QueueImpl.deliver() line: 1873
>         QueueImpl.access$1400(QueueImpl) line: 97
>         QueueImpl$DeliverRunner.run() line: 2581
>         OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run() line: 100
>         ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1142
>         ThreadPoolExecutor$Worker.run() line: 617
>         Thread.run() line: 745
>
> Every java.nio.DirectByteBuffer that is allocated is creating a
> PhantomReference.
>
> Clebert - In your earlier responses you refer to "netty for instance would
> still be bootstrapping the reusable pool area" and "This could be Netty
> bootstrapping the PooledByteBuffer arena", however the stack above suggests
> that these ByteBuffers are not being pooled.  *Note the
> UnpooledByteBufAllocator*.  This appears to be the default case for Netty
> 4.0 (http://netty.io/wiki/new-and-noteworthy-in-4.0.html#wiki-h2-3) but is
> changed to default to pooled in 4.1
> (http://netty.io/wiki/new-and-noteworthy-in-4.1.html#wiki-h3-7).  So I think
> this is our underlying problem.  DirectByteBuffers are only recommended for
> long-lived scenarios (per java.nio.ByteBuffer JavaDocs) which is not the
> case if they are being used for only one message then discarded.  It looks
> like the NettyServerConnection should either be using HeapByteBuffers or we
> should upgrade to Netty 4.1 so that the DirectByteBuffers get pooled and
> reused.
>
> Is there a way to set the Netty ALLOCATOR ChannelOption to
> PooledByteBufAllocator via the command line so that I could test it with the
> existing 4.0?
>
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707054.html
> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.



-- 
Clebert Suconic

Re: Artemis use of PhantomReference impacting GC performance

Posted by jim_b_o <ja...@inaseq.com>.
Thanks to you both for your suggestions and questions.

Answers to questions


Regarding VM parameters and GC load, I'm confident I have this under
control.  I have a lot of experience with GC tuning.  I've been monitoring
this specific behaviour for a couple of months now in preparation for moving
WildFly into production.  Clebert - you may recall I'm the one who spotted
the memory leak in ARTEMIS-320 by observing changed GC behaviour and went to
the effort to identify it down to the exact line of code.  So you're right
to enquire about this but I know this is not a problem with GC settings.

Regarding my testing, I do burn-in for all tests and the JBoss versus
WildFly comparisons are on identical hardware using the same JDK version, GC
settings and load pattern (and as noted earlier are really only exercising
the JMS portions of the container).  I'm not trying to argue that JBoss
messaging is better than Artemis but I do believe there is currently an
issue of some sort that is worth investigating.

Regarding my specific use case and the possibility of making a small
reproducer I can provide that if you confirm you still need it after reading
the next section.

Progress


As promised I ran some tests again whilst isolating the JMS and EJB clients. 
This confirmed for certain that the issue is in the JMS/Netty not the
EJB/XNIO.

So the high number of PhantomReferences are being allocated in this stack:

Thread [Thread-1
(ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@788d198e-832298282)]
(Suspended (breakpoint at line 80 in PhantomReference))	
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>)
line: 80	
        Cleaner.<init>(Object, Runnable) line: 115	
        Cleaner.create(Object, Runnable) line: 133	
        DirectByteBuffer.<init>(int) line: 139	
        ByteBuffer.allocateDirect(int) line: 311	
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108	
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line:
69	
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50	
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int,
int) line: 155	
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int)
line: 146	
        NettyServerConnection.createTransportBuffer(int) line: 38	
       
RemotingConnectionImpl(AbstractRemotingConnection).createTransportBuffer(int)
line: 162	
        SessionReceiveMessage.encode(RemotingConnection) line: 59	
        ChannelImpl.send(Packet, boolean, boolean) line: 225	
        ChannelImpl.sendBatched(Packet) line: 205	
        CoreSessionCallback.sendMessage(ServerMessage, ServerConsumer, int)
line: 84	
        ServerConsumerImpl.deliverStandardMessage(MessageReference,
ServerMessage) line: 883	
        ServerConsumerImpl.proceedDeliver(MessageReference) line: 366	
        QueueImpl.proceedDeliver(Consumer, MessageReference) line: 2358	
        QueueImpl.deliver() line: 1873	
        QueueImpl.access$1400(QueueImpl) line: 97	
        QueueImpl$DeliverRunner.run() line: 2581	
        OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run() line: 100	
        ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1142	
        ThreadPoolExecutor$Worker.run() line: 617	
        Thread.run() line: 745	

Every java.nio.DirectByteBuffer that is allocated is creating a
PhantomReference.

Clebert - In your earlier responses you refer to "netty for instance would
still be bootstrapping the reusable pool area" and "This could be Netty
bootstrapping the PooledByteBuffer arena", however the stack above suggests
that these ByteBuffers are not being pooled.  *Note the
UnpooledByteBufAllocator*.  This appears to be the default case for Netty
4.0 (http://netty.io/wiki/new-and-noteworthy-in-4.0.html#wiki-h2-3) but is
changed to default to pooled in 4.1
(http://netty.io/wiki/new-and-noteworthy-in-4.1.html#wiki-h3-7).  So I think
this is our underlying problem.  DirectByteBuffers are only recommended for
long-lived scenarios (per java.nio.ByteBuffer JavaDocs) which is not the
case if they are being used for only one message then discarded.  It looks
like the NettyServerConnection should either be using HeapByteBuffers or we
should upgrade to Netty 4.1 so that the DirectByteBuffers get pooled and
reused.
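
To make the pooled/unpooled distinction concrete, here is a minimal sketch of the two allocation styles (my own illustration of the Netty API, not the Artemis code):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.PooledByteBufAllocator;
    import io.netty.buffer.Unpooled;

    public class AllocatorSketch {
        // What the stack trace shows today: every call creates a fresh DirectByteBuffer,
        // and therefore a fresh Cleaner/PhantomReference for the GC to process.
        static ByteBuf unpooledStyle(int size) {
            return Unpooled.directBuffer(size);
        }

        // What pooling would give us: buffers carved out of long-lived arenas, so the
        // per-message allocation does not construct a new DirectByteBuffer each time.
        static ByteBuf pooledStyle(int size) {
            return PooledByteBufAllocator.DEFAULT.directBuffer(size);
        }
        // (Either way the returned ByteBuf must eventually be release()d.)
    }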

Is there a way to set the Netty ALLOCATOR ChannelOption to
PooledByteBufAllocator via the command line so that I could test it with the
existing 4.0?
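
For reference, when bootstrapping Netty directly the allocator is normally chosen as below; whether Artemis or WildFly expose an equivalent switch (transport parameter or system property) is exactly what I'm asking, so treat this purely as a sketch of the Netty side:

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.buffer.PooledByteBufAllocator;
    import io.netty.channel.ChannelOption;

    public class PooledBootstrapSketch {
        static ServerBootstrap usePooledAllocator(ServerBootstrap bootstrap) {
            // Select the pooled allocator for both the server channel and the
            // per-connection child channels.
            return bootstrap
                .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
                .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
        }
    }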




--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4707054.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
I talked to someone, and we all raised the same question: When are you
measuring this?


By your earlier comment I had the impression you are measuring

"However *once I start up* an external JMS client subscribed to the
Topic the PhantomReferences appear and the GCs slow down as shown in
the next image.  The messaging is non-XA with auto-ack."




This could be Netty bootstrapping the PooledByteBuffer arena. We would
need to see this during a steady measurement. Can you provide some
more information?

On Thu, Feb 4, 2016 at 4:14 PM, Clebert Suconic
<cl...@gmail.com> wrote:
> Are you measuring this during a sustained load period? Measuring
> things like this during a ramp up will likely cause false positives
> (as netty for instance would still be bootstrapping the reusable pool
> area).
>
>
> I know someone with a lot of experience in tuning parallel GC, and
> I'm gathering some information for this thread. (I will ask him to
> post here).
>
>
> Also: you were with JBoss 4. You probably also have different JDKs
> now, so perhaps different settings would be required for your load?
>
> On Thu, Feb 4, 2016 at 2:44 PM, Justin Bertram <jb...@apache.com> wrote:
>> Thanks for explaining your use-case a bit more.  That helps.  In my opinion the key statement here is that Artemis isn't working in a vacuum - it actually plays quite a small role in a much larger application.  Your overall use-case involves low message volume and low latency requirements.
>>
>> Can you elaborate any more on what specifically the JMS related code is doing?  For example, are you using transactions?  What connection factory are you using?  Are your clients local or remote or both?  How many clients do you have and what kind are they (e.g. producer or consumer)?  The more information you can provide the more likely it is we will be able to help.
>>
>> Personally I think comparing Artemis to JBoss Messaging is not terribly useful.  We're talking about implementation level details here as they are related to GC.  We're not going to move back to JBoss Messaging's architecture and we're not going to drop Netty so let's focus on things we might actually be able to change.  It may be that your client code can be optimized or written differently to take advantage of how Artemis is implemented or it may be that you need to tune the GC differently.  We can't help much without more data about your use-case, application, environment, etc.  Performance related issues like this are typically the most difficult to work through, especially on a mailing list.  If you could provide an actual test demonstrating a problem that would be ideal.
>>
>>
>> Justin
>>
>> ----- Original Message -----
>> From: "jim_b_o" <ja...@inaseq.com>
>> To: dev@activemq.apache.org
>> Sent: Thursday, February 4, 2016 12:17:35 PM
>> Subject: Re: Artemis use of PhantomReference impacting GC performance
>>
>> I think the memory comparison is fair in this case.  The main application
>> load is events received via a ResourceAdapter passed to POJOs which process
>> and return responses via the same ResourceAdapter.  There is no Web
>> component and EJBs are only for admin functions and not being invoked during
>> these tests.  The JMS is an outbound feed of a small subset of the processed
>> events.  So the only container code under test is the JMS and a small part
>> of the ResourceAdapter glue although most of that is done using Spring and
>> is the same in both containers.  The POJOs and ResourceAdapter are identical
>> in both systems.
>>
>> This does impact application performance.  I don't doubt that Artemis can
>> handle more messages and do so faster than the old messaging but it is not
>> operating in a vacuum.  The JMS performance is not a critical aspect of this
>> system.  The critical aspect is how quickly and reliably the POJOs can
>> receive/process/respond via the ResourceAdapter.  These longer
>> stop-the-world GCs that appear to be due to the JMS cause hiccups in that
>> processing.  The JMS is doubling the GC pause time.
>>
>> I'll run another test to confirm if the excess PhantomReferences are from
>> Netty or XNIO.  I'm currently connecting the JMS and EJB client
>> simultaneously so I'll separate them.
>>
>>
>>
>> --
>> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4706976.html
>> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.
>
>
>
> --
> Clebert Suconic



-- 
Clebert Suconic

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
Are you measuring this during a sustained load period? Measuring
things like this during a ramp up will likely cause false positives
(as netty for instance would still be bootstrapping the reusable pool
area).


I know someone with a lot of experience in tuning parallel GC, and
I'm gathering some information for this thread. (I will ask him to
post here).


Also: you were with JBoss 4. You probably also have different JDKs
now, so perhaps different settings would be required for your load?

On Thu, Feb 4, 2016 at 2:44 PM, Justin Bertram <jb...@apache.com> wrote:
> Thanks for explaining your use-case a bit more.  That helps.  In my opinion the key statement here is that Artemis isn't working in a vacuum - it actually plays quite a small role in a much larger application.  Your overall use-case involves low message volume and low latency requirements.
>
> Can you elaborate any more on what specifically the JMS related code is doing?  For example, are you using transactions?  What connection factory are you using?  Are your clients local or remote or both?  How many clients do you have and what kind are they (e.g. producer or consumer)?  The more information you can provide the more likely it is we will be able to help.
>
> Personally I think comparing Artemis to JBoss Messaging is not terribly useful.  We're talking about implementation level details here as they are related to GC.  We're not going to move back to JBoss Messaging's architecture and we're not going to drop Netty so let's focus on things we might actually be able to change.  It may be that your client code can be optimized or written differently to take advantage of how Artemis is implemented or it may be that you need to tune the GC differently.  We can't help much without more data about your use-case, application, environment, etc.  Performance related issues like this are typically the most difficult to work through, especially on a mailing list.  If you could provide an actual test demonstrating a problem that would be ideal.
>
>
> Justin
>
> ----- Original Message -----
> From: "jim_b_o" <ja...@inaseq.com>
> To: dev@activemq.apache.org
> Sent: Thursday, February 4, 2016 12:17:35 PM
> Subject: Re: Artemis use of PhantomReference impacting GC performance
>
> I think the memory comparison is fair in this case.  The main application
> load is events received via a ResourceAdapter passed to POJOs which process
> and return responses via the same ResourceAdapter.  There is no Web
> component and EJBs are only for admin functions and not being invoked during
> these tests.  The JMS is an outbound feed of a small subset of the processed
> events.  So the only container code under test is the JMS and a small part
> of the ResourceAdapter glue although most of that is done using Spring and
> is the same in both containers.  The POJOs and ResourceAdapter are identical
> in both systems.
>
> This does impact application performance.  I don't doubt that Artemis can
> handle more messages and do so faster than the old messaging but it is not
> operating in a vacuum.  The JMS performance is not a critical aspect of this
> system.  The critical aspect is how quickly and reliably the POJOs can
> receive/process/respond via the ResourceAdapter.  These longer
> stop-the-world GCs that appear to be due to the JMS cause hiccups in that
> processing.  The JMS is doubling the GC pause time.
>
> I'll run another test to confirm if the excess PhantomReferences are from
> Netty or XNIO.  I'm currently connecting the JMS and EJB client
> simultaneously so I'll separate them.
>
>
>
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4706976.html
> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.



-- 
Clebert Suconic

Re: Artemis use of PhantomReference impacting GC performance

Posted by Justin Bertram <jb...@apache.com>.
Thanks for explaining your use-case a bit more.  That helps.  In my opinion the key statement here is that Artemis isn't working in a vacuum - it actually plays quite a small role in a much larger application.  Your overall use-case involves low message volume and low latency requirements.

Can you elaborate any more on what specifically the JMS related code is doing?  For example, are you using transactions?  What connection factory are you using?  Are your clients local or remote or both?  How many clients do you have and what kind are they (e.g. producer or consumer)?  The more information you can provide the more likely it is we will be able to help.  

Personally I think comparing Artemis to JBoss Messaging is not terribly useful.  We're talking about implementation level details here as they are related to GC.  We're not going to move back to JBoss Messaging's architecture and we're not going to drop Netty so let's focus on things we might actually be able to change.  It may be that your client code can be optimized or written differently to take advantage of how Artemis is implemented or it may be that you need to tune the GC differently.  We can't help much without more data about your use-case, application, environment, etc.  Performance related issues like this are typically the most difficult to work through, especially on a mailing list.  If you could provide an actual test demonstrating a problem that would be ideal.


Justin

----- Original Message -----
From: "jim_b_o" <ja...@inaseq.com>
To: dev@activemq.apache.org
Sent: Thursday, February 4, 2016 12:17:35 PM
Subject: Re: Artemis use of PhantomReference impacting GC performance

I think the memory comparison is fair in this case.  The main application
load is events received via a ResourceAdapter passed to POJOs which process
and return responses via the same ResourceAdapter.  There is no Web
component and EJBs are only for admin functions and not being invoked during
these tests.  The JMS is an outbound feed of a small subset of the processed
events.  So the only container code under test is the JMS and a small part
of the ResourceAdapter glue although most of that is done using Spring and
is the same in both containers.  The POJOs and ResourceAdapter are identical
in both systems.

This does impact application performance.  I don't doubt that Artemis can
handle more messages and do so faster than the old messaging but it is not
operating in a vacuum.  The JMS performance is not a critical aspect of this
system.  The critical aspect is how quickly and reliably the POJOs can
receive/process/respond via the ResourceAdapter.  These longer
stop-the-world GCs that appear to be due to the JMS cause hiccups in that
processing.  The JMS is doubling the GC pause time.

I'll run another test to confirm if the excess PhantomReferences are from
Netty or XNIO.  I'm currently connecting the JMS and EJB client
simultaneously so I'll separate them.



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4706976.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
Make sure you reuse connections in your test.

We will need something concrete like a runnable test if you see something wrong.


> On Feb 4, 2016, at 1:17 PM, jim_b_o <ja...@inaseq.com> wrote:
> I think  the memory comparison is fair in this case.  The main application
> load is events received via a ResourceAdapter passed to POJOs which process
> and return responses via the same ResourceAdapter.  There is no Web
> component and EJBs are only for admin functions and not being invoked during
> these tests.  The JMS is an outbound feed of a small subset of the processed
> events.  So the only container code under test is the JMS and a small part
> of the ResourceAdapter glue although most of that is done using Spring and
> is the same in both containers.  The POJOs and ResourceAdapter are identical
> in both systems.
> 
> This does impact application performance.  I don't doubt that Artemis can
> handle more messages and do so faster than the old messaging but it is not
> operating in a vacuum.  The JMS performance is not a critical aspect of this
> system.  The critical aspect is how quickly and reliably the POJOs can
> receive/process/respond via the ResourceAdapter.  These longer
> stop-the-world GCs that appear to be due to the JMS cause hiccups in that
> processing.  The JMS is doubling the GC pause time.
> 
> I'll run another test to confirm if the excess PhantomReferences are from
> Netty or XNIO.  I'm currently connecting the JMS and EJB client
> simultaneously so I'll separate them.
> 
> 
> 
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4706976.html
> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by jim_b_o <ja...@inaseq.com>.
I think the memory comparison is fair in this case.  The main application
load is events received via a ResourceAdapter passed to POJOs which process
and return responses via the same ResourceAdapter.  There is no Web
component and EJBs are only for admin functions and not being invoked during
these tests.  The JMS is an outbound feed of a small subset of the processed
events.  So the only container code under test is the JMS and a small part
of the ResourceAdapter glue although most of that is done using Spring and
is the same in both containers.  The POJOs and ResourceAdapter are identical
in both systems.

This does impact application performance.  I don't doubt that Artemis can
handle more messages and do so faster than the old messaging but it is not
operating in a vacuum.  The JMS performance is not a critical aspect of this
system.  The critical aspect is how quickly and reliably the POJOs can
receive/process/respond via the ResourceAdapter.  These longer
stop-the-world GCs that appear to be due to the JMS cause hiccups in that
processing.  The JMS is doubling the GC pause time.

I'll run another test to confirm if the excess PhantomReferences are from
Netty or XNIO.  I'm currently connecting the JMS and EJB client
simultaneously so I'll separate them.



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4706976.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by Justin Bertram <jb...@apache.com>.
I'm not 100% certain that it's valid to compare the memory utilization of two software platforms that, despite their functional overlap, are 100% different from each other in their implementation.

Is this GC discrepancy impacting application performance?  Artemis is certainly far beyond JBoss Messaging in terms of performance (as tests have demonstrated) so at this point it's hard for me to believe that this "backwards step" with regard to GC will result in an actual performance penalty for the application itself.

From what I can tell based on the data you've shared, the additional PhantomReferences are coming from Netty's use of java.nio.DirectByteBuffer.  Netty is an NIO client/server framework for networked applications and it provides Artemis with a high-performance TCP transport.  XNIO is not part of Artemis, so we can exclude those references from our conversation.


Justin

----- Original Message -----
From: "jim_b_o" <ja...@inaseq.com>
To: dev@activemq.apache.org
Sent: Thursday, February 4, 2016 10:29:25 AM
Subject: Artemis use of PhantomReference impacting GC performance

I have observed a significant degradation in GC performance after upgrading
from JBoss 4.2.3.GA to WildFly 10.0.0.Final.  I'm 99% sure the difference is
due to Artemis ActiveMQ making heavy use of PhantomReference objects
(directly or indirectly).  I'm raising this discussion to see if anyone else
can observe the same behaviour and determine whether to raise an issue
against Artemis or one of its dependencies.

What's changed?


I have JBoss 4.2.3.GA (with jboss-messaging-1.4.5.GA) and WildFly
10.0.0.Final running on identical hardware with identical JVMs (8u71) and GC
settings (using G1GC).  The application is only minimally changed as
required by the API differences between the old and new servers.  The
application load is identical and results in near identical memory
allocation rate and young GC frequency.

*The difference is that on JBoss each young GC takes around 10-12ms whereas
on WildFly they take 16-20ms.*  The additional time is taken in the Object
Copy and Ref Proc phases.  This indicates that there are more Survivor
objects on each GC and more References being processed.  I'm pretty sure the
additional Survivor objects are also the Reference objects.  Running a Java
Mission Control (JMC) Flight Recording on these two systems under identical
production load, WildFly is showing around 11,000 PhantomReferences
processed each GC versus a handful on JBoss.  The number of
PhantomReferences seems to approximately correspond to the number of JMS
messages transmitted between GCs.

Some evidence


In a separate test environment I have WildFly running under a standard test
load.  It is emitting JMS messages at a consistent rate (lower than
production) onto a non-durable Topic.  The following image shows a 2 minute
JMC recording when there is no subscriber on the Topic.  We can see that
there are very few PhantomReferences and the GCs take around 4ms.

<http://activemq.2283324.n4.nabble.com/file/n4706961/WithoutConsumer.png> 

However once I start up an external JMS client subscribed to the Topic the
PhantomReferences appear and the GCs slow down as shown in the next image. 
The messaging is non-XA with auto-ack.

<http://activemq.2283324.n4.nabble.com/file/n4706961/WithConsumer.png> 

Where do the PhantomReferences come from?


I used the debugger to capture where the PhantomReferences are being
created.  This seems to indicate they are generated by Artemis and/or the
associated XNIO remoting...

Thread [Thread-1
(ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@788d198e-832298282)]
(Suspended (breakpoint at line 80 in PhantomReference))	
	Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 80	
	Cleaner.<init>(Object, Runnable) line: 115	
	Cleaner.create(Object, Runnable) line: 133	
	DirectByteBuffer.<init>(int) line: 139	
	ByteBuffer.allocateDirect(int) line: 311	
	UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108	
	UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69	
	UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int)
line: 155	
	UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line:
146	
	NettyServerConnection.createTransportBuffer(int) line: 38	

RemotingConnectionImpl(AbstractRemotingConnection).createTransportBuffer(int)
line: 162	
	SessionReceiveMessage.encode(RemotingConnection) line: 59	
	ChannelImpl.send(Packet, boolean, boolean) line: 225	
	ChannelImpl.sendBatched(Packet) line: 205	
	CoreSessionCallback.sendMessage(ServerMessage, ServerConsumer, int) line:
84	
	ServerConsumerImpl.deliverStandardMessage(MessageReference, ServerMessage)
line: 883	
	ServerConsumerImpl.proceedDeliver(MessageReference) line: 366	
	QueueImpl.proceedDeliver(Consumer, MessageReference) line: 2358	
	QueueImpl.deliver() line: 1873	
	QueueImpl.access$1400(QueueImpl) line: 97	
	QueueImpl$DeliverRunner.run() line: 2581	
	OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run() line: 100	
	ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1142	
	ThreadPoolExecutor$Worker.run() line: 617	
	Thread.run() line: 745	
Thread [default I/O-5] (Suspended (breakpoint at line 80 in PhantomReference))
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 80
        Cleaner.<init>(Object, Runnable) line: 115
        Cleaner.create(Object, Runnable) line: 133
        DirectByteBuffer.<init>(int) line: 139
        ByteBuffer.allocateDirect(int) line: 311
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int) line: 155
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line: 146
        UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 107
        AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line: 104
        AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel) line: 426
        AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371
        ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) line: 92
        ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66
        NioSocketConduit.handleReady(int) line: 88
        WorkerThread.run() line: 559
Thread [default I/O-4] (Suspended (breakpoint at line 80 in PhantomReference))
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 80
        Cleaner.<init>(Object, Runnable) line: 115
        Cleaner.create(Object, Runnable) line: 133
        DirectByteBuffer.<init>(int) line: 139
        ByteBuffer.allocateDirect(int) line: 311
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int) line: 155
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line: 146
        UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 107
        AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line: 104
        AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel) line: 426
        AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371
        ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) line: 92
        ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66
        NioSocketConduit.handleReady(int) line: 88
        WorkerThread.run() line: 559
Thread [default I/O-2] (Suspended (breakpoint at line 80 in PhantomReference))
        Cleaner(PhantomReference<T>).<init>(T, ReferenceQueue<? super T>) line: 80
        Cleaner.<init>(Object, Runnable) line: 115
        Cleaner.create(Object, Runnable) line: 133
        DirectByteBuffer.<init>(int) line: 139
        ByteBuffer.allocateDirect(int) line: 311
        UnpooledUnsafeDirectByteBuf.allocateDirect(int) line: 108
        UnpooledUnsafeDirectByteBuf.<init>(ByteBufAllocator, int, int) line: 69
        UnpooledByteBufAllocator.newDirectBuffer(int, int) line: 50
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int, int) line: 155
        UnpooledByteBufAllocator(AbstractByteBufAllocator).directBuffer(int) line: 146
        UnpooledByteBufAllocator(AbstractByteBufAllocator).ioBuffer(int) line: 107
        AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line: 104
        AbstractXnioSocketChannel$ReadListener.handleEvent(ConduitStreamSourceChannel) line: 426
        AbstractXnioSocketChannel$ReadListener.handleEvent(Channel) line: 371
        ChannelListeners.invokeChannelListener(T, ChannelListener<? super T>) line: 92
        ReadReadyHandler$ChannelListenerHandler<C>.readReady() line: 66
        NioSocketConduit.handleReady(int) line: 88
        WorkerThread.run() line: 559
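
Every one of these stacks bottoms out in ByteBuffer.allocateDirect(), whose
DirectByteBuffer constructor registers a sun.misc.Cleaner (a PhantomReference
subclass) so the native memory can be freed once the buffer becomes
unreachable.  A minimal stand-alone sketch (plain JDK 8, no Artemis or Netty
involved; class name and sizes are only illustrative) reproduces the same
effect:

import java.nio.ByteBuffer;

// Illustrative only: each allocateDirect() call creates a DirectByteBuffer whose
// constructor invokes Cleaner.create(this, new Deallocator(...)), i.e. one new
// PhantomReference per buffer for the collector to discover and process.
public class DirectBufferCleanerDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            ByteBuffer buf = ByteBuffer.allocateDirect(1024);
            buf.putInt(0, i);   // touch the buffer, then drop the reference
        }
        // Running with -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintReferenceGC
        // shows the PhantomReference count climbing in the "Ref Proc" phase.
        System.gc();
    }
}

In the stacks above the buffer in question is the transport buffer created per
delivered message (NettyServerConnection.createTransportBuffer), so the number
of PhantomReferences scales with message throughput.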
	
What's next?


So it seems pretty certain that this is where the change is coming from.
The total GC load is still low, but we're aiming for consistent low latency
and this is a backwards step.

I'd be interested in any comments or questions.  I can provide more info if
required.

Thanks.



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
Perhaps you could achieve a lot by tuning VM parameters according to your load?
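
For G1 on JDK 8, one commonly cited knob for a growing "Ref Proc" phase is
parallel reference processing, which is disabled by default.  Purely as an
illustration of the kind of VM parameter this might mean (the standalone.conf
location is an assumption about the WildFly setup, not a verified fix):

# illustrative only: appended to JAVA_OPTS in bin/standalone.conf (path assumed)
JAVA_OPTS="$JAVA_OPTS -XX:+ParallelRefProcEnabled"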


> On Feb 4, 2016, at 1:35 PM, jim_b_o <ja...@inaseq.com> wrote:
> 
> The GC pause times are a very important metric for our app and I'm sure for
> many others.  Our system is latency critical and delays cost money.  GC
> pauses are totally concrete.  Events are being processed all the time (2
> main types: one type handled in 60-70 microseconds, other type in 2-3
> milliseconds) and throwing a GC pause into the middle of that processing
> used to add another 10-12 milliseconds but now adds 16-20 milliseconds.
> 
> Fortunately we are not burning memory too quickly so the GCs aren't
> frequent.  But load is sufficient that most times one occurs it will be
> impacting one or more events.  The impact on average latency is negligible
> but the financial cost of additional latency on select events is not.
> 
> 
> 
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4706978.html
> Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by jim_b_o <ja...@inaseq.com>.
The GC pause times are a very important metric for our app and I'm sure for
many others.  Our system is latency critical and delays cost money.  GC
pauses are totally concrete.  Events are being processed all the time (2
main types: one type handled in 60-70 microseconds, other type in 2-3
milliseconds) and throwing a GC pause into the middle of that processing
used to add another 10-12 milliseconds but now adds 16-20 milliseconds.

Fortunately we are not burning memory too quickly so the GCs aren't
frequent.  But load is sufficient that most times one occurs it will be
impacting one or more events.  The impact on average latency is negligible
but the financial cost of additional latency on select events is not.



--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-use-of-PhantomReference-impacting-GC-performance-tp4706961p4706978.html
Sent from the ActiveMQ - Dev mailing list archive at Nabble.com.

Re: Artemis use of PhantomReference impacting GC performance

Posted by Clebert Suconic <cl...@gmail.com>.
This discussion is probably a better fit for a WildFly users' forum.


Also, I don't think measuring the milliseconds taken by GC is a good
measure in itself. You need to provide something more concrete, such as:
latency has increased, throughput has degraded, etc.


Netty probably uses some PhantomReferences to control reused pools. We
have recently improved that reuse and we are always going beyond the
basics.



Also: XNIO is a JBoss project, so I don't know much about XNIO (and even
if I did, this forum wouldn't be a fitting place for that discussion).
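
To illustrate the pooled-reuse point above: the following is only a hedged
sketch at the Netty API level, not the Artemis or WildFly configuration itself.
It shows why an unpooled direct allocator creates one Cleaner/PhantomReference
per buffer, while a pooled allocator amortises the allocateDirect() calls over
large reused arenas.

import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.UnpooledByteBufAllocator;

// Sketch only: compares allocation behaviour of the two allocator types.
public class AllocatorComparison {
    public static void main(String[] args) {
        UnpooledByteBufAllocator unpooled = new UnpooledByteBufAllocator(true);
        PooledByteBufAllocator pooled = PooledByteBufAllocator.DEFAULT;

        for (int i = 0; i < 100_000; i++) {
            // Unpooled: each directBuffer() ends in ByteBuffer.allocateDirect(),
            // i.e. a new Cleaner (PhantomReference) per message-sized buffer.
            ByteBuf a = unpooled.directBuffer(512);
            a.release();

            // Pooled: buffers are carved out of pre-allocated chunks, so the
            // number of allocateDirect() calls (and Cleaners) stays roughly flat.
            ByteBuf b = pooled.directBuffer(512);
            b.release();
        }
    }
}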

On Thu, Feb 4, 2016 at 11:29 AM, jim_b_o <ja...@inaseq.com> wrote:
> I have observed a significant degradation in GC performance after upgrading
> from JBoss 4.2.3.GA to WildFly 10.0.0.Final.  I'm 99% sure the difference is
> due to Artemis ActiveMQ making heavy use of PhantomReference objects
> (directly or indirectly).  I'm raising this discussion to see if anyone else
> can observe the same behaviour and determine whether to raise an issue
> against Artemis or one of its dependencies.



-- 
Clebert Suconic