Posted to dev@qpid.apache.org by "Alan Conway (JIRA)" <ji...@apache.org> on 2018/08/06 16:53:00 UTC

[jira] [Comment Edited] (QPID-8134) qpid::client::Message::send multiple memory leaks

    [ https://issues.apache.org/jira/browse/QPID-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570478#comment-16570478 ] 

Alan Conway edited comment on QPID-8134 at 8/6/18 4:52 PM:
-----------------------------------------------------------

I don't doubt your report, but I'm not able to reproduce it; something must be different about what I'm doing.

[aconway@grommit bz (master *%)]$ rpm -qa '*qpid*'
qpid-cpp-server-1.38.0-1.fc28.x86_64
qpid-cpp-client-1.38.0-1.fc28.x86_64
qpid-proton-c-0.21.0-2.fc28.x86_64

 

My script looks like this:

for i in 1000 10000 100000; do
    echo "== testing: $i";
    sh -x gospout.sh -a doSubject -c $i
done > test.out 2>&1
grep '==.*\(lost\|reachable\|testing\):' test.out

I get output like this:

+ echo '== testing: 1000'
== testing: 1000
==9445==    definitely lost: 0 bytes in 0 blocks
==9445==    indirectly lost: 0 bytes in 0 blocks
==9445==      possibly lost: 0 bytes in 0 blocks
==9445==    still reachable: 50,908 bytes in 319 blocks
+ echo '== testing: 10000'
== testing: 10000
==9450==    definitely lost: 0 bytes in 0 blocks
==9450==    indirectly lost: 0 bytes in 0 blocks
==9450==      possibly lost: 0 bytes in 0 blocks
==9450==    still reachable: 41,230 bytes in 265 blocks
+ echo '== testing: 100000'
== testing: 100000
==9457==    definitely lost: 0 bytes in 0 blocks
==9457==    indirectly lost: 0 bytes in 0 blocks
==9457==      possibly lost: 0 bytes in 0 blocks
==9457==    still reachable: 31,596 bytes in 214 blocks

 

It's a bit surprising that the final memory is lower on the longer runs, but valgrind --tool=massif shows that memory holds steady during the run and only flutters a bit during shutdown, so there may be some random-looking effects from clean-up ordering and the like.
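
For reference, the massif check above boils down to something like the following; the spout arguments are placeholders rather than the exact command line from gospout.sh:

# Heap-profile a single run under massif (argument values are illustrative).
valgrind --tool=massif ./spout --count 100000 'amq.topic'
# massif writes its profile to massif.out.<pid>; render the snapshots with:
ms_print massif.out.*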

I did have to fix one minor bug in gospout.sh: getopts already steps past the options it parses (tracking its position in OPTIND), so shifting again in the script means only the first option gets read. I'll attach a tar file with the full output, scripts and source I'm using: [^qpid-8134.tgz]
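
For anyone hitting the same snag, the usual pattern is to let getopts track its own position (via OPTIND) and shift only once, after the loop. This is just a sketch; the option names are illustrative, not the exact ones from gospout.sh:

while getopts "a:c:" opt; do
    case "$opt" in
        a) ADDRESS_MODE="$OPTARG" ;;   # e.g. doSubject
        c) COUNT="$OPTARG" ;;          # number of messages to send
    esac
    # No 'shift' here: getopts records how far it has read in OPTIND, so an
    # extra shift inside the loop skips every option after the first.
done
shift $((OPTIND - 1))   # drop the parsed options once, after the loop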


was (Author: aconway):
I don't doubt your report, but I'm not able to reproduce it; something must be different about what I'm doing.

[aconway@grommit bz (master *%)]$ rpm -qa '*qpid*'
qpid-cpp-server-1.38.0-1.fc28.x86_64
qpid-cpp-client-1.38.0-1.fc28.x86_64
qpid-proton-c-0.21.0-2.fc28.x86_64

 

My script looks like this:

for i in 1000 10000 100000; do
    echo "== testing: $i";
    sh -x gospout.sh -a doSubject -c $i
done > test.out 2>&1
grep '==.*\(lost\|reachable\|testing\):' test.out

I get output like this:

+ echo '== testing: 1000'
== testing: 1000
==9445==    definitely lost: 0 bytes in 0 blocks
==9445==    indirectly lost: 0 bytes in 0 blocks
==9445==      possibly lost: 0 bytes in 0 blocks
==9445==    still reachable: 50,908 bytes in 319 blocks
+ echo '== testing: 10000'
== testing: 10000
==9450==    definitely lost: 0 bytes in 0 blocks
==9450==    indirectly lost: 0 bytes in 0 blocks
==9450==      possibly lost: 0 bytes in 0 blocks
==9450==    still reachable: 41,230 bytes in 265 blocks
+ echo '== testing: 100000'
== testing: 100000
==9457==    definitely lost: 0 bytes in 0 blocks
==9457==    indirectly lost: 0 bytes in 0 blocks
==9457==      possibly lost: 0 bytes in 0 blocks
==9457==    still reachable: 31,596 bytes in 214 blocks

 

It's a bit surprising that the final memory is lower on the longer runs, but valgrind --tool=massif shows that memory holds steady during the run and only flutters a bit during shutdown, so there may be some random-looking effects from clean-up ordering and the like.

I did have to fix one minor bug in gospout.sh: getopts already steps past the options it parses (tracking its position in OPTIND), so shifting again in the script means only the first option gets read. I'll attach a tar file with the full output, scripts and source I'm using.

> qpid::client::Message::send multiple memory leaks
> -------------------------------------------------
>
>                 Key: QPID-8134
>                 URL: https://issues.apache.org/jira/browse/QPID-8134
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Client
>    Affects Versions: qpid-cpp-1.37.0, qpid-cpp-1.38.0
>         Environment: *CentOS* Linux release 7.4.1708 (Core)
> Linux localhost.novalocal 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> *qpid*-qmf-1.37.0-1.el7.x86_64
> *qpid*-dispatch-debuginfo-1.0.0-1.el7.x86_64
> python-*qpid*-1.37.0-1.el7.noarch
> *qpid*-proton-c-0.18.1-1.el7.x86_64
> python-*qpid*-qmf-1.37.0-1.el7.x86_64
> *qpid*-proton-debuginfo-0.18.1-1.el7.x86_64
> *qpid*-cpp-debuginfo-1.37.0-1.el7.x86_64
> *qpid*-cpp-client-devel-1.37.0-1.el7.x86_64
> *qpid*-cpp-server-1.37.0-1.el7.x86_64
> *qpid*-cpp-client-1.37.0-1.el7.x86_64
>  
>            Reporter: dan clark
>            Assignee: Alan Conway
>            Priority: Blocker
>              Labels: leak, maven
>             Fix For: qpid-cpp-1.39.0
>
>         Attachments: drain.cpp, godrain.sh, gospout.sh, qpid-8134.tgz, qpid-stat.out, spout.cpp, spout.log
>
>   Original Estimate: 40h
>  Remaining Estimate: 40h
>
> There may be multiple leaks of the outgoing message structure and associated fields when using the qpid::client::amqp0_10::SenderImpl::send function to publish messages under certain setups. I will concede that there may be options beyond my ken to ameliorate the leak of message structures, especially since there is an indication that, under prolonged runs (a daemonized version of an application like spout), the statistics for qpidd show increased acquires with zero releases.
> The basic notion is illustrated with the test application spout (and drain). Consider a long-running daemon that reduces the overhead of open/send/close by keeping the message connection open for long periods of time. The logic would be: start the application and open a connection, then send data in a loop (and never reach a close). Thus the drain application illustrates the behavior and demonstrates the leak under valgrind by sending the data followed by an exit(0).
> Note also the lack of 'releases' associated with the 'acquires' in the stats output.
> Capturing the leaks using the test applications spout/drain required adding an 'exit()' prior to the close, because during normal operation of a daemon the connection remains open for a sustained period of time. The leaked structures within the C++ client library therefore show up as structures still tracked by the library and cleaned up on 'connection.close()', but they should be cleaned up when the send/receive ack completes or when the message's TTL expires, whichever comes first. I have witnessed the leaked structures grow into the millions of messages over runs lasting more than 24 hours with a short (300 sec) TTL on the messages, in the scenarios attached using spout/drain as the test vehicle.
> The attached spout.log is from a short 10-message test and contains 5 different sets of leaked structures (found via the 'bytes in 10 blocks are still reachable' lines) that are in line with much larger leaks when running the application for multiple days with millions of messages.
> The leaks seem to be associated with structures allocating 'std::string's to hold the "subject" and the "payload" of string-based messages sent to amq.topic.
> Suggested workarounds are welcome, whether application-level changes to spout/drain (if they are missing key components) or changes to the address/setup of the queues for amq.topic messages (see the gospout.sh and godrain.sh test drivers for the specific address structures being used).
> For example, the following is one of the 5 categories of leaked data from 'spout.log', from a valgrind analysis taken after the send and session.sync() but prior to connection.close():
>  
> ==3388== 3,680 bytes in 10 blocks are still reachable in loss record 233 of 234
> ==3388==    at 0x4C2A203: operator new(unsigned long) (vg_replace_malloc.c:334)
> ==3388==    by 0x4EB046C: qpid::client::Message::Message(std::string const&, std::string const&) (Message.cpp:31)
> ==3388==    by 0x51742C1: qpid::client::amqp0_10::OutgoingMessage::OutgoingMessage() (OutgoingMessage.cpp:167)
> ==3388==    by 0x5186200: qpid::client::amqp0_10::SenderImpl::sendImpl(qpid::messaging::Message const&) (SenderImpl.cpp:140)
> ==3388==    by 0x5186485: operator() (SenderImpl.h:114)
> ==3388==    by 0x5186485: execute<qpid::client::amqp0_10::SenderImpl::Send> (SessionImpl.h:102)
> ==3388==    by 0x5186485: qpid::client::amqp0_10::SenderImpl::send(qpid::messaging::Message const&, bool) (SenderImpl.cpp:49)
> ==3388==    by 0x40438D: main (spout.cpp:185)
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org