
Posted to proton@qpid.apache.org by Alan Conway <ac...@redhat.com> on 2015/02/16 22:58:14 UTC

[proton] checkin broke dispatch test: 6136f11 make container a subclass of reactor and remove redundant code which is replaced by reactor

This checkin:

6136f11 make container a subclass of reactor and remove redundant code
which is replaced by reactor

severely broke the dispatch system tests. I haven't looked into it yet
but my best guess is that it broke the SyncRequestResponse class that is
used by the dispatch tools and hence by most of those tests.

I will investigate; if anybody else sees anything please let me know.
Apart from fixing the issue, we need to improve the tests for that class
in the proton build (that would be my fault).

Cheers,
Alan.

Re: [proton] checkin broke dispatch test: 6136f11 make container a subclass of reactor and remove redundant code which is replaced by reactor

Posted by Alan Conway <ac...@redhat.com>.

On Tue, 2015-02-17 at 10:38 +0000, Gordon Sim wrote:
> On 02/16/2015 09:58 PM, Alan Conway wrote:
> > This checkin:
> >
> > 6136f11 make container a subclass of reactor and remove redundant code
> > which is replaced by reactor
> >
> > severely broke the dispatch system tests. I haven't looked into it yet
> > but my best guess is that it broke the SyncRequestResponse class that is
> > used by the dispatch tools and hence by most of those tests.
> >
> > I will investigate, if anybody else sees anything please let me know.
> > Apart from fixing the issue we need to improve the tests for that class
> > in the proton build (that would be my fault.)
> 
> There is one test for SyncRequestResponse, which was passing when the 
> commit was made. There are also a couple of examples using 
> BlockingConnection on which it depends, which were also working fine.
> 
> However, the change above is a major one, with the entire internal
> plumbing of the container replaced by the C reactor, so it is certainly
> quite possible that there are some bugs.

The memory errors I reported have been going on since before that commit,
so it may just be a coincidence that the problems showed up at that
point.

Re: [proton] checkin broke dispatch test: 6136f11 make container a subclass of reactor and remove redundant code which is replaced by reactor

Posted by Gordon Sim <gs...@redhat.com>.

On 02/16/2015 09:58 PM, Alan Conway wrote:
> This checkin:
>
> 6136f11 make container a subclass of reactor and remove redundant code
> which is replaced by reactor
>
> severely broke the dispatch system tests. I haven't looked into it yet
> but my best guess is that it broke the SyncRequestResponse class that is
> used by the dispatch tools and hence by most of those tests.
>
> I will investigate, if anybody else sees anything please let me know.
> Apart from fixing the issue we need to improve the tests for that class
> in the proton build (that would be my fault.)

There is one test for SyncRequestResponse, which was passing when the 
commit was made. There are also a couple of examples using 
BlockingConnection on which it depends, which were also working fine.

However, the change above is a major one, with the entire internal
plumbing of the container replaced by the C reactor, so it is certainly
quite possible that there are some bugs.

Re: [proton] valgrind errors on trunk, some memory debugging tips.

Posted by Andrew Stitcher <as...@redhat.com>.

On Mon, 2015-02-16 at 17:55 -0500, Alan Conway wrote:
> ... [Some really good advice]

> 2. Linux fans: set env. var. MALLOC_PRETURB_ (note trailing underscore)

Just to save any frustration that would be:

MALLOC_PERTURB_

Note the real English word in there!

Andrew

[proton] valgrind errors on trunk, some memory debugging tips.

Posted by Alan Conway <ac...@redhat.com>.

I'm seeing a bunch of valgrind errors in the python tests on trunk
(attached). A couple of quick tips for memory errors:

1. You *can* use valgrind on python; you just need to disable "possibly
lost" errors. The design of python generates tons of these, and they are
not really errors. "definitely lost" are the ones to care about. Here's
my .valgrindrc:

--error-exitcode=42
--leak-check=full
--show-leak-kinds=definite
--errors-for-leak-kinds=definite
--num-callers=100


2. Linux fans: set env. var. MALLOC_PERTURB_ (note trailing underscore)
to something random. Some people actually set it to $RANDOM; I set it to
66, which is 42 in hex, so when I run gdb and see 42424242 I know
something was using freed memory.

Explanation: if MALLOC_PERTURB_ is set, Linux (glibc) malloc will fill
freed memory with this byte value. Without this, freed memory may retain
valid values until it gets recycled, and (ab)use of freed memory may go
unnoticed. With this, using freed memory will usually crash your
program straight away, and in a debugger freed memory is very obvious.

E.g. without MALLOC_PERTURB_ the proton python tests pass for me. With
it set they crash straight away and I see a bunch of 424242 in the core
dump. With or without MALLOC_PERTURB_, valgrind spots the bug and tells
me where the memory was freed.
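
For anyone who wants to script the two tips above, here is a minimal
sketch in Python. It is an illustration only, not something from the
proton tree: the function names (perturbed_env, valgrind_argv,
run_under_valgrind) are made up for this example, the flags are simply
the ones from the .valgrindrc above, and the environment variable is
spelled MALLOC_PERTURB_ as glibc expects. It assumes valgrind is on the
PATH and does nothing if it is not.

```python
import os
import shutil
import subprocess

# Flags copied from the .valgrindrc shown in the message above.
VALGRIND_FLAGS = [
    "--error-exitcode=42",
    "--leak-check=full",
    "--show-leak-kinds=definite",
    "--errors-for-leak-kinds=definite",
    "--num-callers=100",
]


def perturbed_env(byte=66, base=None):
    """Return a copy of the environment with MALLOC_PERTURB_ set.

    The default 66 is 0x42, so freed memory shows up as 42424242 in gdb.
    'base' is a hypothetical hook for passing a custom environment.
    """
    if not 1 <= byte <= 255:
        # glibc only perturbs memory for a nonzero value.
        raise ValueError("byte must be in the range 1..255")
    env = dict(os.environ if base is None else base)
    env["MALLOC_PERTURB_"] = str(byte)
    return env


def valgrind_argv(command):
    """Prefix a command (a list of strings) with valgrind and its flags."""
    return ["valgrind"] + VALGRIND_FLAGS + list(command)


def run_under_valgrind(command, byte=66):
    """Run the command under valgrind with freed memory perturbed.

    Returns the exit code (42 if valgrind reported errors, per the
    --error-exitcode flag), or None if valgrind is not installed.
    """
    if shutil.which("valgrind") is None:
        return None
    return subprocess.call(valgrind_argv(command), env=perturbed_env(byte))
```

Something like run_under_valgrind(["python", "my_test.py"]) would then
run a (hypothetical) test script with both checks enabled. Passing the
flags on the command line rather than relying on a .valgrindrc is just a
choice made here to keep the sketch self-contained.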