Posted to proton@qpid.apache.org by "Darryl L. Pierce" <dp...@redhat.com> on 2015/03/27 13:43:53 UTC

Strange memory issues with handlers

I'm seeing a strange issue while debugging a segfault with reactors
and Ruby: the char* value stored for a record shows up as the dispatch
address, either in a handler's handler->dispatch or in that of one of
its children.

= Records

In the Ruby bindings, objects aren't handed to the C library directly,
since nothing there would keep the GC from reaping them. Instead, a
char* is malloc'd and snprintf'd with the hex address of the
pn_record_t, the pn_handle_t and the value being stored. That string is
passed to pn_record_set to store, while a Ruby copy of the same string
is created and used as the key to stuff the actual Ruby object into a
hidden global hash.
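
To make the key construction concrete, here is a minimal sketch in C.
The helper name make_record_key and the "hex-hex-hex" layout are
assumptions for illustration only; the real binding's format may
differ, and the handle is treated as a plain uintptr_t.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Proton: builds a heap-allocated key
 * from the record address, the handle and the value address.
 * The caller owns the returned string and must free() it. */
char *make_record_key(const void *record, uintptr_t handle, const void *value)
{
    /* First pass with a NULL buffer only measures the needed length. */
    int len = snprintf(NULL, 0, "%" PRIxPTR "-%" PRIxPTR "-%" PRIxPTR,
                       (uintptr_t)record, handle, (uintptr_t)value);
    if (len < 0)
        return NULL;

    char *key = malloc((size_t)len + 1);
    if (key == NULL)
        return NULL;

    /* Second pass writes the key, including the terminating NUL. */
    snprintf(key, (size_t)len + 1, "%" PRIxPTR "-%" PRIxPTR "-%" PRIxPTR,
             (uintptr_t)record, handle, (uintptr_t)value);
    return key;
}
```

The C side would only ever hold this char*, while the Ruby side keeps
its own copy of the same text as the hash key.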

Later, when pn_record_get is called, it returns that same key string,
which is then used to look up the object in the global hash.

Lastly, a cleanup function, if one is registered, is called. It frees
the malloc'd key and removes the entry from the hash, which allows the
Ruby string and object to be GC'd.

= Handlers And The Memory Issue

As with the records API above, Ruby handlers can't be stored in the C
library directly. Instead, the code uses this struct:

typedef struct {
    char *handler_key;
} pni_rbhandler_t;

The value for handler_key is, as with records, a string built from the
hex addresses of the Ruby object and of the pni_rbhandler_t that was
created. The handler is stored in the global hash under that key so it
can be retrieved later, and the pn_handler_finalize method removes the
key and value from the hash.
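
Roughly, the lifetime of that struct could be sketched as below. The
function names rbhandler_init and rbhandler_finalize and the key format
are hypothetical, and the Ruby-side hash operations are only indicated
in comments; this is not the actual binding code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char *handler_key;
} pni_rbhandler_t;

/* Hypothetical: builds the key from the Ruby object's address and the
 * address of the struct itself. Returns 0 on success, -1 on failure. */
int rbhandler_init(pni_rbhandler_t *rbh, const void *ruby_object)
{
    int len = snprintf(NULL, 0, "%" PRIxPTR "-%" PRIxPTR,
                       (uintptr_t)ruby_object, (uintptr_t)rbh);
    if (len < 0)
        return -1;

    rbh->handler_key = malloc((size_t)len + 1);
    if (rbh->handler_key == NULL)
        return -1;

    snprintf(rbh->handler_key, (size_t)len + 1, "%" PRIxPTR "-%" PRIxPTR,
             (uintptr_t)ruby_object, (uintptr_t)rbh);
    /* The binding would now store the Ruby handler in the global hash
     * under a Ruby copy of handler_key. */
    return 0;
}

/* Hypothetical finalizer: the binding would first delete the hash
 * entry for handler_key, then release the C string. */
void rbhandler_finalize(pni_rbhandler_t *rbh)
{
    free(rbh->handler_key);
    rbh->handler_key = NULL; /* avoid leaving a dangling pointer behind */
}
```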

= The Problem

In diagnosing the segfault I found that, when it happens, about 80% of
the time the segfault is caused by handler->dispatch having a bad value.

Specifically I saw values like this for the memory address held:

3536373137302d393231

Looking at it, it looked like an ASCII string. Converting it to ASCII,
I saw that it was, in this case, the string "567170-921". And whenever
something like that happened, there was a record stored previously
whose key contained the string "129-071765", which is the same
characters in reverse order; i.e., the address held in
handler->dispatch had been overwritten by the string that was stored by
a call to pn_record_set in a separate patch of code.
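
For anyone who wants to repeat the decoding, a small sketch follows.
The helpers hex_to_ascii and reverse_in_place are made-up names for
illustration and have nothing to do with Proton itself.

```c
#include <stddef.h>
#include <string.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes pairs of hex digits into bytes and NUL-terminates the result.
 * Returns the number of bytes decoded, or -1 on bad input or if out is
 * too small. */
int hex_to_ascii(const char *hex, char *out, size_t out_size)
{
    size_t n = strlen(hex);
    if (n % 2 != 0 || n / 2 + 1 > out_size)
        return -1;

    for (size_t i = 0; i < n / 2; i++) {
        int hi = hex_nibble(hex[2 * i]);
        int lo = hex_nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (char)((hi << 4) | lo);
    }
    out[n / 2] = '\0';
    return (int)(n / 2);
}

/* Reverses a NUL-terminated string in place. */
void reverse_in_place(char *s)
{
    size_t len = strlen(s);
    if (len == 0)
        return;
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}
```

Feeding it 3536373137302d393231 gives "567170-921", and reversing that
gives "129-071765". The reversal would be consistent with the bytes of
the key being printed back as a little-endian integer, though that is
my assumption rather than something I have confirmed.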

The puzzle is that nothing outside of the library has direct access to
assign values to these fields, and I haven't yet tracked down where in
the code the overwrite happens.

-- 
Darryl L. Pierce, Sr. Software Engineer @ Red Hat, Inc.