You are viewing a plain text version of this content. The canonical link for it is here.
Posted to proton@qpid.apache.org by Gordon Sim <gs...@redhat.com> on 2015/07/15 18:30:57 UTC

proton 0.10 blocker

The latest proton code is causing crashes in qpid-cpp tests that use it. 
I've tracked the problem down to the fix for PROTON-905[1] and proposed 
an enhancement to that fix, https://reviews.apache.org/r/36509/, which 
avoids the crash.

Could someone who understands the logic controlling the lifecycle of 
pn_session_t and pn_link_t objects in detail review and approve this please?


[1] https://git-wip-us.apache.org/repos/asf?p=qpid-proton.git;h=653f4e5

Re: proton 0.10 blocker

Posted by Ken Giusti <kg...@redhat.com>.
PROTON-905 reverted.


----- Original Message -----
> From: "Ken Giusti" <kg...@redhat.com>
> To: proton@qpid.apache.org
> Cc: dev@qpid.apache.org
> Sent: Tuesday, July 28, 2015 10:49:20 AM
> Subject: Re: proton 0.10 blocker
> 
> Awww crap.
> 
> Let's revert the whole thing then.  I always felt that 'fix' was more voodoo
> than science.
> 
> IMHO I don't think the original bug should be a blocker for 0.10.  IIRC it
> isn't a regression, and there's a 'work around' by using proton events.
> 
> 
> 
> ----- Original Message -----
> > From: "Gordon Sim" <gs...@redhat.com>
> > To: dev@qpid.apache.org
> > Cc: proton@qpid.apache.org
> > Sent: Tuesday, July 28, 2015 10:09:07 AM
> > Subject: Re: proton 0.10 blocker
> > 
> > On 07/20/2015 08:53 PM, Rafael Schloming wrote:
> > > I'm fine going ahead with Gordon's fix. I don't have a lot of time to dig
> > > into the refcounting issue personally right now, but I'd at least leave
> > > the
> > > bug open until we have made it through a bit more testing. I have an
> > > uneasy
> > > feeling it (or something closely related) may pop up again if we push
> > > harder on testing.
> > 
> > You were right I'm afraid! I've seen further failures of a similar
> > nature even with that fix now committed. They seem less frequent
> > (previously I was seeing a core dump every run, now its more like 2 out
> > of 10 or so).
> > 
> > My inclination is to revert the original fix for PROTON-905 for now,
> > until more detailed testing and investigation can be carried out. The
> > fix prevents a build up of memory on long running connections where
> > multiple sessions (and links) are opened then closed. As reported this
> > only happens with older versions of the broker. In any case it does also
> > have a workaround (albeit a very inconvenient one) of restarting the
> > connection every so often, at which point the memory is freed.
> > 
> > By contrast I don't understand enough about these crashes to suggest any
> > workaround and the effects are more serious even for the latest broker.
> > 
> 
> --
> -K
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> For additional commands, e-mail: dev-help@qpid.apache.org
> 
> 

-- 
-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: proton 0.10 blocker

Posted by Ken Giusti <kg...@redhat.com>.
PROTON-905 reverted.


----- Original Message -----
> From: "Ken Giusti" <kg...@redhat.com>
> To: proton@qpid.apache.org
> Cc: dev@qpid.apache.org
> Sent: Tuesday, July 28, 2015 10:49:20 AM
> Subject: Re: proton 0.10 blocker
> 
> Awww crap.
> 
> Let's revert the whole thing then.  I always felt that 'fix' was more voodoo
> than science.
> 
> IMHO I don't think the original bug should be a blocker for 0.10.  IIRC it
> isn't a regression, and there's a 'work around' by using proton events.
> 
> 
> 
> ----- Original Message -----
> > From: "Gordon Sim" <gs...@redhat.com>
> > To: dev@qpid.apache.org
> > Cc: proton@qpid.apache.org
> > Sent: Tuesday, July 28, 2015 10:09:07 AM
> > Subject: Re: proton 0.10 blocker
> > 
> > On 07/20/2015 08:53 PM, Rafael Schloming wrote:
> > > I'm fine going ahead with Gordon's fix. I don't have a lot of time to dig
> > > into the refcounting issue personally right now, but I'd at least leave
> > > the
> > > bug open until we have made it through a bit more testing. I have an
> > > uneasy
> > > feeling it (or something closely related) may pop up again if we push
> > > harder on testing.
> > 
> > You were right I'm afraid! I've seen further failures of a similar
> > nature even with that fix now committed. They seem less frequent
> > (previously I was seeing a core dump every run, now its more like 2 out
> > of 10 or so).
> > 
> > My inclination is to revert the original fix for PROTON-905 for now,
> > until more detailed testing and investigation can be carried out. The
> > fix prevents a build up of memory on long running connections where
> > multiple sessions (and links) are opened then closed. As reported this
> > only happens with older versions of the broker. In any case it does also
> > have a workaround (albeit a very inconvenient one) of restarting the
> > connection every so often, at which point the memory is freed.
> > 
> > By contrast I don't understand enough about these crashes to suggest any
> > workaround and the effects are more serious even for the latest broker.
> > 
> 
> --
> -K
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> For additional commands, e-mail: dev-help@qpid.apache.org
> 
> 

-- 
-K

Re: proton 0.10 blocker

Posted by Ken Giusti <kg...@redhat.com>.
Awww crap.

Let's revert the whole thing then.  I always felt that 'fix' was more voodoo than science.

IMHO I don't think the original bug should be a blocker for 0.10.  IIRC it isn't a regression, and there's a 'work around' by using proton events.



----- Original Message -----
> From: "Gordon Sim" <gs...@redhat.com>
> To: dev@qpid.apache.org
> Cc: proton@qpid.apache.org
> Sent: Tuesday, July 28, 2015 10:09:07 AM
> Subject: Re: proton 0.10 blocker
> 
> On 07/20/2015 08:53 PM, Rafael Schloming wrote:
> > I'm fine going ahead with Gordon's fix. I don't have a lot of time to dig
> > into the refcounting issue personally right now, but I'd at least leave the
> > bug open until we have made it through a bit more testing. I have an uneasy
> > feeling it (or something closely related) may pop up again if we push
> > harder on testing.
> 
> You were right I'm afraid! I've seen further failures of a similar
> nature even with that fix now committed. They seem less frequent
> (previously I was seeing a core dump every run, now its more like 2 out
> of 10 or so).
> 
> My inclination is to revert the original fix for PROTON-905 for now,
> until more detailed testing and investigation can be carried out. The
> fix prevents a build up of memory on long running connections where
> multiple sessions (and links) are opened then closed. As reported this
> only happens with older versions of the broker. In any case it does also
> have a workaround (albeit a very inconvenient one) of restarting the
> connection every so often, at which point the memory is freed.
> 
> By contrast I don't understand enough about these crashes to suggest any
> workaround and the effects are more serious even for the latest broker.
> 

-- 
-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: proton 0.10 blocker

Posted by Ken Giusti <kg...@redhat.com>.
Awww crap.

Let's revert the whole thing then.  I always felt that 'fix' was more voodoo than science.

IMHO I don't think the original bug should be a blocker for 0.10.  IIRC it isn't a regression, and there's a 'work around' by using proton events.



----- Original Message -----
> From: "Gordon Sim" <gs...@redhat.com>
> To: dev@qpid.apache.org
> Cc: proton@qpid.apache.org
> Sent: Tuesday, July 28, 2015 10:09:07 AM
> Subject: Re: proton 0.10 blocker
> 
> On 07/20/2015 08:53 PM, Rafael Schloming wrote:
> > I'm fine going ahead with Gordon's fix. I don't have a lot of time to dig
> > into the refcounting issue personally right now, but I'd at least leave the
> > bug open until we have made it through a bit more testing. I have an uneasy
> > feeling it (or something closely related) may pop up again if we push
> > harder on testing.
> 
> You were right I'm afraid! I've seen further failures of a similar
> nature even with that fix now committed. They seem less frequent
> (previously I was seeing a core dump every run, now its more like 2 out
> of 10 or so).
> 
> My inclination is to revert the original fix for PROTON-905 for now,
> until more detailed testing and investigation can be carried out. The
> fix prevents a build up of memory on long running connections where
> multiple sessions (and links) are opened then closed. As reported this
> only happens with older versions of the broker. In any case it does also
> have a workaround (albeit a very inconvenient one) of restarting the
> connection every so often, at which point the memory is freed.
> 
> By contrast I don't understand enough about these crashes to suggest any
> workaround and the effects are more serious even for the latest broker.
> 

-- 
-K

Re: proton 0.10 blocker

Posted by Gordon Sim <gs...@redhat.com>.
On 07/20/2015 08:53 PM, Rafael Schloming wrote:
> I'm fine going ahead with Gordon's fix. I don't have a lot of time to dig
> into the refcounting issue personally right now, but I'd at least leave the
> bug open until we have made it through a bit more testing. I have an uneasy
> feeling it (or something closely related) may pop up again if we push
> harder on testing.

You were right I'm afraid! I've seen further failures of a similar 
nature even with that fix now committed. They seem less frequent 
(previously I was seeing a core dump every run, now its more like 2 out 
of 10 or so).

My inclination is to revert the original fix for PROTON-905 for now, 
until more detailed testing and investigation can be carried out. The 
fix prevents a build up of memory on long running connections where 
multiple sessions (and links) are opened then closed. As reported this 
only happens with older versions of the broker. In any case it does also 
have a workaround (albeit a very inconvenient one) of restarting the 
connection every so often, at which point the memory is freed.

By contrast I don't understand enough about these crashes to suggest any 
workaround and the effects are more serious even for the latest broker.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: proton 0.10 blocker

Posted by Gordon Sim <gs...@redhat.com>.
On 07/20/2015 08:53 PM, Rafael Schloming wrote:
> I'm fine going ahead with Gordon's fix. I don't have a lot of time to dig
> into the refcounting issue personally right now, but I'd at least leave the
> bug open until we have made it through a bit more testing. I have an uneasy
> feeling it (or something closely related) may pop up again if we push
> harder on testing.

You were right I'm afraid! I've seen further failures of a similar 
nature even with that fix now committed. They seem less frequent 
(previously I was seeing a core dump every run, now its more like 2 out 
of 10 or so).

My inclination is to revert the original fix for PROTON-905 for now, 
until more detailed testing and investigation can be carried out. The 
fix prevents a build up of memory on long running connections where 
multiple sessions (and links) are opened then closed. As reported this 
only happens with older versions of the broker. In any case it does also 
have a workaround (albeit a very inconvenient one) of restarting the 
connection every so often, at which point the memory is freed.

By contrast I don't understand enough about these crashes to suggest any 
workaround and the effects are more serious even for the latest broker.

Re: proton 0.10 blocker

Posted by Rafael Schloming <rh...@alum.mit.edu>.
I'm fine going ahead with Gordon's fix. I don't have a lot of time to dig
into the refcounting issue personally right now, but I'd at least leave the
bug open until we have made it through a bit more testing. I have an uneasy
feeling it (or something closely related) may pop up again if we push
harder on testing.

--Rafael


On Mon, Jul 20, 2015 at 10:43 AM, Ken Giusti <kg...@redhat.com> wrote:

> +1 for using Gordon's fix for now - we can cut a beta and see if it holds
> up.
>
> Since there's some unanswered questions regarding the patch's behavior,
> I'll leave the bug open - drop the blocker status - and assign it over to
> Rafi.  He's better cerebrally equipped to understand the reference counting
> implementation than I certainly am.
>
>
>
> ----- Original Message -----
> > From: "Robbie Gemmell" <ro...@gmail.com>
> > To: dev@qpid.apache.org, proton@qpid.apache.org
> > Sent: Monday, July 20, 2015 12:03:06 PM
> > Subject: Re: proton 0.10 blocker
> >
> > On 17 July 2015 at 23:32, Gordon Sim <gs...@redhat.com> wrote:
> > > On 07/17/2015 10:04 PM, Rafael Schloming wrote:
> > >>
> > >> On Fri, Jul 17, 2015 at 12:47 PM, Gordon Sim <gs...@redhat.com> wrote:
> > >>
> > >>> On 07/17/2015 08:15 PM, Rafael Schloming wrote:
> > >>>
> > >>>> On Fri, Jul 17, 2015 at 11:56 AM, Gordon Sim <gs...@redhat.com>
> wrote:
> > >>>>
> > >>>>   On 07/17/2015 07:17 PM, Rafael Schloming wrote:
> > >>>>>
> > >>>>>
> > >>>>>   Given this I believe the incref/decref pair is indeed running
> into
> > >>>>>>
> > >>>>>> problems
> > >>>>>> when being invoked from inside a finalizer. I'd be curious if an
> > >>>>>> alternative fix would work. I suspect you could replace the
> additional
> > >>>>>> conditions you added to the if predicate with this:
> > >>>>>>
> > >>>>>>      pn_refcount(endpoint) > 0
> > >>>>>>
> > >>>>>>
> > >>>>> If the refcount is *not* 0, what does the incref/decref sequence
> > >>>>> accomplish?
> > >>>>>
> > >>>>
> > >>>>
> > >>>> I believe the answer to this is the same as the answer I just
> posted on
> > >>>> the
> > >>>> other thread, i.e. the incref may trigger the incref hook (see
> > >>>> pn_xxx_incref in engine.c), and this in turn may update the endpoint
> > >>>> state
> > >>>> and adjust the refcount accordingly. The decref may then end up
> > >>>> finalizing
> > >>>> the object.
> > >>>>
> > >>>
> > >>> Right, understood now.
> > >>>
> > >>> Unfortunately replacing the additional conditions with just that
> check on
> > >>> the refcount doesn't prevent the crash though.
> > >>>
> > >>
> > >> Doh, not the result I was hoping for. Does it yield the same stack
> trace
> > >> as
> > >> before?
> > >
> > >
> > > Yes
> > >
> >
> > Given that and all who looked thinking the earlier proposal was safe,
> > is it worth going with that change at least for now? Knocking things
> > off the blocker list and getting an RC (or even just a beta) out would
> > be really really good.
> >
> > Robbie
> >
>
> --
> -K
>

Re: proton 0.10 blocker

Posted by Rafael Schloming <rh...@alum.mit.edu>.
I'm fine going ahead with Gordon's fix. I don't have a lot of time to dig
into the refcounting issue personally right now, but I'd at least leave the
bug open until we have made it through a bit more testing. I have an uneasy
feeling it (or something closely related) may pop up again if we push
harder on testing.

--Rafael


On Mon, Jul 20, 2015 at 10:43 AM, Ken Giusti <kg...@redhat.com> wrote:

> +1 for using Gordon's fix for now - we can cut a beta and see if it holds
> up.
>
> Since there's some unanswered questions regarding the patch's behavior,
> I'll leave the bug open - drop the blocker status - and assign it over to
> Rafi.  He's better cerebrally equipped to understand the reference counting
> implementation than I certainly am.
>
>
>
> ----- Original Message -----
> > From: "Robbie Gemmell" <ro...@gmail.com>
> > To: dev@qpid.apache.org, proton@qpid.apache.org
> > Sent: Monday, July 20, 2015 12:03:06 PM
> > Subject: Re: proton 0.10 blocker
> >
> > On 17 July 2015 at 23:32, Gordon Sim <gs...@redhat.com> wrote:
> > > On 07/17/2015 10:04 PM, Rafael Schloming wrote:
> > >>
> > >> On Fri, Jul 17, 2015 at 12:47 PM, Gordon Sim <gs...@redhat.com> wrote:
> > >>
> > >>> On 07/17/2015 08:15 PM, Rafael Schloming wrote:
> > >>>
> > >>>> On Fri, Jul 17, 2015 at 11:56 AM, Gordon Sim <gs...@redhat.com>
> wrote:
> > >>>>
> > >>>>   On 07/17/2015 07:17 PM, Rafael Schloming wrote:
> > >>>>>
> > >>>>>
> > >>>>>   Given this I believe the incref/decref pair is indeed running
> into
> > >>>>>>
> > >>>>>> problems
> > >>>>>> when being invoked from inside a finalizer. I'd be curious if an
> > >>>>>> alternative fix would work. I suspect you could replace the
> additional
> > >>>>>> conditions you added to the if predicate with this:
> > >>>>>>
> > >>>>>>      pn_refcount(endpoint) > 0
> > >>>>>>
> > >>>>>>
> > >>>>> If the refcount is *not* 0, what does the incref/decref sequence
> > >>>>> accomplish?
> > >>>>>
> > >>>>
> > >>>>
> > >>>> I believe the answer to this is the same as the answer I just
> posted on
> > >>>> the
> > >>>> other thread, i.e. the incref may trigger the incref hook (see
> > >>>> pn_xxx_incref in engine.c), and this in turn may update the endpoint
> > >>>> state
> > >>>> and adjust the refcount accordingly. The decref may then end up
> > >>>> finalizing
> > >>>> the object.
> > >>>>
> > >>>
> > >>> Right, understood now.
> > >>>
> > >>> Unfortunately replacing the additional conditions with just that
> check on
> > >>> the refcount doesn't prevent the crash though.
> > >>>
> > >>
> > >> Doh, not the result I was hoping for. Does it yield the same stack
> trace
> > >> as
> > >> before?
> > >
> > >
> > > Yes
> > >
> >
> > Given that and all who looked thinking the earlier proposal was safe,
> > is it worth going with that change at least for now? Knocking things
> > off the blocker list and getting an RC (or even just a beta) out would
> > be really really good.
> >
> > Robbie
> >
>
> --
> -K
>

Re: proton 0.10 blocker

Posted by Ken Giusti <kg...@redhat.com>.
+1 for using Gordon's fix for now - we can cut a beta and see if it holds up.

Since there's some unanswered questions regarding the patch's behavior, I'll leave the bug open - drop the blocker status - and assign it over to Rafi.  He's better cerebrally equipped to understand the reference counting implementation than I certainly am.



----- Original Message -----
> From: "Robbie Gemmell" <ro...@gmail.com>
> To: dev@qpid.apache.org, proton@qpid.apache.org
> Sent: Monday, July 20, 2015 12:03:06 PM
> Subject: Re: proton 0.10 blocker
> 
> On 17 July 2015 at 23:32, Gordon Sim <gs...@redhat.com> wrote:
> > On 07/17/2015 10:04 PM, Rafael Schloming wrote:
> >>
> >> On Fri, Jul 17, 2015 at 12:47 PM, Gordon Sim <gs...@redhat.com> wrote:
> >>
> >>> On 07/17/2015 08:15 PM, Rafael Schloming wrote:
> >>>
> >>>> On Fri, Jul 17, 2015 at 11:56 AM, Gordon Sim <gs...@redhat.com> wrote:
> >>>>
> >>>>   On 07/17/2015 07:17 PM, Rafael Schloming wrote:
> >>>>>
> >>>>>
> >>>>>   Given this I believe the incref/decref pair is indeed running into
> >>>>>>
> >>>>>> problems
> >>>>>> when being invoked from inside a finalizer. I'd be curious if an
> >>>>>> alternative fix would work. I suspect you could replace the additional
> >>>>>> conditions you added to the if predicate with this:
> >>>>>>
> >>>>>>      pn_refcount(endpoint) > 0
> >>>>>>
> >>>>>>
> >>>>> If the refcount is *not* 0, what does the incref/decref sequence
> >>>>> accomplish?
> >>>>>
> >>>>
> >>>>
> >>>> I believe the answer to this is the same as the answer I just posted on
> >>>> the
> >>>> other thread, i.e. the incref may trigger the incref hook (see
> >>>> pn_xxx_incref in engine.c), and this in turn may update the endpoint
> >>>> state
> >>>> and adjust the refcount accordingly. The decref may then end up
> >>>> finalizing
> >>>> the object.
> >>>>
> >>>
> >>> Right, understood now.
> >>>
> >>> Unfortunately replacing the additional conditions with just that check on
> >>> the refcount doesn't prevent the crash though.
> >>>
> >>
> >> Doh, not the result I was hoping for. Does it yield the same stack trace
> >> as
> >> before?
> >
> >
> > Yes
> >
> 
> Given that and all who looked thinking the earlier proposal was safe,
> is it worth going with that change at least for now? Knocking things
> off the blocker list and getting an RC (or even just a beta) out would
> be really really good.
> 
> Robbie
> 

-- 
-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: proton 0.10 blocker

Posted by Ken Giusti <kg...@redhat.com>.
+1 for using Gordon's fix for now - we can cut a beta and see if it holds up.

Since there's some unanswered questions regarding the patch's behavior, I'll leave the bug open - drop the blocker status - and assign it over to Rafi.  He's better cerebrally equipped to understand the reference counting implementation than I certainly am.



----- Original Message -----
> From: "Robbie Gemmell" <ro...@gmail.com>
> To: dev@qpid.apache.org, proton@qpid.apache.org
> Sent: Monday, July 20, 2015 12:03:06 PM
> Subject: Re: proton 0.10 blocker
> 
> On 17 July 2015 at 23:32, Gordon Sim <gs...@redhat.com> wrote:
> > On 07/17/2015 10:04 PM, Rafael Schloming wrote:
> >>
> >> On Fri, Jul 17, 2015 at 12:47 PM, Gordon Sim <gs...@redhat.com> wrote:
> >>
> >>> On 07/17/2015 08:15 PM, Rafael Schloming wrote:
> >>>
> >>>> On Fri, Jul 17, 2015 at 11:56 AM, Gordon Sim <gs...@redhat.com> wrote:
> >>>>
> >>>>   On 07/17/2015 07:17 PM, Rafael Schloming wrote:
> >>>>>
> >>>>>
> >>>>>   Given this I believe the incref/decref pair is indeed running into
> >>>>>>
> >>>>>> problems
> >>>>>> when being invoked from inside a finalizer. I'd be curious if an
> >>>>>> alternative fix would work. I suspect you could replace the additional
> >>>>>> conditions you added to the if predicate with this:
> >>>>>>
> >>>>>>      pn_refcount(endpoint) > 0
> >>>>>>
> >>>>>>
> >>>>> If the refcount is *not* 0, what does the incref/decref sequence
> >>>>> accomplish?
> >>>>>
> >>>>
> >>>>
> >>>> I believe the answer to this is the same as the answer I just posted on
> >>>> the
> >>>> other thread, i.e. the incref may trigger the incref hook (see
> >>>> pn_xxx_incref in engine.c), and this in turn may update the endpoint
> >>>> state
> >>>> and adjust the refcount accordingly. The decref may then end up
> >>>> finalizing
> >>>> the object.
> >>>>
> >>>
> >>> Right, understood now.
> >>>
> >>> Unfortunately replacing the additional conditions with just that check on
> >>> the refcount doesn't prevent the crash though.
> >>>
> >>
> >> Doh, not the result I was hoping for. Does it yield the same stack trace
> >> as
> >> before?
> >
> >
> > Yes
> >
> 
> Given that and all who looked thinking the earlier proposal was safe,
> is it worth going with that change at least for now? Knocking things
> off the blocker list and getting an RC (or even just a beta) out would
> be really really good.
> 
> Robbie
> 

-- 
-K

Re: proton 0.10 blocker

Posted by Robbie Gemmell <ro...@gmail.com>.
On 17 July 2015 at 23:32, Gordon Sim <gs...@redhat.com> wrote:
> On 07/17/2015 10:04 PM, Rafael Schloming wrote:
>>
>> On Fri, Jul 17, 2015 at 12:47 PM, Gordon Sim <gs...@redhat.com> wrote:
>>
>>> On 07/17/2015 08:15 PM, Rafael Schloming wrote:
>>>
>>>> On Fri, Jul 17, 2015 at 11:56 AM, Gordon Sim <gs...@redhat.com> wrote:
>>>>
>>>>   On 07/17/2015 07:17 PM, Rafael Schloming wrote:
>>>>>
>>>>>
>>>>>   Given this I believe the incref/decref pair is indeed running into
>>>>>>
>>>>>> problems
>>>>>> when being invoked from inside a finalizer. I'd be curious if an
>>>>>> alternative fix would work. I suspect you could replace the additional
>>>>>> conditions you added to the if predicate with this:
>>>>>>
>>>>>>      pn_refcount(endpoint) > 0
>>>>>>
>>>>>>
>>>>> If the refcount is *not* 0, what does the incref/decref sequence
>>>>> accomplish?
>>>>>
>>>>
>>>>
>>>> I believe the answer to this is the same as the answer I just posted on
>>>> the
>>>> other thread, i.e. the incref may trigger the incref hook (see
>>>> pn_xxx_incref in engine.c), and this in turn may update the endpoint
>>>> state
>>>> and adjust the refcount accordingly. The decref may then end up
>>>> finalizing
>>>> the object.
>>>>
>>>
>>> Right, understood now.
>>>
>>> Unfortunately replacing the additional conditions with just that check on
>>> the refcount doesn't prevent the crash though.
>>>
>>
>> Doh, not the result I was hoping for. Does it yield the same stack trace
>> as
>> before?
>
>
> Yes
>

Given that and all who looked thinking the earlier proposal was safe,
is it worth going with that change at least for now? Knocking things
off the blocker list and getting an RC (or even just a beta) out would
be really really good.

Robbie

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: proton 0.10 blocker

Posted by Robbie Gemmell <ro...@gmail.com>.
On 17 July 2015 at 23:32, Gordon Sim <gs...@redhat.com> wrote:
> On 07/17/2015 10:04 PM, Rafael Schloming wrote:
>>
>> On Fri, Jul 17, 2015 at 12:47 PM, Gordon Sim <gs...@redhat.com> wrote:
>>
>>> On 07/17/2015 08:15 PM, Rafael Schloming wrote:
>>>
>>>> On Fri, Jul 17, 2015 at 11:56 AM, Gordon Sim <gs...@redhat.com> wrote:
>>>>
>>>>   On 07/17/2015 07:17 PM, Rafael Schloming wrote:
>>>>>
>>>>>
>>>>>   Given this I believe the incref/decref pair is indeed running into
>>>>>>
>>>>>> problems
>>>>>> when being invoked from inside a finalizer. I'd be curious if an
>>>>>> alternative fix would work. I suspect you could replace the additional
>>>>>> conditions you added to the if predicate with this:
>>>>>>
>>>>>>      pn_refcount(endpoint) > 0
>>>>>>
>>>>>>
>>>>> If the refcount is *not* 0, what does the incref/decref sequence
>>>>> accomplish?
>>>>>
>>>>
>>>>
>>>> I believe the answer to this is the same as the answer I just posted on
>>>> the
>>>> other thread, i.e. the incref may trigger the incref hook (see
>>>> pn_xxx_incref in engine.c), and this in turn may update the endpoint
>>>> state
>>>> and adjust the refcount accordingly. The decref may then end up
>>>> finalizing
>>>> the object.
>>>>
>>>
>>> Right, understood now.
>>>
>>> Unfortunately replacing the additional conditions with just that check on
>>> the refcount doesn't prevent the crash though.
>>>
>>
>> Doh, not the result I was hoping for. Does it yield the same stack trace
>> as
>> before?
>
>
> Yes
>

Given that and all who looked thinking the earlier proposal was safe,
is it worth going with that change at least for now? Knocking things
off the blocker list and getting an RC (or even just a beta) out would
be really really good.

Robbie

Re: proton 0.10 blocker

Posted by Gordon Sim <gs...@redhat.com>.
On 07/17/2015 10:04 PM, Rafael Schloming wrote:
> On Fri, Jul 17, 2015 at 12:47 PM, Gordon Sim <gs...@redhat.com> wrote:
>
>> On 07/17/2015 08:15 PM, Rafael Schloming wrote:
>>
>>> On Fri, Jul 17, 2015 at 11:56 AM, Gordon Sim <gs...@redhat.com> wrote:
>>>
>>>   On 07/17/2015 07:17 PM, Rafael Schloming wrote:
>>>>
>>>>   Given this I believe the incref/decref pair is indeed running into
>>>>> problems
>>>>> when being invoked from inside a finalizer. I'd be curious if an
>>>>> alternative fix would work. I suspect you could replace the additional
>>>>> conditions you added to the if predicate with this:
>>>>>
>>>>>      pn_refcount(endpoint) > 0
>>>>>
>>>>>
>>>> If the refcount is *not* 0, what does the incref/decref sequence
>>>> accomplish?
>>>>
>>>
>>>
>>> I believe the answer to this is the same as the answer I just posted on
>>> the
>>> other thread, i.e. the incref may trigger the incref hook (see
>>> pn_xxx_incref in engine.c), and this in turn may update the endpoint state
>>> and adjust the refcount accordingly. The decref may then end up finalizing
>>> the object.
>>>
>>
>> Right, understood now.
>>
>> Unfortunately replacing the additional conditions with just that check on
>> the refcount doesn't prevent the crash though.
>>
>
> Doh, not the result I was hoping for. Does it yield the same stack trace as
> before?

Yes

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: proton 0.10 blocker

Posted by Rafael Schloming <rh...@alum.mit.edu>.
On Fri, Jul 17, 2015 at 12:47 PM, Gordon Sim <gs...@redhat.com> wrote:

> On 07/17/2015 08:15 PM, Rafael Schloming wrote:
>
>> On Fri, Jul 17, 2015 at 11:56 AM, Gordon Sim <gs...@redhat.com> wrote:
>>
>>  On 07/17/2015 07:17 PM, Rafael Schloming wrote:
>>>
>>>  Given this I believe the incref/decref pair is indeed running into
>>>> problems
>>>> when being invoked from inside a finalizer. I'd be curious if an
>>>> alternative fix would work. I suspect you could replace the additional
>>>> conditions you added to the if predicate with this:
>>>>
>>>>     pn_refcount(endpoint) > 0
>>>>
>>>>
>>> If the refcount is *not* 0, what does the incref/decref sequence
>>> accomplish?
>>>
>>
>>
>> I believe the answer to this is the same as the answer I just posted on
>> the
>> other thread, i.e. the incref may trigger the incref hook (see
>> pn_xxx_incref in engine.c), and this in turn may update the endpoint state
>> and adjust the refcount accordingly. The decref may then end up finalizing
>> the object.
>>
>
> Right, understood now.
>
> Unfortunately replacing the additional conditions with just that check on
> the refcount doesn't prevent the crash though.
>

Doh, not the result I was hoping for. Does it yield the same stack trace as
before?

--Rafael

Re: proton 0.10 blocker

Posted by Gordon Sim <gs...@redhat.com>.
On 07/17/2015 08:15 PM, Rafael Schloming wrote:
> On Fri, Jul 17, 2015 at 11:56 AM, Gordon Sim <gs...@redhat.com> wrote:
>
>> On 07/17/2015 07:17 PM, Rafael Schloming wrote:
>>
>>> Given this I believe the incref/decref pair is indeed running into
>>> problems
>>> when being invoked from inside a finalizer. I'd be curious if an
>>> alternative fix would work. I suspect you could replace the additional
>>> conditions you added to the if predicate with this:
>>>
>>>     pn_refcount(endpoint) > 0
>>>
>>
>> If the refcount is *not* 0, what does the incref/decref sequence
>> accomplish?
>
>
> I believe the answer to this is the same as the answer I just posted on the
> other thread, i.e. the incref may trigger the incref hook (see
> pn_xxx_incref in engine.c), and this in turn may update the endpoint state
> and adjust the refcount accordingly. The decref may then end up finalizing
> the object.

Right, understood now.

Unfortunately replacing the additional conditions with just that check 
on the refcount doesn't prevent the crash though.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: proton 0.10 blocker

Posted by Rafael Schloming <rh...@alum.mit.edu>.
On Fri, Jul 17, 2015 at 11:56 AM, Gordon Sim <gs...@redhat.com> wrote:

> On 07/17/2015 07:17 PM, Rafael Schloming wrote:
>
>> Given this I believe the incref/decref pair is indeed running into
>> problems
>> when being invoked from inside a finalizer. I'd be curious if an
>> alternative fix would work. I suspect you could replace the additional
>> conditions you added to the if predicate with this:
>>
>>    pn_refcount(endpoint) > 0
>>
>
> If the refcount is *not* 0, what does the incref/decref sequence
> accomplish?


I believe the answer to this is the same as the answer I just posted on the
other thread, i.e. the incref may trigger the incref hook (see
pn_xxx_incref in engine.c), and this in turn may update the endpoint state
and adjust the refcount accordingly. The decref may then end up finalizing
the object.

--Rafael

Re: proton 0.10 blocker

Posted by Gordon Sim <gs...@redhat.com>.
On 07/17/2015 07:17 PM, Rafael Schloming wrote:
> Given this I believe the incref/decref pair is indeed running into problems
> when being invoked from inside a finalizer. I'd be curious if an
> alternative fix would work. I suspect you could replace the additional
> conditions you added to the if predicate with this:
>
>    pn_refcount(endpoint) > 0

If the refcount is *not* 0, what does the incref/decref sequence accomplish?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: proton 0.10 blocker

Posted by Rafael Schloming <rh...@alum.mit.edu>.
Thanks,

Given this I believe the incref/decref pair is indeed running into problems
when being invoked from inside a finalizer. I'd be curious if an
alternative fix would work. I suspect you could replace the additional
conditions you added to the if predicate with this:

  pn_refcount(endpoint) > 0

If that fix does work then it would confirm my theory and likely be a more
robust fix.

FWIW, pn_refcount(...) yielding zero implies that the object is currently
being finalized and so should avoid getting into trouble without any need
to write code that depends on an understanding of object specific state.

--Rafael

On Fri, Jul 17, 2015 at 10:12 AM, Gordon Sim <gs...@redhat.com> wrote:

> On 07/17/2015 05:36 PM, Rafael Schloming wrote:
>
>> Hi Gordon,
>>
>> I did my best to dump some useful info on the refcounting stuff in the
>> other thread. I also posted a comment on the review. As I said there it
>> would be helpful to see the stack trace from the crash in order to figure
>> out if the fix is merely a workaround.
>>
>
> Here is one:
>
>  #0  0x0000003d54635935 in raise () from /lib64/libc.so.6
>> #1  0x0000003d546370e8 in abort () from /lib64/libc.so.6
>> #2  0x0000003d5462e6a2 in __assert_fail_base () from /lib64/libc.so.6
>> #3  0x0000003d5462e752 in __assert_fail () from /lib64/libc.so.6
>> #4  0x00007fef7d37a761 in pn_class_free (clazz=0x7fef7d5b4ec0,
>> object=0x3692060) at
>> /home/gordon/projects/proton-git/proton-c/src/object/object.c:120
>> #5  0x00007fef7d37aa12 in pn_free (object=<optimized out>) at
>> /home/gordon/projects/proton-git/proton-c/src/object/object.c:265
>> #6  0x00007fef7d386000 in pni_free_children (children=0x3690510,
>> freed=0x36905e0) at
>> /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:454
>> #7  0x00007fef7d386d09 in pn_session_finalize (object=0x3690180) at
>> /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:932
>> #8  0x00007fef7d37a64d in pn_class_decref (clazz=clazz@entry=0x7fef7d5b4e40,
>> object=object@entry=0x3690180) at
>> /home/gordon/projects/proton-git/proton-c/src/object/object.c:97
>> #9  0x00007fef7d37a70b in pn_class_free (clazz=0x7fef7d5b4e40,
>> object=0x3690180) at
>> /home/gordon/projects/proton-git/proton-c/src/object/object.c:122
>> #10 0x00007fef7d37aa12 in pn_free (object=<optimized out>) at
>> /home/gordon/projects/proton-git/proton-c/src/object/object.c:265
>> #11 0x00007fef7d386000 in pni_free_children (children=0x368e9c0,
>> freed=0x368ea90) at
>> /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:454
>> #12 0x00007fef7d3866db in pn_connection_finalize (object=0x368e310) at
>> /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:476
>> #13 0x00007fef7d37a64d in pn_class_decref (clazz=0x7fef7d5b4dc0,
>> object=object@entry=0x368e310) at
>> /home/gordon/projects/proton-git/proton-c/src/object/object.c:97
>> #14 0x00007fef7d37a9d2 in pn_decref (object=object@entry=0x368e310) at
>> /home/gordon/projects/proton-git/proton-c/src/object/object.c:255
>> #15 0x00007fef7d38bd08 in pn_transport_unbind (transport=transport@entry=0x36831e0)
>> at /home/gordon/projects/proton-git/proton-c/src/transport/transport.c:733
>> #16 0x00007fef7d38bd88 in pn_transport_finalize (object=0x36831e0) at
>> /home/gordon/projects/proton-git/proton-c/src/transport/transport.c:602
>> #17 0x00007fef7d37a64d in pn_class_decref (clazz=0x7fef7d5b50c0,
>> object=0x36831e0) at
>> /home/gordon/projects/proton-git/proton-c/src/object/object.c:97
>>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> For additional commands, e-mail: dev-help@qpid.apache.org
>
>

Re: proton 0.10 blocker

Posted by Gordon Sim <gs...@redhat.com>.
On 07/17/2015 05:36 PM, Rafael Schloming wrote:
> Hi Gordon,
>
> I did my best to dump some useful info on the refcounting stuff in the
> other thread. I also posted a comment on the review. As I said there it
> would be helpful to see the stack trace from the crash in order to figure
> out if the fix is merely a workaround.

Here is one:

> #0  0x0000003d54635935 in raise () from /lib64/libc.so.6
> #1  0x0000003d546370e8 in abort () from /lib64/libc.so.6
> #2  0x0000003d5462e6a2 in __assert_fail_base () from /lib64/libc.so.6
> #3  0x0000003d5462e752 in __assert_fail () from /lib64/libc.so.6
> #4  0x00007fef7d37a761 in pn_class_free (clazz=0x7fef7d5b4ec0, object=0x3692060) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:120
> #5  0x00007fef7d37aa12 in pn_free (object=<optimized out>) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:265
> #6  0x00007fef7d386000 in pni_free_children (children=0x3690510, freed=0x36905e0) at /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:454
> #7  0x00007fef7d386d09 in pn_session_finalize (object=0x3690180) at /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:932
> #8  0x00007fef7d37a64d in pn_class_decref (clazz=clazz@entry=0x7fef7d5b4e40, object=object@entry=0x3690180) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:97
> #9  0x00007fef7d37a70b in pn_class_free (clazz=0x7fef7d5b4e40, object=0x3690180) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:122
> #10 0x00007fef7d37aa12 in pn_free (object=<optimized out>) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:265
> #11 0x00007fef7d386000 in pni_free_children (children=0x368e9c0, freed=0x368ea90) at /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:454
> #12 0x00007fef7d3866db in pn_connection_finalize (object=0x368e310) at /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:476
> #13 0x00007fef7d37a64d in pn_class_decref (clazz=0x7fef7d5b4dc0, object=object@entry=0x368e310) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:97
> #14 0x00007fef7d37a9d2 in pn_decref (object=object@entry=0x368e310) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:255
> #15 0x00007fef7d38bd08 in pn_transport_unbind (transport=transport@entry=0x36831e0) at /home/gordon/projects/proton-git/proton-c/src/transport/transport.c:733
> #16 0x00007fef7d38bd88 in pn_transport_finalize (object=0x36831e0) at /home/gordon/projects/proton-git/proton-c/src/transport/transport.c:602
> #17 0x00007fef7d37a64d in pn_class_decref (clazz=0x7fef7d5b50c0, object=0x36831e0) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:97




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: proton 0.10 blocker

Posted by Gordon Sim <gs...@redhat.com>.
On 07/17/2015 05:36 PM, Rafael Schloming wrote:
> Hi Gordon,
>
> I did my best to dump some useful info on the refcounting stuff in the
> other thread. I also posted a comment on the review. As I said there it
> would be helpful to see the stack trace from the crash in order to figure
> out if the fix is merely a workaround.

Here is one:

> #0  0x0000003d54635935 in raise () from /lib64/libc.so.6
> #1  0x0000003d546370e8 in abort () from /lib64/libc.so.6
> #2  0x0000003d5462e6a2 in __assert_fail_base () from /lib64/libc.so.6
> #3  0x0000003d5462e752 in __assert_fail () from /lib64/libc.so.6
> #4  0x00007fef7d37a761 in pn_class_free (clazz=0x7fef7d5b4ec0, object=0x3692060) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:120
> #5  0x00007fef7d37aa12 in pn_free (object=<optimized out>) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:265
> #6  0x00007fef7d386000 in pni_free_children (children=0x3690510, freed=0x36905e0) at /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:454
> #7  0x00007fef7d386d09 in pn_session_finalize (object=0x3690180) at /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:932
> #8  0x00007fef7d37a64d in pn_class_decref (clazz=clazz@entry=0x7fef7d5b4e40, object=object@entry=0x3690180) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:97
> #9  0x00007fef7d37a70b in pn_class_free (clazz=0x7fef7d5b4e40, object=0x3690180) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:122
> #10 0x00007fef7d37aa12 in pn_free (object=<optimized out>) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:265
> #11 0x00007fef7d386000 in pni_free_children (children=0x368e9c0, freed=0x368ea90) at /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:454
> #12 0x00007fef7d3866db in pn_connection_finalize (object=0x368e310) at /home/gordon/projects/proton-git/proton-c/src/engine/engine.c:476
> #13 0x00007fef7d37a64d in pn_class_decref (clazz=0x7fef7d5b4dc0, object=object@entry=0x368e310) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:97
> #14 0x00007fef7d37a9d2 in pn_decref (object=object@entry=0x368e310) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:255
> #15 0x00007fef7d38bd08 in pn_transport_unbind (transport=transport@entry=0x36831e0) at /home/gordon/projects/proton-git/proton-c/src/transport/transport.c:733
> #16 0x00007fef7d38bd88 in pn_transport_finalize (object=0x36831e0) at /home/gordon/projects/proton-git/proton-c/src/transport/transport.c:602
> #17 0x00007fef7d37a64d in pn_class_decref (clazz=0x7fef7d5b50c0, object=0x36831e0) at /home/gordon/projects/proton-git/proton-c/src/object/object.c:97




Re: proton 0.10 blocker

Posted by Rafael Schloming <rh...@alum.mit.edu>.
Hi Gordon,

I did my best to dump some useful info on the refcounting stuff in the
other thread. I also posted a comment on the review. As I said there it
would be helpful to see the stack trace from the crash in order to figure
out if the fix is merely a workaround.

--Rafael


On Wed, Jul 15, 2015 at 9:30 AM, Gordon Sim <gs...@redhat.com> wrote:

> The latest proton code is causing crashes in qpid-cpp tests that use it.
> I've tracked the problem down to the fix for PROTON-905[1] and proposed an
> enhancement to that fix, https://reviews.apache.org/r/36509/, which
> avoids the crash.
>
> Could someone who understands the logic controlling the lifecycle of
> pn_session_t and pn_link_t objects in detail review and approve this please?
>
>
> [1] https://git-wip-us.apache.org/repos/asf?p=qpid-proton.git;h=653f4e5
>

Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by Gordon Sim <gs...@redhat.com>.
On 07/17/2015 08:11 PM, Rafael Schloming wrote:
> On Fri, Jul 17, 2015 at 10:45 AM, Gordon Sim <gs...@redhat.com> wrote:
>
>> Still digesting the explanation (thanks!) but one follow up question:
>>
>> On 07/17/2015 04:37 PM, Rafael Schloming wrote:
>>
>>> it isn't actually possible to use the object when there refcount is 0.
>>>
>>
>> What is the purpose of the incref/decref pattern then, e.g. as used in
>> pn_session_free()? That is designed to trigger the finalizer, right? If so
>> it would seem that can only happen if the reference count is 0 at the point
>> when pn_session_free() is called.
>>
>> (And if it was valid to call pn_session_free for a given session, then
>> that session would have been valid for use before that, no?)
>>
>
> The refcount should never be zero going into pn_session_free. The situation
> where it triggers the finalizer is when the refcount is 1 upon entering
> pn_session_free, but the referenced boolean is false meaning that the one
> remaining reference that exists is the parent -> child pointer (in this
> case the connection -> session pointer). The incref triggers the
> pn_session_incref hook which flips the reference around so that the session
> -> connection pointer is now the reference and the connection -> session
> pointer is the borrowed reference.

Ah, I see, I'd missed that hook. So the pn_incref(session) call actually 
increments the reference of the *connection*, not the session in this 
case. The pn_decref(session) then reduces the refcount of the session 
from 1 to 0 which triggers the finalizer.

Thanks! I'm slowly understanding how it works a little more.


Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by Rafael Schloming <rh...@alum.mit.edu>.
On Fri, Jul 17, 2015 at 10:45 AM, Gordon Sim <gs...@redhat.com> wrote:

> Still digesting the explanation (thanks!) but one follow up question:
>
> On 07/17/2015 04:37 PM, Rafael Schloming wrote:
>
>> it isn't actually possible to use the object when there refcount is 0.
>>
>
> What is the purpose of the incref/decref pattern then, e.g. as used in
> pn_session_free()? That is designed to trigger the finalizer, right? If so
> it would seem that can only happen if the reference count is 0 at the point
> when pn_session_free() is called.
>
> (And if it was valid to call pn_session_free for a given session, then
> that session would have been valid for use before that, no?)
>

The refcount should never be zero going into pn_session_free. The situation
where it triggers the finalizer is when the refcount is 1 upon entering
pn_session_free, but the referenced boolean is false meaning that the one
remaining reference that exists is the parent -> child pointer (in this
case the connection -> session pointer). The incref triggers the
pn_session_incref hook which flips the reference around so that the session
-> connection pointer is now the reference and the connection -> session
pointer is the borrowed reference. Logically this momentarily increases the
refcount from 1 to 2 and then the hook decreases it back to 1 and the
decref triggers the finalizer.

Now if pn_session_free is called when the refcount is > 1, or if
pn_ep_decref ends up posting events (events create references to endpoint
objects and thereby increase the refcount), then the pn_incref/decref ends
up being a noop and the session is simply marked as free but finalized when
the last reference to it is decref'ed.

--Rafael

Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by Gordon Sim <gs...@redhat.com>.
Still digesting the explanation (thanks!) but one follow up question:

On 07/17/2015 04:37 PM, Rafael Schloming wrote:
> it isn't actually possible to use the object when there refcount is 0.

What is the purpose of the incref/decref pattern then, e.g. as used in 
pn_session_free()? That is designed to trigger the finalizer, right? If 
so it would seem that can only happen if the reference count is 0 at the 
point when pn_session_free() is called.

(And if it was valid to call pn_session_free for a given session, then 
that session would have been valid for use before that, no?)



Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by Gordon Sim <gs...@redhat.com>.
On 07/17/2015 04:37 PM, Rafael Schloming wrote:
> Are they really returned with a ref count of 0?

No, indeed they are not. I missed the fact that the finalizer will run 
before the object is returned to the application.

Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by Rafael Schloming <rh...@alum.mit.edu>.
On Thu, Jul 16, 2015 at 7:46 AM, Gordon Sim <gs...@redhat.com> wrote:

> On 07/16/2015 02:40 PM, aconway wrote:
>
>> Can someone who understand the proton use of refcounts please add some
>> doc comments to explain the semantics? Apologies if this is already
>> there and I missed it, tell me to RTFM.
>>
>
> I'm not entirely sure I understand it. However having spent a couple of
> days reading and puzzling over the code, I'll try and offer some answers
> where I think I can, and add some questions of my own.
>
>  For comparison here is what "refcount" traditionally means ("tradition"
>> includes CORBA, boost::intrusive_ptr, std::shared_ptr etc.) I'm not
>> saying we have to make proton conform, but we should document how it
>> differs to save confusion.
>>
>> 1. A refcount is a count of the number of references (owning pointers)
>> to an object.
>>
>
> Yes, that broadly speaking is the purpose in proton. However this was a
> recent addition and it is not used uniformly inside the library itself. Not
> only that but the old api, where the application (generally) owned the
> pointers it created until it called pn_xxx_free, is still supported
> alongside the newer mode of use, as e.g. employed in the python binding,
> where the application uses only incref and decref.
>
>  2. Objects are created with refcuont=1, the creator owns the first
>> reference.
>>
>
> This is not always the case and was one of the points I found surprising.
> Though a newly created 'object' does indeed have its ref count set to 1,
> for pn_session() and pn_link(), which are the functions actually used by
> the application to created new sessions or links, the implementation of
> those functions includes an explicit decref, meaning the objects are
> returned with a ref count of 0.
>

Are they really returned with a ref count of 0? I don't think proton
objects can actually exist with a refcount of 0 outside a finalizer. What
should actually happen is that the finalizer for the newly created session
will run and cause the parent of the session or link (the connection or
session) to inspect the child's state. Based on that state it may create a
new reference to the child, e.g. if there is outstanding work on the wire
to be done for that session or link. In the case of a new session or link
this is always the case, so it will always end up creating a new reference
to it, but this should never result in a child with a non zero refcount
being returned or visible in any way to user code.

I suspect this was done to simplify things for the newer mode of use, where
> a wrapper object can always be instantiated and it simply increments the
> ref count. Would be good to have that confirmed though, as to me this is
> the source of confusion and complexity.
>

Yes, this is certainly true, but it is also to accommodate the memory model
the C interface exposes. In the C interface sessions and links are owned by
their parent objects, e.g. freeing the connection frees all the contained
sessions. The way this is accommodated is that the parent object is what
owns the reference to the child by default. What is returned from the
pn_session()/pn_link() calls is a borrowed reference. (This changes when
you do an incref, see below for more details.)


> However this means that in the old mode of use, e.g.
> pn_session()/pn_session_free(), the object may have a refcount of 0 even
> though it has not been freed by the application.
>

As mentioned above, this should never be able to happen outside of a
finalizer. Perhaps what is confusing here is that the finalizer can create
a new reference to the object that is being finalized thereby causing the
pn_decref() to *appear* to be a noop, however what is really happening is
an important state change, the last reference is being deleted, and the
finalizer has decided to create a new reference for other purposes (e.g.
because there is outstanding transport work to be done with that object or
because it is going into a pool) and the state of the object (or related
objects/state) is being changed in key ways to reflect this.

Note that this pattern is not at all unique to proton's refcount system.
All GC systems that have finalizers accommodate these sorts of semantics,
i.e. finalizers causing objects to become live again. This is fundamental
to finalizers since they are just user code and can do whatever they want,
including create new references.

What is confusing about it here is more to do with how this capability is
being used to interact with all the engine data structures that predate
both the refcount system and the collection objects that make use of the
refcount system. I believe ultimately the engine data structures should be
reworked to use the various collection objects now available, at which
point a lot of this will become much more centralized, self contained, and
understandable and likely won't require so much magic (or at least the
magic will be in one place rather than spread around like it is now).


>
>  3. If another owning reference is created, the refcount MUST BE
>> incremented.
>>
>
> This is not currently the case for all internal use which necessitates
> some extra logic and checks.
>

I might be about to restate what you are saying in a different way, but the
refcount is actually incremented in all cases. What is tricky (and this is
admittedly the one part of all this that is unique to proton's refcount
system), is that proton's refcount system provides a hook for invoking
object-specific code whenever a reference is incremented. The session and
link objects override this hook and based on their state they will
sometimes remove the reference their parent is keeping to them. When this
happens it appears that the incref has been a noop if you only examine the
refcount, however once again there are key state changes and what has
really happened is that the refcount has increased momentarily and then
been decreased again by the hooks.


>
>  4. The owner of a reference MUST decrease the refcount on dropping the
>> reference and MUST NOT use the pointer after refcount is decremented.
>> It no longer "owns" a reference.
>>
>> 5. The object MAY be deleted when refcount goes to 0 or MAY be pooled
>> for re-use but it MUST NOT be used by any code (other than the pooling
>> system if there is one)
>>
>
> As above, that is currently not completely true at least for sessions and
> links, where they may be in use while still having a 0 ref count.
>

Hopefully this should be clear by now, but it isn't actually possible to
use the object when there refcount is 0. I think Alan's statement (5) needs
to be modified to include finalizers in general.


>
>  6. You never examine the refcount (except for debugging) Either you own
>> a reference, which means refcount > 0, or you don't, which means you
>> MUST NOT touch the object, even to examine its refcount.
>>
>
> My questions:
>
> * what are the respective roles of the ref count in pn_endpoint_t, and in
> the pni_head_t record associated with pn_session_t and pn_link_t? i.e. why
> are there two separate ref counts for the same object?
>

These are independent mechanisms and logically unrelated although probably
often correlated in practice. The one in pn_head_t is used purely to track
memory ownership. The one in pn_endpoint_t is used to figure out when it is
safe for a given endpoint to emit it's FINAL event.


>
> * pn_link_new() and pn_session() both decrement the count (in the
> pni_head_t record) for the newly created objects. Why?
>
> * what exactly does the 'referenced' flag indicate?
>
> * when a finalize function is exited early due to the return value of
> pni_preserve_child, the reference count is incremented to prevent that
> object being freed by the code in pn_class_decref. Where/how is that then
> decremented again to retrigger the finalizer?
>

I'm going to try to answer all three of the above at once as they are
related. This is all related to the memory management of Connections,
Sessions, Links, and Deliveries, so it helps to have a picture of the
relevant object model in your head:

           1   n       1   n    1   n
 Connection-----Session-----Link-----Delivery

While we generally think of this as a tree, i.e. Deliveries are children of
Links are children of Sessions are children of Connections, because you can
navigate in both directions, from a memory management perspective this is
actually a graph.

Terminology-wise it is also helpful to keep in mind the difference between
*references* (i.e. a pointer for which there is a corresponding incref) and
*borrowed references*. A borrowed reference is a pointer which does not
have a corresponding incref and depends on the fact that another reference
exists and is only valid while that other reference continues to exist.

With that in mind there are several related problems here that are being
solved:

1) The memory model for C style usage (i.e. the original engine memory
model) only has references pointing from parent to child. This means that
when a parent is freed, all references the application may have retrieved
to any children become invalid since they depend on the borrowed parent
reference. Maintaining compatibility with this model is one key element of
the overall problem.

2) Exposing the engine objects via the python binding requires the engine
objects to match python's memory model, and the C semantics described in
(1) are incompatible with this. In particular, wrapping the borrowed child
reference results in dangling pointers if/when the last reference to the
parent goes away. The net of this is that mapping the engine object model
into python (or any other high level language) effectively requires that
the engine objects support full GC semantics, i.e. child objects can keep
parents alive, parent objects can keep children alive, etc.

3) Because this is all being maintained with reference counting, we need to
avoid reference cycles.

The scheme in place that addresses all three of these is most easily
described in terms of the generic parent/child relationship, however it is
(to some degree) implemented at each level of the tree.

Between any given child and parent, there are always pointers going in both
directions, i.e. a pointer from parent to child and a pointer from child to
parent, however at any given time, only one of those pointers is a
reference (making the other pointer a borrowed reference). Under specific
conditions these two will swap, i.e. the borrowed reference becomes a
reference and the reference becomes a borrowed reference. While this
results in a net change of zero in the reference count, it has a
significant impact on how and when the decref that ultimately collects a
given object occurs.

This scheme is really at the heart of the solution and is what allows
parents to keep children alive, and children to keep parents alive without
having pointer cycles. To understand the details, it's simply necessary to
understand the circumstances under which the reference semantics between
child and parent swap. There are two state variables that control this. The
"freed" boolean which tracks whether pn_xxx_free() has been called on a
given child, as well as the "referenced" boolean which tracks the
directionality of the reference between parent and child. The referenced
boolean is true when an objects parent pointer is a reference, and false
when it is not.

When a given object is externally referenced, e.g. for a
session/link/delivery that is wrapped by a python object that increfed it,
or indeed if it was simply manually increfed via the C API, the incref hook
flips the reference semantics such that the child object holds a reference
to the parent. When the last reference for an object goes away and the
object is being finalized, the finalizer for the child examines the
freed/reference fields and if freed is not true, i.e. the object has not
been freed, then rather than letting the object be collected, it creates a
reference from its parent to itself so long as the parent is live.

The above behavior results in something that solves the 3 problems. For
normal C style usage it behaves as the original C model behaves with the
only references being from parent to child and all child references being
borrowed. As soon as you incref a child, it will flip the directionality of
the parent/child reference and you get the behavior you would expect from a
typical fully GC'ed object graph.


>
> * when should pn_class_free be used v pn_class_decref? (The former does
> not check whether the finalizer increases the reference, the latter does)
>

In general you should only call pn_free or pn_class_free when you allocated
the object and you know it's not possible for there to be any other
references, and you want to assert that is the case. In general pn_free and
pn_class_free should be fully equivalent to asserting the refcount is one
and then calling decref and asserting that the object was actually freed.
The only exception to this is objects that don't support refcounts (see
below).


>
> * pn_class_free() has the precondition that the object passed in should
> have a refcount of 1 (which I understand) or -1. When and why would the
> refcount be -1?
>

A proton class is basically just a collection of function pointers to be
used with a certain kind of void * pointer. Those function pointers include
things like being able to pretty print the contents of the pointer, but
also things like incref/decref. Not all pointers support the notion of
incref/decref and a refcount of -1 is used to signal this. The intent here
is to be able to pass around proton objects, python objects, and even dumb
void * pointers in a uniform way.

--Rafael

Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by Gordon Sim <gs...@redhat.com>.
On 07/16/2015 02:40 PM, aconway wrote:
> Can someone who understand the proton use of refcounts please add some
> doc comments to explain the semantics? Apologies if this is already
> there and I missed it, tell me to RTFM.

I'm not entirely sure I understand it. However having spent a couple of 
days reading and puzzling over the code, I'll try and offer some answers 
where I think I can, and add some questions of my own.

> For comparison here is what "refcount" traditionally means ("tradition"
> includes CORBA, boost::intrusive_ptr, std::shared_ptr etc.) I'm not
> saying we have to make proton conform, but we should document how it
> differs to save confusion.
>
> 1. A refcount is a count of the number of references (owning pointers)
> to an object.

Yes, that broadly speaking is the purpose in proton. However this was a 
recent addition and it is not used uniformly inside the library itself. 
Not only that but the old api, where the application (generally) owned 
the pointers it created until it called pn_xxx_free, is still supported 
alongside the newer mode of use, as e.g. employed in the python binding, 
where the application uses only incref and decref.

> 2. Objects are created with refcuont=1, the creator owns the first
> reference.

This is not always the case and was one of the points I found 
surprising. Though a newly created 'object' does indeed have its ref 
count set to 1, for pn_session() and pn_link(), which are the functions 
actually used by the application to created new sessions or links, the 
implementation of those functions includes an explicit decref, meaning 
the objects are returned with a ref count of 0.

I suspect this was done to simplify things for the newer mode of use, 
where a wrapper object can always be instantiated and it simply 
increments the ref count. Would be good to have that confirmed though, 
as to me this is the source of confusion and complexity.

However this means that in the old mode of use, e.g. 
pn_session()/pn_session_free(), the object may have a refcount of 0 even 
though it has not been freed by the application.

> 3. If another owning reference is created, the refcount MUST BE
> incremented.

This is not currently the case for all internal use which necessitates 
some extra logic and checks.

> 4. The owner of a reference MUST decrease the refcount on dropping the
> reference and MUST NOT use the pointer after refcount is decremented.
> It no longer "owns" a reference.
>
> 5. The object MAY be deleted when refcount goes to 0 or MAY be pooled
> for re-use but it MUST NOT be used by any code (other than the pooling
> system if there is one)

As above, that is currently not completely true at least for sessions 
and links, where they may be in use while still having a 0 ref count.

> 6. You never examine the refcount (except for debugging) Either you own
> a reference, which means refcount > 0, or you don't, which means you
> MUST NOT touch the object, even to examine its refcount.

My questions:

* what are the respective roles of the ref count in pn_endpoint_t, and 
in the pni_head_t record associated with pn_session_t and pn_link_t? 
i.e. why are there two separate ref counts for the same object?

* pn_link_new() and pn_session() both decrement the count (in the 
pni_head_t record) for the newly created objects. Why?

* what exactly does the 'referenced' flag indicate?

* when a finalize function is exited early due to the return value of 
pni_preserve_child, the reference count is incremented to prevent that 
object being freed by the code in pn_class_decref. Where/how is that 
then decremented again to retrigger the finalizer?

* when should pn_class_free be used v pn_class_decref? (The former does 
not check whether the finalizer increases the reference, the latter does)

* pn_class_free() has the precondition that the object passed in should 
have a refcount of 1 (which I understand) or -1. When and why would the 
refcount be -1?

Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by Gordon Sim <gs...@redhat.com>.
On 07/16/2015 02:40 PM, aconway wrote:
> Can someone who understand the proton use of refcounts please add some
> doc comments to explain the semantics? Apologies if this is already
> there and I missed it, tell me to RTFM.

I'm not entirely sure I understand it. However having spent a couple of 
days reading and puzzling over the code, I'll try and offer some answers 
where I think I can, and add some questions of my own.

> For comparison here is what "refcount" traditionally means ("tradition"
> includes CORBA, boost::intrusive_ptr, std::shared_ptr etc.) I'm not
> saying we have to make proton conform, but we should document how it
> differs to save confusion.
>
> 1. A refcount is a count of the number of references (owning pointers)
> to an object.

Yes, that broadly speaking is the purpose in proton. However this was a 
recent addition and it is not used uniformly inside the library itself. 
Not only that but the old api, where the application (generally) owned 
the pointers it created until it called pn_xxx_free, is still supported 
alongside the newer mode of use, as e.g. employed in the python binding, 
where the application uses only incref and decref.

> 2. Objects are created with refcuont=1, the creator owns the first
> reference.

This is not always the case and was one of the points I found 
surprising. Though a newly created 'object' does indeed have its ref 
count set to 1, for pn_session() and pn_link(), which are the functions 
actually used by the application to created new sessions or links, the 
implementation of those functions includes an explicit decref, meaning 
the objects are returned with a ref count of 0.

I suspect this was done to simplify things for the newer mode of use, 
where a wrapper object can always be instantiated and it simply 
increments the ref count. Would be good to have that confirmed though, 
as to me this is the source of confusion and complexity.

However this means that in the old mode of use, e.g. 
pn_session()/pn_session_free(), the object may have a refcount of 0 even 
though it has not been freed by the application.

> 3. If another owning reference is created, the refcount MUST BE
> incremented.

This is not currently the case for all internal use which necessitates 
some extra logic and checks.

> 4. The owner of a reference MUST decrease the refcount on dropping the
> reference and MUST NOT use the pointer after refcount is decremented.
> It no longer "owns" a reference.
>
> 5. The object MAY be deleted when refcount goes to 0 or MAY be pooled
> for re-use but it MUST NOT be used by any code (other than the pooling
> system if there is one)

As above, that is currently not completely true at least for sessions 
and links, where they may be in use while still having a 0 ref count.

> 6. You never examine the refcount (except for debugging) Either you own
> a reference, which means refcount > 0, or you don't, which means you
> MUST NOT touch the object, even to examine its refcount.

My questions:

* what are the respective roles of the ref count in pn_endpoint_t, and 
in the pni_head_t record associated with pn_session_t and pn_link_t? 
i.e. why are there two separate ref counts for the same object?

* pn_link_new() and pn_session() both decrement the count (in the 
pni_head_t record) for the newly created objects. Why?

* what exactly does the 'referenced' flag indicate?

* when a finalize function is exited early due to the return value of 
pni_preserve_child, the reference count is incremented to prevent that 
object being freed by the code in pn_class_decref. Where/how is that 
then decremented again to retrigger the finalizer?

* when should pn_class_free be used v pn_class_decref? (The former does 
not check whether the finalizer increases the reference, the latter does)

* pn_class_free() has the precondition that the object passed in should 
have a refcount of 1 (which I understand) or -1. When and why would the 
refcount be -1?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by aconway <ac...@redhat.com>.
On Thu, 2015-07-16 at 15:11 +0100, Gordon Sim wrote:
> On 07/16/2015 02:40 PM, aconway wrote:
> > The fix mentioned above has this, which make no sense under 
> > traditional
> > refcounting:
> > 
> >          pn_incref(endpoint);
> >          pn_decref(endpoint);
> 
> Note that this is not added as part of my fix, it is already there. 
> [snip]
> 
> The reference counting logic may not match the ideal, but we can't 
> postpone a fix for the current issue pending some nicer overall 
> solution. We can avoid it, by backing out the problematic previous 
> commit, or we can adjust that commit.
> 

+1, I'm just proposing that we get a clear statement of the intended
semantics of the *existing* proton refcounting scheme into the code as
it has tripped up a couple of people so far. I for one am still not
very clear on it. This is not an issue for 0.10 but for longer term
code health.

Cheers,
Alan.

Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by aconway <ac...@redhat.com>.
On Thu, 2015-07-16 at 15:11 +0100, Gordon Sim wrote:
> On 07/16/2015 02:40 PM, aconway wrote:
> > The fix mentioned above has this, which make no sense under 
> > traditional
> > refcounting:
> > 
> >          pn_incref(endpoint);
> >          pn_decref(endpoint);
> 
> Note that this is not added as part of my fix, it is already there. 
> [snip]
> 
> The reference counting logic may not match the ideal, but we can't 
> postpone a fix for the current issue pending some nicer overall 
> solution. We can avoid it, by backing out the problematic previous 
> commit, or we can adjust that commit.
> 

+1, I'm just proposing that we get a clear statement of the intended
semantics of the *existing* proton refcounting scheme into the code as
it has tripped up a couple of people so far. I for one am still not
very clear on it. This is not an issue for 0.10 but for longer term
code health.

Cheers,
Alan.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by Gordon Sim <gs...@redhat.com>.
On 07/16/2015 02:40 PM, aconway wrote:
> The fix mentioned above has this, which make no sense under traditional
> refcounting:
>
>          pn_incref(endpoint);
>          pn_decref(endpoint);

Note that this is not added as part of my fix, it is already there. The 
simple explanation is that it is a trick to get a finalize function for 
the object in question to be re-run (when it was previously cut short). 
The broader context is that the reference counting is not used 
everywhere internally (it was a more recent addition), causing some 
additional twists and turns in the logic.


> Traditionally, this sequence is a no-op if the caller owns the
> reference and is illegal if not. We need a clear statement of the
> semantics of such sequences: when it it legal and what does it mean?
>  From the fix it seems to be related to the "freed" and "referenced"
> members, what exactly is the relationship?

The 'freed' flag indicates that the pn_xxx_free() call was made. This is 
not always the case. There is a mode of use, e.g. in the python binding, 
that use inc-/dec- ref instead. I'm not as clear on the design behind 
the 'referenced' flag, but it is set to false when the finalizer is cut 
short.

The original fix for PROTON-905 triggered the re-run of the finalizer 
whenever the reference count was 0 and a session or link was moved out 
of the set of modified items. From the description it was only intended 
to do this where the object in question had been freed and had had its 
finalizer cut short. Hence the addition in my proposed fix of that extra 
qualification.

As it is, the code currently on master, pending the 0.10 release is 
*incompatible* with qpid-cpp. So either the original fix needs to be 
reverted or it needs to be corrected in some way (e.g. along the lines I 
have suggested).

The reference counting logic may not match the ideal, but we can't 
postpone a fix for the current issue pending some nicer overall 
solution. We can avoid it, by backing out the problematic previous 
commit, or we can adjust that commit.


Re: Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by Gordon Sim <gs...@redhat.com>.
On 07/16/2015 02:40 PM, aconway wrote:
> The fix mentioned above has this, which make no sense under traditional
> refcounting:
>
>          pn_incref(endpoint);
>          pn_decref(endpoint);

Note that this is not added as part of my fix, it is already there. The 
simple explanation is that it is a trick to get a finalize function for 
the object in question to be re-run (when it was previously cut short). 
The broader context is that the reference counting is not used 
everywhere internally (it was a more recent addition), causing some 
additional twists and turns in the logic.


> Traditionally, this sequence is a no-op if the caller owns the
> reference and is illegal if not. We need a clear statement of the
> semantics of such sequences: when it it legal and what does it mean?
>  From the fix it seems to be related to the "freed" and "referenced"
> members, what exactly is the relationship?

The 'freed' flag indicates that the pn_xxx_free() call was made. This is 
not always the case. There is a mode of use, e.g. in the python binding, 
that use inc-/dec- ref instead. I'm not as clear on the design behind 
the 'referenced' flag, but it is set to false when the finalizer is cut 
short.

The original fix for PROTON-905 triggered the re-run of the finalizer 
whenever the reference count was 0 and a session or link was moved out 
of the set of modified items. From the description it was only intended 
to do this where the object in question had been freed and had had its 
finalizer cut short. Hence the addition in my proposed fix of that extra 
qualification.

As it is, the code currently on master, pending the 0.10 release is 
*incompatible* with qpid-cpp. So either the original fix needs to be 
reverted or it needs to be corrected in some way (e.g. along the lines I 
have suggested).

The reference counting logic may not match the ideal, but we can't 
postpone a fix for the current issue pending some nicer overall 
solution. We can avoid it, by backing out the problematic previous 
commit, or we can adjust that commit.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by aconway <ac...@redhat.com>.
On Wed, 2015-07-15 at 17:30 +0100, Gordon Sim wrote:
> The latest proton code is causing crashes in qpid-cpp tests that use 
> it. 
> I've tracked the problem down to the fix for PROTON-905[1] and 
> proposed 
> an enhancement to that fix, https://reviews.apache.org/r/36509/, 
> which 
> avoids the crash.
> 
> Could someone who understands the logic controlling the lifecycle of 
> pn_session_t and pn_link_t objects in detail review and approve this 
> please?

Can someone who understand the proton use of refcounts please add some
doc comments to explain the semantics? Apologies if this is already
there and I missed it, tell me to RTFM.

For comparison here is what "refcount" traditionally means ("tradition"
includes CORBA, boost::intrusive_ptr, std::shared_ptr etc.) I'm not
saying we have to make proton conform, but we should document how it
differs to save confusion.

1. A refcount is a count of the number of references (owning pointers)
to an object.

2. Objects are created with refcuont=1, the creator owns the first
reference.

3. If another owning reference is created, the refcount MUST BE
incremented.

4. The owner of a reference MUST decrease the refcount on dropping the
reference and MUST NOT use the pointer after refcount is decremented.
It no longer "owns" a reference.

5. The object MAY be deleted when refcount goes to 0 or MAY be pooled
for re-use but it MUST NOT be used by any code (other than the pooling
system if there is one)

6. You never examine the refcount (except for debugging) Either you own
a reference, which means refcount > 0, or you don't, which means you
MUST NOT touch the object, even to examine its refcount.

The fix mentioned above has this, which make no sense under traditional
refcounting:

        pn_incref(endpoint);
        pn_decref(endpoint);

Traditionally, this sequence is a no-op if the caller owns the
reference and is illegal if not. We need a clear statement of the
semantics of such sequences: when it it legal and what does it mean?
>From the fix it seems to be related to the "freed" and "referenced"
members, what exactly is the relationship?

Cheers,
Alan.



Re: proton 0.10 blocker

Posted by Rafael Schloming <rh...@alum.mit.edu>.
Hi Gordon,

I did my best to dump some useful info on the refcounting stuff in the
other thread. I also posted a comment on the review. As I said there it
would be helpful to see the stack trace from the crash in order to figure
out if the fix is merely a workaround.

--Rafael


On Wed, Jul 15, 2015 at 9:30 AM, Gordon Sim <gs...@redhat.com> wrote:

> The latest proton code is causing crashes in qpid-cpp tests that use it.
> I've tracked the problem down to the fix for PROTON-905[1] and proposed an
> enhancement to that fix, https://reviews.apache.org/r/36509/, which
> avoids the crash.
>
> Could someone who understands the logic controlling the lifecycle of
> pn_session_t and pn_link_t objects in detail review and approve this please?
>
>
> [1] https://git-wip-us.apache.org/repos/asf?p=qpid-proton.git;h=653f4e5
>

Semantics of proton refcounts [was Re: proton 0.10 blocker]

Posted by aconway <ac...@redhat.com>.
On Wed, 2015-07-15 at 17:30 +0100, Gordon Sim wrote:
> The latest proton code is causing crashes in qpid-cpp tests that use 
> it. 
> I've tracked the problem down to the fix for PROTON-905[1] and 
> proposed 
> an enhancement to that fix, https://reviews.apache.org/r/36509/, 
> which 
> avoids the crash.
> 
> Could someone who understands the logic controlling the lifecycle of 
> pn_session_t and pn_link_t objects in detail review and approve this 
> please?

Can someone who understand the proton use of refcounts please add some
doc comments to explain the semantics? Apologies if this is already
there and I missed it, tell me to RTFM.

For comparison here is what "refcount" traditionally means ("tradition"
includes CORBA, boost::intrusive_ptr, std::shared_ptr etc.) I'm not
saying we have to make proton conform, but we should document how it
differs to save confusion.

1. A refcount is a count of the number of references (owning pointers)
to an object.

2. Objects are created with refcuont=1, the creator owns the first
reference.

3. If another owning reference is created, the refcount MUST BE
incremented.

4. The owner of a reference MUST decrease the refcount on dropping the
reference and MUST NOT use the pointer after refcount is decremented.
It no longer "owns" a reference.

5. The object MAY be deleted when refcount goes to 0 or MAY be pooled
for re-use but it MUST NOT be used by any code (other than the pooling
system if there is one)

6. You never examine the refcount (except for debugging) Either you own
a reference, which means refcount > 0, or you don't, which means you
MUST NOT touch the object, even to examine its refcount.

The fix mentioned above has this, which make no sense under traditional
refcounting:

        pn_incref(endpoint);
        pn_decref(endpoint);

Traditionally, this sequence is a no-op if the caller owns the
reference and is illegal if not. We need a clear statement of the
semantics of such sequences: when it it legal and what does it mean?
>From the fix it seems to be related to the "freed" and "referenced"
members, what exactly is the relationship?

Cheers,
Alan.



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org