Posted to dev@qpid.apache.org by Robbie Gemmell <ro...@gmail.com> on 2012/03/11 19:10:14 UTC

C++ broker appears to segfault during MultipleTransactedBatchProducerTest

Hi all,

Just thought I'd post this here to make sure it gets seen, as I know many
people filter the JIRA traffic.

I was taking a quick look at why there has been a notable increase in
failures of the test job Keith had set up to run against the C++
broker on the ASF Jenkins instances. It seems one test in particular
(which I added earlier last year to verify a defect fix for the Java
broker) has been sporadically failing over the last week or so.
Investigating the logs for the last failure suggests the C++ broker
segfaulted during the test run, which led to the eventual test
failure report.

I have raised the following JIRA and attached the latest test log to
it, as they don't get kept long. I have marked it as critical for now,
although as it appears to affect the 0.16 branch too I'd actually
probably consider it a blocker at this point (but since there's no
chance of me being able to fix it, I figured I would let those with a
clue about the C++ broker make that decision).

https://issues.apache.org/jira/browse/QPID-3893
C++ broker appears to segfault during MultipleTransactedBatchProducerTest


Robbie

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: C++ broker appears to segfault during MultipleTransactedBatchProducerTest

Posted by Rajith Attapattu <ra...@gmail.com>.
Thanks for the info, Robbie.
I'd forgotten that I had access, or had even requested it :D

Btw, Gordon has managed to reproduce it, so we are good for the time being.
Thanks again for bringing this to our notice.

Rajith

On Mon, Mar 12, 2012 at 10:23 AM, Robbie Gemmell
<ro...@gmail.com> wrote:

> I had a look but couldn't see anything, though I may just not know
> where to look (I presumed CWD, which should be the main java dir).
>
> We don't have shell access to the boxes, only the ability to configure
> jobs via Jenkins (which you can then shell script to do most things
> you want). Keith, Rajith, Andrew K and myself are the only ones I know
> have requested such access (via Carl).
>
> Anyone can browse the workspace used by the tests through the Jenkins
> web UI; it can be found at:
> https://builds.apache.org/view/M-R/view/Qpid/job/Qpid-Java-Cpp-Test/ws/
>
> The test run is tied to the various Ubuntu nodes, yes.
>
> Robbie

Re: C++ broker appears to segfault during MultipleTransactedBatchProducerTest

Posted by Robbie Gemmell <ro...@gmail.com>.
I had a look but couldn't see anything, though I may just not know
where to look (I presumed CWD, which should be the main java dir).

We don't have shell access to the boxes, only the ability to configure
jobs via Jenkins (which you can then shell script to do most things
you want). Keith, Rajith, Andrew K and myself are the only ones I know
have requested such access (via Carl).

Anyone can browse the workspace used by the tests through the Jenkins
web UI; it can be found at:
https://builds.apache.org/view/M-R/view/Qpid/job/Qpid-Java-Cpp-Test/ws/

The test run is tied to the various Ubuntu nodes, yes.

Robbie

On 12 March 2012 13:52, Rajith Attapattu <ra...@gmail.com> wrote:
> Robbie,
>
> Would you happen to know if we can have a quick look in our dir space to see
> if there are any core files?
> Do you have access? Could either myself or Gordon get the user/pass to have
> a look?
>
> Regards,
>
> Rajith
>
> P.S. Do these machines run Ubuntu?


Re: C++ broker appears to segfault during MultipleTransactedBatchProducerTest

Posted by Rajith Attapattu <ra...@gmail.com>.
Robbie,

Would you happen to know if we can have a quick look in our dir space to
see if there are any core files?
Do you have access? Could either myself or Gordon get the user/pass to have
a look?

Regards,

Rajith

P.S. Do these machines run Ubuntu?

On Mon, Mar 12, 2012 at 9:34 AM, Gordon Sim <gs...@redhat.com> wrote:

> FYI: I've still not managed to trigger a crash, unfortunately. The failure
> seems more frequent under CI though, with 9 failures in the 17 builds since
> the first occurrence of this error.
>
> The last commit for "QPID-3883: Using application headers in messages
> causes a very large slowdown" is the change that coincides with the first
> failure. Though that commit is unlikely to be the cause (it just exports
> some symbols correctly for Windows), the preceding changes for the same
> JIRA could be related[1].
>
> They seem the only likely candidates on the broker side. However, it's
> always possible that a change on the Java client side has uncovered an
> existing issue.
>
> [1] E.g. the following two commits which passed once before the first
> failure:
>
> http://svn.apache.org/viewvc/?view=rev&rev=1297292
> http://svn.apache.org/viewvc/?view=rev&rev=1297290

Re: C++ broker appears to segfault during MultipleTransactedBatchProducerTest

Posted by Robbie Gemmell <ro...@gmail.com>.
Ok great. Keith also just reproduced it locally on RHEL 5.3 (failed 2
out of 2 runs) and has a coredump if you need it.

Robbie

On 12 March 2012 14:46, Gordon Sim <gs...@redhat.com> wrote:
> On 03/12/2012 01:34 PM, Gordon Sim wrote:
>>
>> Fyi: I've still not managed to trigger a crash unfortunately. The
>> failure seems more frequent under CI though, with 9 failures in the 17
>> builds since the first occurrence of this error.
>
>
> Managed to reproduce (enabling full logging on the broker helps)... I've
> added the backtrace to the JIRA, and on the surface it does look like it
> may be related to the FieldTable changes.


Re: C++ broker appears to segfault during MultipleTransactedBatchProducerTest

Posted by Gordon Sim <gs...@redhat.com>.
On 03/12/2012 01:34 PM, Gordon Sim wrote:
> Fyi: I've still not managed to trigger a crash unfortunately. The
> failure seems more frequent under CI though, with 9 failures in the 17
> builds since the first occurrence of this error.

Managed to reproduce (enabling full logging on the broker helps)... I've
added the backtrace to the JIRA, and on the surface it does look like it
may be related to the FieldTable changes.


Re: C++ broker appears to segfault during MultipleTransactedBatchProducerTest

Posted by Gordon Sim <gs...@redhat.com>.
On 03/12/2012 09:46 AM, Gordon Sim wrote:
> I wasn't able to trigger a failure locally; will look for a box with
> more processing available as I suspect the level of concurrency may be a
> factor. I don't suppose there were any cores on that box from the segfault?

FYI: I've still not managed to trigger a crash, unfortunately. The
failure seems more frequent under CI though, with 9 failures in the 17
builds since the first occurrence of this error.

The last commit for "QPID-3883: Using application headers in messages
causes a very large slowdown" is the change that coincides with the
first failure. Though that commit is unlikely to be the cause (it just
exports some symbols correctly for Windows), the preceding changes for
the same JIRA could be related[1].

They seem the only likely candidates on the broker side. However, it's
always possible that a change on the Java client side has uncovered an
existing issue.

[1] E.g. the following two commits which passed once before the first 
failure:

http://svn.apache.org/viewvc/?view=rev&rev=1297292
http://svn.apache.org/viewvc/?view=rev&rev=1297290


Re: C++ broker appears to segfault during MultipleTransactedBatchProducerTest

Posted by Gordon Sim <gs...@redhat.com>.
On 03/11/2012 06:10 PM, Robbie Gemmell wrote:
> Hi all,
>
> Just thought I'd post this here to make sure it gets seen, as I know many
> people filter the JIRA traffic.
>
> I was taking a quick look at why there has been a notable increase in
> failures of the test job Keith had set up to run against the C++
> broker on the ASF Jenkins instances. It seems one test in particular
> (which I added earlier last year to verify a defect fix for the Java
> broker) has been sporadically failing over the last week or so.
> Investigating the logs for the last failure suggests the C++ broker
> segfaulted during the test run, which led to the eventual test
> failure report.
>
> I have raised the following JIRA and attached the latest test log to
> it, as they don't get kept long. I have marked it as critical for now,
> although as it appears to affect the 0.16 branch too I'd actually
> probably consider it a blocker at this point (but since there's no
> chance of me being able to fix it, I figured I would let those with a
> clue about the C++ broker make that decision).
>
> https://issues.apache.org/jira/browse/QPID-3893
> C++ broker appears to segfault during MultipleTransactedBatchProducerTest

I wasn't able to trigger a failure locally; will look for a box with 
more processing available as I suspect the level of concurrency may be a 
factor. I don't suppose there were any cores on that box from the segfault?

I agree with your assessment as critical until we uncover what the issue
is, whether it's a regression, what the scope of the fix might be, etc.
