Posted to issues@trafodion.apache.org by "Alice Chen (JIRA)" <ji...@apache.org> on 2015/07/22 20:19:18 UTC

[jira] [Created] (TRAFODION-978) LP Bug: 1417748 - Memory overwrite causes core in regr test seabase TEST024 and TEST025

Alice Chen created TRAFODION-978:
------------------------------------

             Summary: LP Bug: 1417748 - Memory overwrite causes core in regr test seabase TEST024 and TEST025
                 Key: TRAFODION-978
                 URL: https://issues.apache.org/jira/browse/TRAFODION-978
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-exe
            Reporter: Sandhya Sundaresan
            Assignee: Apache Trafodion
            Priority: Critical
             Fix For: 1.1 (pre-incubation)


Drop table runs into this issue and detects memory corruption in the shared segment. It is unclear which process or operation is doing the overwrite.
One sequence that sometimes triggers it is running the seabase test suite with TEST024 and TEST025 in sequence.
A narrower test case has not been found.
Bug catchers have been added to the code to detect corruption.
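
For readers unfamiliar with the term, a bug catcher of this kind is often built from guard words placed around a region of memory and checked on access. The following is only a minimal sketch of that general technique. The names GuardedBlock, kGuard, initBlock and isIntact, the guard value and the 64-byte size are hypothetical and are not taken from the Trafodion source; the actual RMS checks may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical guard pattern, chosen for illustration only.
constexpr std::uint32_t kGuard = 0xDEADBEEF;

// Illustrative layout: a fixed-size payload bracketed by two guard words.
struct GuardedBlock {
    std::uint32_t head;       // guard word before the payload
    unsigned char data[64];   // payload area
    std::uint32_t tail;       // guard word after the payload
};

// Returns true while both guard words still hold the expected pattern.
bool isIntact(const GuardedBlock &b) {
    return b.head == kGuard && b.tail == kGuard;
}

// Writes the guard words and clears the payload.
void initBlock(GuardedBlock &b) {
    b.head = kGuard;
    std::memset(b.data, 0, sizeof b.data);
    b.tail = kGuard;
    assert(isIntact(b));
}
```

A caller would run isIntact before or after touching the block and stop the process (for example with std::abort) when it returns false, so that the core is produced close to the point where the overwrite is detected.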

From: Govindarajan, Selvaganes 
Sent: Friday, January 30, 2015 7:48 AM
To: Varnau, Steve (Trafodion); Hanlon, Mike; Sharma, Anoop
Cc: Varshneya, Renu; Sundaresan, Sandhya; Birdsall, Dave; Du, Justin
Subject: RE: RMS memory corruption

This issue remains a mystery. I enabled buffer overflow detection and ran the full seabase test suite 10 times overnight without hitting the issue. I also ran without the buffer overflow code after my recent changes to defend against RMS corruption and didn’t see the problem. Hence I have enabled seabase/TEST024 as well.

With the current change, any illegal access to the RMS shared memory segment will bring down the node.

Selva

From: Govindarajan, Selvaganes 
Sent: Monday, January 26, 2015 3:15 PM
To: Varnau, Steve (Trafodion); Hanlon, Mike; Sharma, Anoop
Cc: Varshneya, Renu; Sundaresan, Sandhya
Subject: RE: RMS memory corruption

This issue has been elusive. I have been running with bug catcher code to get a dump of master/esps/arkcmps if these processes violate the shared segment requirements. But I have always gotten an mxsscp dump instead, in the second run of the whole seabase regression suite (including TEST025).

From the core, I suspected seabase/TEST024 might have caused the corruption. Hence I disabled TEST024. After that, I was able to run the full seabase test suite 3 times without any corruption.

So I may submit this change if further experiments point to TEST024. These bug catchers are lightweight, so they can stay enabled permanently to guard against memory corruption in RMS.

Selva
Assigned to LaunchPad User Mike Hanlon


