Posted to issues@trafodion.apache.org by "Alice Chen (JIRA)" <ji...@apache.org> on 2015/07/22 20:19:18 UTC
[jira] [Created] (TRAFODION-978) LP Bug: 1417748 - Memory overwrite
causes core in regr test seabase TEST024 and TEST025
Alice Chen created TRAFODION-978:
------------------------------------
Summary: LP Bug: 1417748 - Memory overwrite causes core in regr test seabase TEST024 and TEST025
Key: TRAFODION-978
URL: https://issues.apache.org/jira/browse/TRAFODION-978
Project: Apache Trafodion
Issue Type: Bug
Components: sql-exe
Reporter: Sandhya Sundaresan
Assignee: Apache Trafodion
Priority: Critical
Fix For: 1.1 (pre-incubation)
Drop table runs into this issue and detects a memory corruption in the shared segment. It is unclear who or which operation is doing the overwrite.
One sequence that sometimes runs into it is running the seabase test suite with both TEST024 and TEST025 in sequence.
A narrower test case has not been found.
Bug catchers have been added in the code to detect corruption.
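For illustration only, a bug catcher of this kind can be sketched as guard words placed around a buffer and checked on access. This is not the actual Trafodion RMS code: GuardedBuffer, kGuard, guardsIntact and verifyOrAbort are hypothetical names, and an ordinary heap buffer stands in for the shared segment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical guard pattern; not a value taken from the Trafodion source.
constexpr std::uint32_t kGuard = 0xDEADBEEF;

// A payload buffer bracketed by guard words (illustrative only).
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t payloadBytes)
        : size_(payloadBytes), raw_(payloadBytes + 2 * sizeof(kGuard), 0) {
        assert(payloadBytes > 0);
        // Write one guard word before the payload and one after it.
        std::memcpy(raw_.data(), &kGuard, sizeof(kGuard));
        std::memcpy(raw_.data() + sizeof(kGuard) + size_, &kGuard, sizeof(kGuard));
    }

    unsigned char* payload() { return raw_.data() + sizeof(kGuard); }
    std::size_t size() const { return size_; }

    // True when both guard words still hold the expected pattern.
    bool guardsIntact() const {
        std::uint32_t head = 0;
        std::uint32_t tail = 0;
        std::memcpy(&head, raw_.data(), sizeof(head));
        std::memcpy(&tail, raw_.data() + sizeof(kGuard) + size_, sizeof(tail));
        return head == kGuard && tail == kGuard;
    }

    // Fail fast: abort() typically leaves a core dump (when core dumps are
    // enabled) at the point where the corruption is noticed.
    void verifyOrAbort() const {
        if (!guardsIntact()) std::abort();
    }

private:
    std::size_t size_;
    std::vector<unsigned char> raw_;
};
```

A check like this only reports that an overwrite has happened by the time of the check; it does not by itself identify which process or operation did the overwrite.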
From: Govindarajan, Selvaganes
Sent: Friday, January 30, 2015 7:48 AM
To: Varnau, Steve (Trafodion); Hanlon, Mike; Sharma, Anoop
Cc: Varshneya, Renu; Sundaresan, Sandhya; Birdsall, Dave; Du, Justin
Subject: RE: RMS memory corruption
This issue still remains a mystery. I enabled buffer overflow detection, ran the full seabase test suite 10 times overnight, and didn’t hit the issue. I also ran without the buffer overflow code after my recent changes to defend against RMS corruption and didn’t see the problem. Hence I have enabled seabase/TEST024 as well.
With the current change, when the RMS memory shared segment is accessed in an illegal way, it will bring down the node.
Selva
From: Govindarajan, Selvaganes
Sent: Monday, January 26, 2015 3:15 PM
To: Varnau, Steve (Trafodion); Hanlon, Mike; Sharma, Anoop
Cc: Varshneya, Renu; Sundaresan, Sandhya
Subject: RE: RMS memory corruption
This issue has been elusive. I have been running with bug catcher code to get a dump of master/esps/arkcmps if these processes violate the shared segment requirements. But I have always been getting an mxsscp dump instead, in the 2nd run of the whole seabase regression suite (including TEST025).
From the core, I suspected seabase/TEST024 might have caused the corruption. Hence I disabled TEST024. Now I was able to run the full seabase test suite 3 times without any corruption.
So, I might submit this change if some more experiments point to TEST024. These bug catchers are lightweight and hence can be always on to avoid memory corruption in RMS.
Selva
Assigned to LaunchPad User Mike Hanlon