
Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2008/02/06 12:25:08 UTC

[jira] Created: (DERBY-3393) Database corruption when adding sleep() in RAFContainer4.writePage()

Database corruption when adding sleep() in RAFContainer4.writePage()
--------------------------------------------------------------------

                 Key: DERBY-3393
                 URL: https://issues.apache.org/jira/browse/DERBY-3393
             Project: Derby
          Issue Type: Bug
          Components: Store
    Affects Versions: 10.4.0.0
         Environment: Solaris 10, OpenSolaris snv_80, Sun Java SE 5.0, Sun Java SE 6, Derby trunk #618305
            Reporter: Knut Anders Hatlen
            Priority: Critical


In order to test whether RAFContainer4.writePage() was properly synchronized, I made it sleep for 100 ms each time after it had written a page, right before it set its needsSync flag to true, like this:

Index: java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java
===================================================================
--- java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java (revision 618305)
+++ java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java (working copy)
@@ -350,6 +350,11 @@
                     dataFactory.writeFinished();
                 }
             } else {
+                try {
+                    Thread.sleep(100);
+                } catch (InterruptedException ie) {
+                    // ignored
+                }
                 synchronized(this) {
                     needsSync = true;
                 }
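
The delay widens the window between the moment the page write returns and the moment needsSync is raised. The Java sketch below is a toy model of that window, not Derby code. ToyContainer, RaceDemo and syncIfNeeded() are hypothetical names. Latches stand in for the sleep so that the interleaving is deterministic. The sketch also assumes that a concurrent sync forces the file only when needsSync is already set. Under those assumptions, a sync that runs inside the window skips the force. This is offered only as an illustration of what the delay exposes, not as the root cause of the failures reported here; those were later resolved as a duplicate of DERBY-4239.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: hypothetical names, not Derby's RAFContainer4.
class ToyContainer {
    private boolean needsSync = false;              // guarded by "this"
    final AtomicInteger pagesWritten = new AtomicInteger();
    final AtomicInteger pagesForced = new AtomicInteger();

    // Mirrors the ordering in the diff: the write completes first and
    // needsSync is raised afterwards. The latches replace Thread.sleep(100)
    // so that the other thread is guaranteed to run inside the window.
    void writePage(CountDownLatch written, CountDownLatch resume)
            throws InterruptedException {
        pagesWritten.incrementAndGet();             // write() has returned
        written.countDown();                        // window opens
        resume.await();                             // window held open by the "sleep"
        synchronized (this) {
            needsSync = true;                       // window closes
        }
    }

    // Assumed sync policy: force the file only if needsSync is set.
    // Returns true if a force actually happened.
    boolean syncIfNeeded() {
        synchronized (this) {
            if (!needsSync) {
                return false;
            }
            needsSync = false;
        }
        pagesForced.set(pagesWritten.get());        // stand-in for a real force
        return true;
    }
}

class RaceDemo {
    // Returns true if a sync ran inside the window and left a written page unforced.
    static boolean checkpointMissedWrite() throws InterruptedException {
        ToyContainer c = new ToyContainer();
        CountDownLatch written = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            try {
                c.writePage(written, resume);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        written.await();                            // page write has returned
        boolean forced = c.syncIfNeeded();          // sync runs inside the window
        resume.countDown();
        writer.join();
        return !forced && c.pagesForced.get() < c.pagesWritten.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Prints "checkpoint missed write: true" on every run.
        System.out.println("checkpoint missed write: " + checkpointMissedWrite());
    }
}
```

Because the latches fix the interleaving, the result does not depend on timing, unlike the 100 ms sleep. This makes the model convenient for reasoning, but it says nothing about how often the window is hit in a real run.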

When I ran derbyall with this change, I saw some failures in the storerecovery suite. I reran the storerecovery suite a couple of times, seeing failures each time, although the actual failures varied a bit.

The most common failure was the following (page numbers and container numbers varied):

> Exception in thread "main" java.sql.SQLException: Failed to start database 'wombat', see the next exception for details.
> Caused by: java.sql.SQLException: Failed to start database 'wombat', see the next exception for details.
> 	... 17 more
> Caused by: java.sql.SQLException: Page Page(19,Container(0, 1073)) is at version 0, the log file contains change version 578, either there are log records of this page missing, or this page did not get written out to disk properly.
> 	... 14 more
> Caused by: ERROR XSDB4: Page Page(19,Container(0, 1073)) is at version 0, the log file contains change version 578, either there are log records of this page missing, or this page did not get written out to disk properly.
> 	... 14 more

This failure was seen in oc_rec3, oc_rec4, dropcrash and dropcrash2.

In some cases, I saw this failure in oc_rec3:

> Exception while trying to insert row number: 0
> ERROR XBCA0: Cannot create new object with key Page(2,Container(0, 1040)) in PageCache cache. The object already exists in the cache.

which would be followed by this error in oc_rec4:

> ERROR 23505: The statement was aborted because it would have caused a duplicate key value in a unique or primary key constraint or unique index identified by 'TEST1_2_IDX_INDCOL3' defined on 'TEST1_2'.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (DERBY-3393) Database corruption when adding sleep() in RAFContainer4.writePage()

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali resolved DERBY-3393.
-----------------------------------

    Resolution: Duplicate

Before the fix to DERBY-4239, I ran the store recovery suite 10 times and it failed somewhere in the
suite every time. After the fix, I ran the store recovery suite 200 times and did not get an error.
Marking this as a duplicate of DERBY-4239.

> Database corruption when adding sleep() in RAFContainer4.writePage()
> --------------------------------------------------------------------
>
>                 Key: DERBY-3393
>                 URL: https://issues.apache.org/jira/browse/DERBY-3393
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.1, 10.3.1.4, 10.4.1.3
>         Environment: Solaris 10, OpenSolaris snv_80, Sun Java SE 5.0, Sun Java SE 6, Derby trunk #618305
>            Reporter: Knut Anders Hatlen
>            Assignee: Mike Matrigali
>            Priority: Critical
>         Attachments: derby3393.diff


[jira] Updated: (DERBY-3393) Database corruption when adding sleep() in RAFContainer4.writePage()

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3393:
----------------------------------


I am seeing similar failures running on an XP laptop with the ibm15 JVM and the disk write cache enabled (yes, this is evil, but I don't think it matters if the machine does not crash).

The following error has the feel of a checkpoint deleting a log file before it should have, or of the data not getting to disk when the
checkpoint thought it had:

Caused by: java.sql.SQLException: Page Page(19,Container(0, 1073)) is at version 0, the log file contains change version 578, either there are log records of this page missing, or this page did not get written out to disk properly.

I am going to run some tests that leave the log file around and see if the checkpoint record shows anything.
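
The XSDB4 message describes a consistency check between the version stored in a page on disk and the versions recorded in the log. The sketch below is a toy model of such a check. It assumes, hypothetically, that each log record carries the page version it applies to and the version it produces. ToyLogRecord and ToyRedo are illustrative names, and Derby's real log format and redo logic differ. The model does three things:

- It skips changes that are already on disk.
- It applies changes that continue from the current version.
- It fails when the surviving log does not connect to the on-disk page.

That last case is what either a prematurely discarded log file or a page write that never reached disk would produce.

```java
import java.util.List;

// Hypothetical log record: the page version it applies to and the one it produces.
final class ToyLogRecord {
    final long versionBefore;
    final long versionAfter;

    ToyLogRecord(long versionBefore, long versionAfter) {
        this.versionBefore = versionBefore;
        this.versionAfter = versionAfter;
    }
}

// Toy redo check; not Derby's recovery code.
class ToyRedo {
    // Replays the surviving log records for one page onto the version found
    // on disk and returns the resulting version.
    static long redo(long onDiskVersion, List<ToyLogRecord> logForPage) {
        long v = onDiskVersion;
        for (ToyLogRecord r : logForPage) {
            if (r.versionAfter <= v) {
                continue;                           // change is already on disk
            }
            if (r.versionBefore != v) {
                // The page is older than the log expects: earlier records are
                // gone, or the page image never reached disk.
                throw new IllegalStateException("page at version " + v
                        + " but log record expects version " + r.versionBefore);
            }
            v = r.versionAfter;                     // apply the change
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(redo(2, List.of(new ToyLogRecord(0, 1),
                new ToyLogRecord(1, 2), new ToyLogRecord(2, 3))));   // prints 3
        try {
            redo(0, List.of(new ToyLogRecord(577, 578)));
        } catch (IllegalStateException e) {
            // prints "page at version 0 but log record expects version 577"
            System.out.println(e.getMessage());
        }
    }
}
```

For example, a page at version 2 with a complete log simply skips the first two records and applies the third. A page still at version 0 whose earliest surviving record starts from version 577 fails the check.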



> Database corruption when adding sleep() in RAFContainer4.writePage()
> --------------------------------------------------------------------
>
>                 Key: DERBY-3393
>                 URL: https://issues.apache.org/jira/browse/DERBY-3393
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.1, 10.3.1.4, 10.4.0.0
>         Environment: Solaris 10, OpenSolaris snv_80, Sun Java SE 5.0, Sun Java SE 6, Derby trunk #618305
>            Reporter: Knut Anders Hatlen
>            Priority: Critical


[jira] Updated: (DERBY-3393) Database corruption when adding sleep() in RAFContainer4.writePage()

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-3393:
----------------------------------

    Derby Categories: [Data corruption, High Value Fix]  (was: [High Value Fix])

> Database corruption when adding sleep() in RAFContainer4.writePage()
> --------------------------------------------------------------------
>
>                 Key: DERBY-3393
>                 URL: https://issues.apache.org/jira/browse/DERBY-3393
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.1, 10.3.1.4, 10.4.1.3
>         Environment: Solaris 10, OpenSolaris snv_80, Sun Java SE 5.0, Sun Java SE 6, Derby trunk #618305
>            Reporter: Knut Anders Hatlen
>            Priority: Critical


[jira] Updated: (DERBY-3393) Database corruption when adding sleep() in RAFContainer4.writePage()

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3393:
----------------------------------

    Attachment: derby3393.diff

A patch that adds the sleep to writePage() as suggested. This is only for testing purposes and should not be committed to the codeline.

> Database corruption when adding sleep() in RAFContainer4.writePage()
> --------------------------------------------------------------------
>
>                 Key: DERBY-3393
>                 URL: https://issues.apache.org/jira/browse/DERBY-3393
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.1, 10.3.1.4, 10.4.1.3
>         Environment: Solaris 10, OpenSolaris snv_80, Sun Java SE 5.0, Sun Java SE 6, Derby trunk #618305
>            Reporter: Knut Anders Hatlen
>            Assignee: Mike Matrigali
>            Priority: Critical
>         Attachments: derby3393.diff


[jira] Closed: (DERBY-3393) Database corruption when adding sleep() in RAFContainer4.writePage()

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen closed DERBY-3393.
-------------------------------------


> Database corruption when adding sleep() in RAFContainer4.writePage()
> --------------------------------------------------------------------
>
>                 Key: DERBY-3393
>                 URL: https://issues.apache.org/jira/browse/DERBY-3393
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.1, 10.3.1.4, 10.4.1.3
>         Environment: Solaris 10, OpenSolaris snv_80, Sun Java SE 5.0, Sun Java SE 6, Derby trunk #618305
>            Reporter: Knut Anders Hatlen
>            Assignee: Mike Matrigali
>            Priority: Critical
>         Attachments: derby3393.diff


[jira] Updated: (DERBY-3393) Database corruption when adding sleep() in RAFContainer4.writePage()

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-3393:
----------------------------------

    Derby Categories: [High Value Fix]

> Database corruption when adding sleep() in RAFContainer4.writePage()
> --------------------------------------------------------------------
>
>                 Key: DERBY-3393
>                 URL: https://issues.apache.org/jira/browse/DERBY-3393
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.1, 10.3.1.4, 10.4.1.3
>         Environment: Solaris 10, OpenSolaris snv_80, Sun Java SE 5.0, Sun Java SE 6, Derby trunk #618305
>            Reporter: Knut Anders Hatlen
>            Priority: Critical


[jira] Updated: (DERBY-3393) Database corruption when adding sleep() in RAFContainer4.writePage()

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-3393:
--------------------------------------

    Affects Version/s: 10.2.2.1
                       10.3.1.4

I see the same errors when I run the experiment on 10.3.1.4. On 10.2, we don't have the RAFContainer4 class, but when I put a sleep() into the corresponding location in RAFContainer.writePage(), I see the same errors there as well.

> Database corruption when adding sleep() in RAFContainer4.writePage()
> --------------------------------------------------------------------
>
>                 Key: DERBY-3393
>                 URL: https://issues.apache.org/jira/browse/DERBY-3393
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.1, 10.3.1.4, 10.4.0.0
>         Environment: Solaris 10, OpenSolaris snv_80, Sun Java SE 5.0, Sun Java SE 6, Derby trunk #618305
>            Reporter: Knut Anders Hatlen
>            Priority: Critical


[jira] Assigned: (DERBY-3393) Database corruption when adding sleep() in RAFContainer4.writePage()

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali reassigned DERBY-3393:
-------------------------------------

    Assignee: Mike Matrigali

> Database corruption when adding sleep() in RAFContainer4.writePage()
> --------------------------------------------------------------------
>
>                 Key: DERBY-3393
>                 URL: https://issues.apache.org/jira/browse/DERBY-3393
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.1, 10.3.1.4, 10.4.1.3
>         Environment: Solaris 10, OpenSolaris snv_80, Sun Java SE 5.0, Sun Java SE 6, Derby trunk #618305
>            Reporter: Knut Anders Hatlen
>            Assignee: Mike Matrigali
>            Priority: Critical
>
> In order to test whether RAFContainer4.writePage() was properly synchronized, I made it sleep for 100 ms each time after it had written a page, right before it set its needsSync flag to true, like this:
> Index: java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java
> ===================================================================
> --- java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java (revision 618305)
> +++ java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java (working copy)
> @@ -350,6 +350,11 @@
>                      dataFactory.writeFinished();
>                  }
>              } else {
> +                try {
> +                    Thread.sleep(100);
> +                } catch (InterruptedException ie) {
> +                    // ignored
> +                }
>                  synchronized(this) {
>                      needsSync = true;
>                  }
> When I ran derbyall with this change, I saw some failures in the storerecovery suite. I reran the storerecovery suite a couple of times, seeing failures each time, although the actual failures varied a bit.
> The most common failure was the following (page numbers and container numbers varied):
> > Exception in thread "main" java.sql.SQLException: Failed to start database 'wombat', see the next exception for details.
> > Caused by: java.sql.SQLException: Failed to start database 'wombat', see the next exception for details.
> > 	... 17 more
> > Caused by: java.sql.SQLException: Page Page(19,Container(0, 1073)) is at version 0, the log file contains change version 578, either there are log records of this page missing, or this page did not get written out to disk properly.
> > 	... 14 more
> > Caused by: ERROR XSDB4: Page Page(19,Container(0, 1073)) is at version 0, the log file contains change version 578, either there are log records of this page missing, or this page did not get written out to disk properly.
> > 	... 14 more
> This failure was seen in oc_rec3, oc_rec4, dropcrash and dropcrash2.
> In some cases, I saw this failure in oc_rec3:
> > Exception while trying to insert row number: 0
> > ERROR XBCA0: Cannot create new object with key Page(2,Container(0, 1040)) in PageCache cache. The object already exists in the cache.
> which would be followed by this error in oc_rec4:
> > ERROR 23505: The statement was aborted because it would have caused a duplicate key value in a unique or primary key constraint or unique index identified by 'TEST1_2_IDX_INDCOL3' defined on 'TEST1_2'.
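
The injected Thread.sleep(100) widens a window in which the page write has already been issued but needsSync is still false. One interleaving such a window could expose, offered purely as a hypothesis and not as the diagnosed cause of DERBY-3393, is a checkpoint-style flush that forces the file only when needsSync is set: if it runs inside the window, it sees nothing to do and returns, leaving the write unforced. The model is hypothetical (Container, writePage, flushIfNeeded and the counters are illustrative stand-ins, not RAFContainer4's API). It replaces the sleep with a pair of CountDownLatch objects so the interleaving happens deterministically instead of by timing.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical model, not Derby code: a writer issues a page write and only
 * later sets needsSync, while a checkpoint-style flush forces the file only
 * when needsSync is already true.
 */
class NeedsSyncWindow {

    static class Container {
        private boolean needsSync;   // guarded by this
        private int pagesWritten;    // writes issued to the file
        private int pagesForced;     // writes covered by a completed force

        void writePage(CountDownLatch written, CountDownLatch resume)
                throws InterruptedException {
            synchronized (this) {
                pagesWritten++;      // stands in for the actual file write
            }
            written.countDown();     // signal: the write has been issued
            resume.await();          // plays the role of the injected sleep
            synchronized (this) {
                needsSync = true;
            }
        }

        /** Forces the file only if a writer has already flagged it. */
        synchronized boolean flushIfNeeded() {
            if (!needsSync) {
                return false;        // nothing flagged yet, so no force
            }
            pagesForced = pagesWritten; // stands in for forcing the file
            needsSync = false;
            return true;
        }

        synchronized int unforcedPages() {
            return pagesWritten - pagesForced;
        }
    }

    /** Runs the checkpoint inside the window; returns pages it left unforced. */
    static int checkpointInsideWindow() {
        Container c = new Container();
        CountDownLatch written = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            try {
                c.writePage(written, resume);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        try {
            written.await();         // write issued, needsSync still false
            c.flushIfNeeded();       // checkpoint sees nothing to force
            int unforced = c.unforcedPages();
            resume.countDown();      // let the writer set needsSync
            writer.join();
            return unforced;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } finally {
            resume.countDown();      // no-op if already released
        }
    }

    public static void main(String[] args) {
        // Prints 1: the write was issued but the checkpoint did not force it.
        System.out.println(checkpointInsideWindow());
    }
}
```

Whether an unforced write is actually lost depends on a crash happening before some later force and on the operating system; the model only counts writes that the checkpoint did not cover.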

[jira] Updated: (DERBY-3393) Database corruption when adding sleep() in RAFContainer4.writePage()

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-3393:
----------------------------------


Do note that in the storerecovery suite, later tests rely on earlier ones. The tests are ordered, and each stage is
expected to do some work and then, usually, crash the database. The next test connects to the same database, performs recovery,
checks that things look right, and then may do more work and crash again, and so on. If one test gets a recovery error, it is probably not useful
to look at the subsequent errors. The .sum file lists the tests in the order they ran, while the report.txt file appears to reorder them, which is
confusing.
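
Put differently, only the first failure in run order is worth investigating, because every later stage starts from the database the earlier stages left behind. Here is a minimal sketch of that triage rule, assuming the results have already been parsed into (test name, passed) pairs in the order the .sum file lists them. The parsing step, the class FirstRecoveryFailure and its methods are hypothetical, and the stage names in main are illustrative input, not real results.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical triage helper: given pass/fail results in run order,
 * report the first failure and the later ones that may be fallout from it.
 */
final class FirstRecoveryFailure {

    /** Name of the first failed test in run order, or null if all passed. */
    static String firstFailure(LinkedHashMap<String, Boolean> passedInRunOrder) {
        for (Map.Entry<String, Boolean> e : passedInRunOrder.entrySet()) {
            if (!e.getValue()) {
                return e.getKey();
            }
        }
        return null;
    }

    /** Failures after the first one; each stage reuses the same database. */
    static List<String> laterFailures(LinkedHashMap<String, Boolean> passedInRunOrder) {
        List<String> later = new ArrayList<>();
        boolean seenFirst = false;
        for (Map.Entry<String, Boolean> e : passedInRunOrder.entrySet()) {
            if (!e.getValue()) {
                if (seenFirst) {
                    later.add(e.getKey());
                }
                seenFirst = true;
            }
        }
        return later;
    }

    public static void main(String[] args) {
        // Illustrative input only; not actual storerecovery results.
        LinkedHashMap<String, Boolean> results = new LinkedHashMap<>();
        results.put("stage1", true);
        results.put("stage2", false);
        results.put("stage3", false);
        results.put("stage4", true);
        System.out.println("investigate first: " + firstFailure(results)); // stage2
        System.out.println("probably fallout: " + laterFailures(results)); // [stage3]
    }
}
```

A LinkedHashMap is used because it keeps insertion order, which here is run order; sorting the results, as a reordered report would, could put a consequential failure ahead of its cause.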

> Database corruption when adding sleep() in RAFContainer4.writePage()
> --------------------------------------------------------------------
>
>                 Key: DERBY-3393
>                 URL: https://issues.apache.org/jira/browse/DERBY-3393
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.1, 10.3.1.4, 10.4.0.0
>         Environment: Solaris 10, OpenSolaris snv_80, Sun Java SE 5.0, Sun Java SE 6, Derby trunk #618305
>            Reporter: Knut Anders Hatlen
>            Priority: Critical
>
