Posted to derby-user@db.apache.org by SB...@ILSTechnology.com on 2005/09/07 16:50:41 UTC

'Invalid checksum on Page' error

Hi,
        I have Apache Derby 10.0 running on a MontaVista Linux system (3.1
Professional with Linux/i686 2.4.20) in embedded mode using the
EmbeddedConnectionPoolDataSource.
The Java level is Sun's JRE 1.4.2_04.
There are around 3.5 million records in a table in the DB.
While adding the records I had one thread inserting rows into this table
at a rate of about one row every 50 msecs.
Another thread periodically does selects on this table and some
deletes.

While the record count was building up to 3.5 million, no deletes were
being done on the table.
I have the transaction log and db temp space in a different directory.

When a thread attempts to delete a record from the table, it catches a
SQLException with the following error message:

 SQLError:0   SQLState:XJ001   SQLErrMsg:Java exception: ': java.lang.NullPointerException'.
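
XJ001 only says that a Java exception was wrapped, so it can help to print every SQLException in the chain rather than just the first one. The following is a minimal sketch of that idea using the standard java.sql.SQLException.getNextException() call. The class and method names are hypothetical, the output format simply mirrors the message above, and the second exception in main is a made-up placeholder, not something taken from this database.

```java
import java.sql.SQLException;

public class SqlErrorChain {
    /** Formats every exception in the chain, one line per exception. */
    public static String describe(SQLException first) {
        StringBuilder sb = new StringBuilder();
        for (SQLException e = first; e != null; e = e.getNextException()) {
            sb.append("SQLError:").append(e.getErrorCode())
              .append(" SQLState:").append(e.getSQLState())
              .append(" SQLErrMsg:").append(e.getMessage())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Outer exception modelled on the message reported in this post.
        SQLException outer = new SQLException(
                "Java exception: ': java.lang.NullPointerException'.", "XJ001", 0);
        // Hypothetical chained exception, only here to show the traversal.
        outer.setNextException(
                new SQLException("placeholder nested error", "X0000", 0));
        System.out.print(describe(outer));
    }
}
```

In an application, describe(e) would be called from the catch block around the failing delete or insert.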


The derby.log file (at the end of this posting) indicates an invalid
checksum on a page. I have only included the first few lines.
This may have occurred when I was selecting data from the database.

If I restart the application, I sometimes get the same SQLException on the
thread that is inserting data, after a few successful inserts.

When I run the command line client (ij), I am able to select and delete
records from this database.

What would typically cause a checksum error to occur? Is there a way to
recover from it without losing data?


====================  Begin derby.log
========================================================================
------------  BEGIN SHUTDOWN ERROR STACK -------------

ERROR XSDG2: Invalid checksum on Page Page(10031,Container(0, 800)), expected=3,558,849,496, on-disk version=772,832,532, page dump follows: Hex dump:
00000000: 0075 0000 0001 0000 0000 0000 003d 003c  .u..............
00000010: 0000 0042 0000 0000 0000 0000 0000 0000  ...B............


The trailing stack trace is as follows:

        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.impl.store.raw.data.StoredPage.validateChecksum(Unknown Source)
        at org.apache.derby.impl.store.raw.data.StoredPage.initFromData(Unknown Source)
        at org.apache.derby.impl.store.raw.data.CachedPage.setIdentity(Unknown Source)
        at org.apache.derby.impl.services.cache.CachedItem.takeOnIdentity(Unknown Source)
        at org.apache.derby.impl.services.cache.Clock.addEntry(Unknown Source)
        at org.apache.derby.impl.services.cache.Clock.find(Unknown Source)
        at org.apache.derby.impl.store.raw.data.FileContainer.getUserPage(Unknown Source)
        at org.apache.derby.impl.store.raw.data.FileContainer.getPage(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseContainerHandle.getPage(Unknown Source)
        at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.latchPage(Unknown Source)
        at org.apache.derby.impl.store.access.conglomerate.GenericConglomerateController.fetch(Unknown Source)
        at org.apache.derby.impl.sql.execute.IndexRowToBaseRowResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source)
       ......

------------  END SHUTDOWN ERROR STACK -------------

2005-09-07 13:54:01.041 GMT Thread[Thread-2,5,main] (XID = 2985973), (SESSIONID= 1), (DATABASE = /xqjava/db/SAF), (DRDAID = null), Cleanup action starting
2005-09-07 13:54:01.042 GMT Thread[Thread-2,5,main] (XID = 2985973), (SESSIONID= 1), (DATABASE = /xqjava/db/SAF), (DRDAID = null), Failed Statement is: INSERT INTO  messages_1 ( msg_id, msg_timestamp, msg) VALUES (?,?,?)
java.lang.NullPointerException
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown Source)
        at org.apache.derby.impl.store.raw.xact.Xact.openContainer(Unknown Source)
        at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(Unknown Source)
        at org.apache.derby.impl.store.access.heap.Heap.open(Unknown Source)
        at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown Source)
        at org.apache.derby.impl.store.access.RAMTransaction.openCompiledConglomerate(Unknown Source)
        at org.apache.derby.impl.sql.execute.RowChangerImpl.openForUpdate(Unknown Source)
        at org.apache.derby.impl.sql.execute.RowChangerImpl.open(Unknown Source)
        at org.apache.derby.impl.sql.execute.InsertResultSet.normalInsertCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown Source)
        at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedCallableStatement.executeStatement(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(Unknown Source)
........
Cleanup action completed

==================== derby.log
========================================================================


Thanks in advance.
Sunil.








Re: 'Invalid checksum on Page' error

Posted by Suresh Thalamati <su...@gmail.com>.
Mike,

I also looked at the page dump and could not find any clues as to what
might be causing the checksum failure on this page.

1) Recalculated the checksum on the page; same mismatch.
2) The last slot seems to refer to a valid portion of the page.
Last Slot on the page:
slot 59 offset 25845  recordlen 438 (438,0)recordHeader: Id=65
   isDeleted     = false
   hasOverflow   = false
   hasFirstField = false
   numberFields  = 3
   firstField    = 0
   overflowPage  = 0
   overflowId    = 0
	Field 0: offset=25845 len=4 Nullable
	Field 1: offset=25855 len=12 Nullable
	Field 2: offset=25869 len=411 Nullable

Hex dump on the page related to this slot:

00007e80: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00007e90: 64f5 01b6 0000 633f 01b6 0000 618a 01b5  d..?..c..?..a?.?

64f5 = 25845; there do not appear to be any entries in the slot table
after that one.

3) The number of slots in the slot table (counted from the hex dump)
  matches the number of slots in the header.

Page Header :
page id null Overflow: false PageVersion: 61 SlotsInUse: 60 
DeletedRowCount: 0 PageStatus: 1 NextId: 66 firstFreeByte: 26283 
freeSpace: 6117 totalSpace: 32700 spareSpace: 0 PageSize: 32768

Attaching the complete slot table info I printed from the hex dump.
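
The consistency checks described above can be reproduced from the numbers posted in this message. The sketch below decodes the two complete slot entries visible in the row at 0x7e90 and checks them against the page header values. It assumes, based only on how the dump lines up with the decoded slot 59, that each slot entry is three big-endian 16-bit values (offset, record length, reserved), that entries for later slots sit at lower addresses, and that the page ends with an 8-byte checksum. These sizes and the class and method names are assumptions for illustration, not taken from the Derby source.

```java
import java.nio.ByteBuffer;

public class SlotTableCheck {
    // Sizes inferred from the dump in this thread (assumptions, not Derby source).
    static final int SLOT_ENTRY_BYTES = 6; // offset, record length, reserved: 2 bytes each
    static final int CHECKSUM_BYTES = 8;   // assumed trailing checksum at the end of the page

    /** Parses space-separated hex groups such as "64f5 01b6" into bytes. */
    public static byte[] parseHex(String dump) {
        String hex = dump.replace(" ", "");
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Decodes complete slot entries as {offset, recordLen, reserved}. */
    public static int[][] decodeSlots(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw); // ByteBuffer is big-endian by default
        int[][] slots = new int[raw.length / SLOT_ENTRY_BYTES][3];
        for (int[] slot : slots) {
            for (int f = 0; f < 3; f++) {
                slot[f] = buf.getShort() & 0xFFFF; // read as unsigned 16-bit
            }
        }
        return slots;
    }

    /** Byte offset at which a slot table with this many entries would start. */
    public static int slotTableStart(int pageSize, int slotsInUse) {
        return pageSize - CHECKSUM_BYTES - slotsInUse * SLOT_ENTRY_BYTES;
    }

    public static void main(String[] args) {
        // First two complete entries from the row at 0x7e90 (taken to be slots 59 and 58).
        int[][] slots = decodeSlots(parseHex("64f5 01b6 0000 633f 01b6 0000"));
        // Values copied from the page header printed above.
        int firstFreeByte = 26283, freeSpace = 6117, pageSize = 32768, slotsInUse = 60;

        System.out.println("slot 59: offset=" + slots[0][0] + " len=" + slots[0][1]);
        System.out.println("slot 58 ends where slot 59 begins: "
                + (slots[1][0] + slots[1][1] == slots[0][0]));
        System.out.println("slot 59 ends at firstFreeByte: "
                + (slots[0][0] + slots[0][1] == firstFreeByte));
        System.out.println("free space runs up to the slot table: "
                + (firstFreeByte + freeSpace == slotTableStart(pageSize, slotsInUse)));
        System.out.println("slot table starts at 0x"
                + Integer.toHexString(slotTableStart(pageSize, slotsInUse)));
    }
}
```

Under those assumptions the header and the slot table agree with each other, which is consistent with the observation that the page metadata looks intact even though the checksum does not match.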


Thanks
-suresht

Mike Matrigali wrote:
> suresh or anyone else, could you take a look at the page dump and
> see if I am missing anything?
> 
> Mike Matrigali wrote:
> 
> 
>>thanks for the more info.  Definitely interested if you can reproduce
>>on different device.  I did a quick look at the page dump and on the
>>surface nothing jumped out, the ascii dump of the data looks reasonable,
>>there is a set of 0's in the middle as expected with a set of what looks
>>like a reasonable page offset table at the end, the last page offset
>>points at what looks like the last record.  Next step is to decode
>>the actual values in stuff like the page hdrs, see if the zero's in the
>>middle are right or if there is missing data pointed to by the offset
>>table.
>>
>>
>>Some more questions:
>>o what kind of device was this error on (ie. IDE, SCSI, flash card, ...)
>>o were you setting any non-default derby properties?
>>o was this database encrypted?
>>o When you were loading the db was there any crash encountered?
>>
>>When you try to reproduce could you set the following property so that
>>derby.log will have a complete record of any errors, by default it gets
>>overwritten every time:
>>http://db.apache.org/derby/docs/10.1/tuning/rtunproper13217.html
>>
>>If the data in your db is not sensitive would you be willing to provide
>>it.  I realize it is probably very big, so I am not sure the best way.
>>Derby db's do tend to compress well using standard zip.
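
The property behind the link above is presumably derby.infolog.append, which makes Derby append to derby.log instead of overwriting it at each boot. A minimal sketch of setting it from code follows. The class and method names are hypothetical, and the call has to run before the Derby engine is booted in the JVM. Putting derby.infolog.append=true in derby.properties should have the same effect.

```java
public class AppendDerbyLog {
    /** Asks Derby to append to derby.log; call before the engine boots. */
    public static void enableLogAppend() {
        System.setProperty("derby.infolog.append", "true");
    }

    public static void main(String[] args) {
        enableLogAppend();
        // Derby would be booted after this point, e.g. by opening a connection.
        System.out.println("derby.infolog.append="
                + System.getProperty("derby.infolog.append"));
    }
}
```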
>>
>>SBarboza@ILSTechnology.com wrote:
>>
>>
>>
>>>The error is always on the same page ( 10031 ).
>>>I ran the SYSCS_CHECK_TABLE command and I get the same error displayed
>>>about the page checksum error
>>>that is listed in the derby.log.
>>>I took a look at the OS logs but there was nothing that would indicate a IO
>>>failure.
>>>I am attaching the derby.log file.
>>>
>>>(See attached file: derby.log)
>>>

<snip ..>

Re: 'Invalid checksum on Page' error

Posted by Mike Matrigali <mi...@sbcglobal.net>.
suresh or anyone else, could you take a look at the page dump and
see if I am missing anything?

Mike Matrigali wrote:

> SBarboza@ILSTechnology.com wrote:
> 
> 
>>The error is always on the same page ( 10031 ).
>>I ran the SYSCS_CHECK_TABLE command and I get the same error displayed
>>about the page checksum error
>> that is listed in the derby.log.
>>I took a look at the OS logs but there was nothing that would indicate a IO
>>failure.
>>I am attaching the derby.log file.
>>
>>(See attached file: derby.log)
>>
>>I will run this scenario on several devices to try to recreate the problem.
>>
>>
>>
>>
>>From:     Mike Matrigali <mikem_app@sbcglobal.net>
>>Date:     09/07/2005 12:47 PM
>>Please respond to "Derby Discussion"
>>To:       Derby Discussion <de...@db.apache.org>
>>cc:
>>Subject:  Re: 'Invalid checksum on Page' error
>>
>>
>>
>>
>>The most common cause of a bad checksum error is a
>>hardware problem on the data disk.  I have also seen OS I/O issues
>>where for some reason other data has been written into the derby
>>file.  Have you checked the OS log
>>to see if any errors are being generated?  Could you attach the
>>complete derby.log if it is not too big? Or if not could you at
>>least attach the complete error from this particular error - most
>>of the time the page dump won't help much but sometimes it is
>>interesting if there is something like all 0's in the end of
>>the page.
>>
>>From your description it sounds like this is a problem on the disk
>>and not a runtime error.  The current error is reporting an error
>>on page 10031; are all the errors you are seeing on the same page?
>>Running the following will check your table, and should report
>>the same error you encountered if the problem is a persistent
>>on-disk error:
>>http://db.apache.org/derby/docs/10.1/ref/rrefsyscschecktablefunc.html
>>http://db.apache.org/derby/docs/10.1/adminguide/cadminconsist01.html
>>
>>The only supported way to recover from this error is to apply a backup
>>if you have one, and if it was a roll forward backup it will bring
>>the database up to the current state.
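
The table check mentioned above can also be run from JDBC. This is a minimal sketch, assuming the table lives in the default APP schema (not confirmed in this thread) and using the database path that appears in the posted derby.log. SYSCS_UTIL.SYSCS_CHECK_TABLE is documented to return 1 for a consistent table and to raise an exception otherwise. The class and helper names are hypothetical, and on a 1.4 JVM the embedded driver would have to be loaded explicitly and the try-with-resources replaced by finally blocks.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CheckTable {
    /** Builds the consistency-check query; names are upper-cased for the catalog lookup. */
    public static String checkTableSql(String schema, String table) {
        return "VALUES SYSCS_UTIL.SYSCS_CHECK_TABLE('"
                + schema.toUpperCase() + "', '" + table.toUpperCase() + "')";
    }

    public static void main(String[] args) {
        // Database path taken from the derby.log in this thread; pass another URL to override.
        String url = args.length > 0 ? args[0] : "jdbc:derby:/xqjava/db/SAF";
        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement();
             // APP is an assumed schema name.
             ResultSet rs = st.executeQuery(checkTableSql("APP", "messages_1"))) {
            if (rs.next()) {
                System.out.println("check result = " + rs.getInt(1));
            }
        } catch (SQLException e) {
            // A persistent on-disk problem is expected to surface here.
            System.err.println(e.getSQLState() + ": " + e.getMessage());
        }
    }
}
```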
>>
> 

Re: 'Invalid checksum on Page' error

Posted by SB...@ILSTechnology.com.
Really appreciate the response.
Here are some answers:

The compact flash drives are  SILICON DRIVE CF - 2G (SSD-C02GI-3012)
manufactured by Silicon Systems Inc.
SSI claims over 2,000,000 write/erase cycles. They have indicated that
their  wear-leveling algorithm uses the entire
media, not just the unallocated blocks and so it would take years to wear
these out.
I am going to take their word for it , since we haven't had a disk failure
yet in several of our tests.

Here is the create table statement:
CREATE TABLE messages_1 (msg_id integer, msg_timestamp timestamp, msg BLOB);
create unique index MI_messages_1 on messages_1 (msg_id);
create index TI_messages_1 on messages_1 (msg_timestamp);

The db is about 1.19 GB.
The compressed db size is about 76 MB.
I can post this as a JIRA entry if it is within acceptable size limits.
Should I just create a JIRA issue and append the DB to it ?

I am puzzled that the pageSize directive didn't take effect. I am sure the
properties file was found because the
database temp space  which was specified to be in an alternate location was
created at the same time the DB was.
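
One thing that could be tried on a scratch database is to store the page size as a database-level property before creating the table, and then see which page size the new table gets. The sketch below is untested against 10.0 and the thread does not establish whether it explains the 32K pages seen here. SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY is an existing Derby system procedure. The database name pageSizeDemo and the class and helper names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class PageSizeSetup {
    /** Builds the CALL that stores derby.storage.pageSize as a database property. */
    public static String setPageSizeSql(int bytes) {
        return "CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY("
                + "'derby.storage.pageSize', '" + bytes + "')";
    }

    public static void main(String[] args) {
        // Hypothetical scratch database; create=true asks Derby to create it.
        String url = "jdbc:derby:pageSizeDemo;create=true";
        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement()) {
            // Set the property first so it is in place when the table is created.
            st.execute(setPageSizeSql(8192));
            st.execute("CREATE TABLE messages_1 (msg_id integer, "
                    + "msg_timestamp timestamp, msg BLOB)");
        } catch (SQLException e) {
            System.err.println(e.getSQLState() + ": " + e.getMessage());
        }
    }
}
```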




From:     Mike Matrigali <mikem_app@sbcglobal.net>
Date:     09/13/2005 02:11 PM
Please respond to "Derby Discussion"
To:       Derby Discussion <de...@db.apache.org>
cc:
Subject:  Re: 'Invalid checksum on Page' error




sorry for the delay, went away for the weekend.

I am not sure what made me ask, but I am glad I did - there
is VERY little testing of derby on compact flash devices.  I
am not sure how appropriate they are for serious write
applications - any opinions out there?  The specs on these
things are all over the map, can you post the exact model
you are using?

You didn't say how big the db was; could you post that?  I think
if it is within the JIRA limitations the best place to post the
db would be to file a JIRA entry and attach the db to it.  That way
anyone in derby can see it.

I don't have much experience with flash cards and derby, and it
is of course very hardware dependent.   Anyone who understands the
hardware better please correct me.   I found the following on
a sandisk compact flash and I don't understand most of it
(http://www.sandisk.com/pdf/oem/WPaperWearLevelv1.0.pdf) - but
what I do get is that they seem to be using 2,000,000 updates as
an expected failure boundary (and I think that is for a very
good compact flash - I think I have seen others that only
guarantee 10,000 writes).  So given that you are doing 3.5 million
inserts it may be that the hardware just is not going to be
able to support your application.

Now the actual number of I/O's to the disk is dependent on your
application: (size of buffer cache, size of rows, number of
checkpoints).

Any compact flash experts out there: is it possible (maybe even
likely) that after some minimum number of writes, a write to the
card could mess up a bit but not log any error on the write or
on the subsequent read?

Would you be able to recognize a bad bit in the data returned from
this page?  For instance on glancing at the page dump it looked like
the test char data was the same for every row.  If we get the
database we can probably hack a version that ignores the checksum
and returns the data, if the metadata on the page is all good then
it would be interesting to know if the data is all good.

Also could you post the create statement you used for the table.

I am not sure why but your property did not affect the table, as
the table is defaulting to 32k.  Some reasons could be that property
was not there when you created the table, or some error in the property
file or the property file not being found.


SBarboza@ILSTechnology.com wrote:

> The device is using a Compact Flash card as its persistent storage.
> The derby.properties file entries are as follows:
>
> derby.storage.pageSize=8192
> derby.storage.tempDirectory=/root/xqjava/dbtmp
>
>
> The database is not encrypted.
> When loading the DB  there were no crashes.
>
> I think we can provide you with the database image. Where should we send
> it to? I'd hate to clog up the discussion group with the image.
>
> Thanks
> Sunil
>
>
>
> From:     Mike Matrigali <mikem_app@sbcglobal.net>
> Date:     09/08/2005 01:15 PM
> Please respond to "Derby Discussion"
> To:       Derby Discussion <de...@db.apache.org>
> cc:
> Subject:  Re: 'Invalid checksum on Page' error

>
>
>
>
> thanks for the more info.  Definitely interested if you can reproduce
> on different device.  I did a quick look at the page dump and on the
> surface nothing jumped out, the ascii dump of the data looks reasonable,
> there is a set of 0's in the middle as expected with a set of what looks
> like a reasonable page offset table at the end, the last page offset
> points at what looks like the last record.  Next step is to decode
> the actual values in stuff like the page hdrs, see if the zero's in the
> middle are right or if there is missing data pointed to by the offset
> table.
>
>
> Some more questions:
> o what kind of device was this error on (ie. IDE, SCSI, flash card, ...)
> o were you setting any non-default derby properties?
> o was this database encrypted?
> o When you were loading the db was there any crash encountered?
>
> When you try to reproduce could you set the following property so that
> derby.log will have a complete record of any errors, by default it gets
> overwritten every time:
> http://db.apache.org/derby/docs/10.1/tuning/rtunproper13217.html
>
> If the data in your db is not sensitive would you be willing to provide
> it.  I realize it is probably very big, so I am not sure the best way.
> Derby db's do tend to compress well using standard zip.
>
>>>(SESSIONID= 1), (DATABASE = /xqjava/db/SAF), (DRDAID = null), Failed
>>>Statement is: INSERT
>>>INTO  messages_1 ( msg_id, msg_timestamp, msg) VALUES (?,?,?)
>>>java.lang.NullPointerException
>>>       at
>>>
>>
>>
>
org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown

>
>
>>
>>>Source)
>>>       at
>>>
>>
>>
>
org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown

>
>
>>
>>>Source)
>>>       at
>>
>>org.apache.derby.impl.store.raw.xact.Xact.openContainer(Unknown
>>
>>
>>>Source)
>>>       at
>>>
>>
>>
>
org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(Unknown

>
>
>>
>>>Source)
>>>       at org.apache.derby.impl.store.access.heap.Heap.open(Unknown
>>>Source)
>>>       at
>>>
>>
>>
>
org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown
>
>>>Source)
>>>       at
>>>
>>
>>
>
org.apache.derby.impl.store.access.RAMTransaction.openCompiledConglomerate(Unknown

>
>
>>
>>>Source)
>>>       at
>>>org.apache.derby.impl.sql.execute.RowChangerImpl.openForUpdate(Unknown
>>>Source)
>>>       at org.apache.derby.impl.sql.execute.RowChangerImpl.open(Unknown
>>>Source)
>>>       at
>>>
>>
>>
>
org.apache.derby.impl.sql.execute.InsertResultSet.normalInsertCore(Unknown
>
>>>Source)
>>>       at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown
>>>Source)
>>>       at
>>>org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown
>>
>>Source)
>>
>>
>>>       at
>>>org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown
>>
>>Source)
>>
>>
>>>       at
>>>
>>
>>
>
org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown
>
>>>Source)
>>>       at
>>>
>>
>>
>
org.apache.derby.impl.jdbc.EmbedCallableStatement.executeStatement(Unknown
>
>>>Source)
>>>       at
>>>org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(Unknown
Source)
>>>........
>>>Cleanup action completed
>>>
>>>==================== derby.log
>>>========================================================================
>>>
>>>
>>>Thanks in advance.
>>>Sunil.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>
>
>






Re: 'Invalid checksum on Page' error

Posted by Mike Matrigali <mi...@sbcglobal.net>.
Sorry for the delay, I went away for the weekend.

I am not sure what made me ask, but I am glad I did: there has been
VERY little testing of Derby on compact flash devices.  I am not
sure how appropriate they are for write-heavy applications.  Any
opinions out there?  The specs on these things are all over the
map, so can you post the exact model you are using?

You didn't say how big the db was; could you post that?  I think
if it is within the JIRA limits, the best place to post the
db would be to file a JIRA entry and attach the db to it.  That way
anyone in derby can see it.

I don't have much experience with flash cards and Derby, and it
is of course very hardware dependent.  Anyone who understands the
hardware better, please correct me.  I found the following on
a SanDisk compact flash and I don't understand most of it
(http://www.sandisk.com/pdf/oem/WPaperWearLevelv1.0.pdf), but
what I do get is that they seem to be using 2,000,000 updates as
an expected failure boundary (and I think that is for a very
good compact flash; I think I have seen others that only
guarantee 10,000 writes).  So given that you are doing 3.5 million
inserts, it may be that the hardware just is not going to be
able to support your application.

Now the actual number of I/Os to the disk depends on your
application (size of buffer cache, size of rows, number of
checkpoints).

Any compact flash experts out there: is it possible (maybe even
likely) that after some minimum number of writes, a write to the
card could mess up a bit but not log any error on the write or
on the subsequent read?

Would you be able to recognize a bad bit in the data returned from
this page?  For instance, on glancing at the page dump it looked like
the test char data was the same for every row.  If we get the
database, we can probably hack a version that ignores the checksum
and returns the data.  If the metadata on the page is all good, then
it would be interesting to know whether the data is all good too.
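
To make the "one bad bit" scenario concrete, here is a minimal sketch
of how a page-level checksum catches a single flipped bit.  It assumes
a CRC32-style checksum computed over the page bytes; the 8192-byte
buffer, the fill pattern and the class name PageChecksumDemo are made
up for illustration and are not Derby's actual page layout or code.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// Illustrative only: shows that one inverted bit changes a CRC32 value.
// The page size and contents are assumptions, not Derby's on-disk format.
class PageChecksumDemo {

    // Compute a CRC32 over the whole buffer.
    static long checksum(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length);
        return crc.getValue();
    }

    // Return a copy of the page with a single bit inverted;
    // the original array is left untouched.
    static byte[] flipBit(byte[] page, int byteIndex, int bit) {
        byte[] copy = page.clone();
        copy[byteIndex] ^= (byte) (1 << bit);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical page: every row holds the same test character.
        byte[] page = new byte[8192];
        Arrays.fill(page, (byte) 'x');

        long stored = checksum(page);              // value written with the page
        byte[] damaged = flipBit(page, 4000, 3);   // simulate one bad bit on the card
        long recomputed = checksum(damaged);       // value computed when reading back

        System.out.println("stored=" + stored
                + " recomputed=" + recomputed
                + " match=" + (stored == recomputed));
    }
}
```

If the row data really is identical for every row, comparing the rows
on the suspect page against a known-good row would be one way to look
for the damaged byte once the page can be read.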

Also, could you post the create statement you used for the table?

I am not sure why, but your property did not affect the table: the
table is defaulting to a 32k page size.  Some possible reasons are that
the property was not there when you created the table, that there is an
error in the property file, or that the property file is not being found.
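
One quick way to rule out the last two reasons is to check, from the
same JVM setup the application uses, whether derby.properties is where
Derby would look for it and whether the page size entry parses.  The
sketch below is only that, a sketch: it assumes the file lives in
derby.system.home (falling back to the current directory when that
property is not set), the class name DerbyPropertiesCheck is
hypothetical, and it uses newer Java syntax than the 1.4.2 JRE
mentioned in this thread.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper: checks that derby.properties can be found and read.
class DerbyPropertiesCheck {

    // Return the configured page size, or -1 if the key is missing.
    static int pageSize(Properties props) {
        String value = props.getProperty("derby.storage.pageSize");
        return value == null ? -1 : Integer.parseInt(value.trim());
    }

    // Load a properties file from disk.
    static Properties load(File file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(file)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: derby.properties sits in derby.system.home,
        // or in the working directory when that property is unset.
        String home = System.getProperty("derby.system.home",
                System.getProperty("user.dir"));
        File file = new File(home, "derby.properties");

        if (!file.isFile()) {
            System.out.println("not found: " + file.getAbsolutePath());
            return;
        }
        System.out.println("derby.storage.pageSize=" + pageSize(load(file)));
    }
}
```

Even if the file is found, the setting would only apply to tables
created after it was in place, so the create statement and the order
of events still matter.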


SBarboza@ILSTechnology.com wrote:

> The device is using a Compact Flash card as its persistent storage.
> The derby.properties file entries are as follows:
> 
> derby.storage.pageSize=8192
> derby.storage.tempDirectory=/root/xqjava/dbtmp
> 
> 
> The database is not encrypted.
> When loading the DB  there were no crashes.
> 
> I think we can provide you with the database image. Where should we send it
> to ?
>  I'd hate to clog up the discussion group with the image.
> 
> Thanks
> Sunil
> 
> 
> 

Re: 'Invalid checksum on Page' error

Posted by SB...@ILSTechnology.com.
The device is using a Compact Flash card as its persistent storage.
The derby.properties file entries are as follows:

derby.storage.pageSize=8192
derby.storage.tempDirectory=/root/xqjava/dbtmp


The database is not encrypted.
There were no crashes while loading the DB.

I think we can provide you with the database image. Where should we
send it? I'd hate to clog up the discussion group with the image.

Thanks
Sunil



From:     Mike Matrigali <mikem_app@sbcglobal.net>
Date:     09/08/2005 01:15 PM
Please respond to: "Derby Discussion"
To:       Derby Discussion <de...@db.apache.org>
cc:
Subject:  Re: 'Invalid checksum on Page' error




thanks for the more info.  Definitely interested if you can reproduce
on different device.  I did a quick look at the page dump and on the
surface nothing jumped out, the ascii dump of the data looks reasonable,
there is a set of 0's in the middle as expected with a set of what looks
like a reasonable page offset table at the end, the last page offset
points at what looks like the last record.  Next step is to decode
the actual values in stuff like the page hdrs, see if the zero's in the
middle are right or if there is missing data pointed to by the offset
table.


Some more questions:
o what kind of device was this error on (ie. IDE, SCSI, flash card, ...)
o were you setting any non-default derby properties?
o was this database encrypted?
o When you were loading the db was there any crash encountered?

When you try to reproduce could you set the following property so that
derby.log will have a complete record of any errors, by default it gets
overwritten every time:
http://db.apache.org/derby/docs/10.1/tuning/rtunproper13217.html

If the data in your db is not sensitive would you be willing to provide
it.  I realize it is probably very big, so I am not sure the best way.
Derby db's do tend to compress well using standard zip.

SBarboza@ILSTechnology.com wrote:

> The error is always on the same page ( 10031 ).
> I ran the SYSCS_CHECK_TABLE command and I get the same error displayed
> about the page checksum error that is listed in the derby.log.
> I took a look at the OS logs but there was nothing that would indicate
> a IO failure.
> I am attaching the derby.log file.
>
> (See attached file: derby.log)
>
> I will run this scenario on several devices to try to recreate the
> problem.
>
>
>
>
> From:     Mike Matrigali <mikem_app@sbcglobal.net>
> Date:     09/07/2005 12:47 PM
> Please respond to: "Derby Discussion"
> To:       Derby Discussion <de...@db.apache.org>
> cc:
> Subject:  Re: 'Invalid checksum on Page' error
>
>
>
>
> The most usual case that causes a bad checksum error is a
> hardware problem on the data disk.  I have also seen OS I/O issues
> where for some reason other data has been written into the derby
> file.  Have you checked the OS log
> to see if any errors are being generated?  Could you attach the
> complete derby.log if it is not too big? Or if not could you at
> least attach the complete error from this particular error - most
> of the time the page dump won't help much but sometimes it is
> interesting if there is something like all 0's in the end of
> the page.
>
> It sounds like this problem on the disk and not a runtime error
> from your description.  The current error is reporting an error
> on page 10031, are all the errors you are seeing on the same page?
> Running the following will check your table, and should report
> the same error as encountered below if the problem is a persistent
> on disk error:
> http://db.apache.org/derby/docs/10.1/ref/rrefsyscschecktablefunc.html
> http://db.apache.org/derby/docs/10.1/adminguide/cadminconsist01.html
>
> The only supported way to recover from this error to apply a backup
> if you have one, and if it was a roll forward backup it will bring
> the database up to the current state.
>

Re: 'Invalid checksum on Page' error

Posted by Mike Matrigali <mi...@sbcglobal.net>.
Thanks for the additional info.  I am definitely interested if you can
reproduce this on a different device.  I took a quick look at the page
dump and on the surface nothing jumped out: the ASCII dump of the data
looks reasonable, there is a set of 0's in the middle as expected, and
there is what looks like a reasonable page offset table at the end, with
the last page offset pointing at what looks like the last record.  The
next step is to decode the actual values in things like the page headers
and see if the zeros in the middle are right or if there is missing data
pointed to by the offset table.


Some more questions:
o What kind of device was this error on (e.g. IDE, SCSI, flash card, ...)?
o Were you setting any non-default Derby properties?
o Was this database encrypted?
o When you were loading the db, was there any crash encountered?

When you try to reproduce, could you set the following property so that
derby.log will have a complete record of any errors?  By default it gets
overwritten every time:
http://db.apache.org/derby/docs/10.1/tuning/rtunproper13217.html
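
For an embedded application, one way to do this from code is to set the
property as a JVM system property before the Derby engine is first
loaded.  The sketch below assumes the property behind the link above is
derby.infolog.append (that is my reading of it, so please check the
page), and the class name DerbyLogSetup is hypothetical.

```java
// Hypothetical helper; assumes the property in question is
// derby.infolog.append, which tells Derby to append to derby.log
// instead of overwriting it each time the engine boots.
class DerbyLogSetup {

    // Must run before the embedded engine is first loaded in this JVM,
    // otherwise the setting is read too late to have any effect.
    static void keepLogAcrossRestarts() {
        System.setProperty("derby.infolog.append", "true");
    }

    public static void main(String[] args) {
        keepLogAcrossRestarts();
        System.out.println("derby.infolog.append="
                + System.getProperty("derby.infolog.append"));
    }
}
```

The same setting should also work as a line in derby.properties
(derby.infolog.append=true) or as a -D option on the java command line.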

If the data in your db is not sensitive, would you be willing to provide
it?  I realize it is probably very big, so I am not sure of the best way.
Derby dbs do tend to compress well using standard zip.


Re: 'Invalid checksum on Page' error

Posted by SB...@ILSTechnology.com.
The error is always on the same page (10031).
I ran the SYSCS_CHECK_TABLE command and I get the same page checksum error
that is listed in the derby.log.
I took a look at the OS logs but there was nothing that would indicate an
I/O failure.
I am attaching the derby.log file.

(See attached file: derby.log)

I will run this scenario on several devices to try to recreate the problem.








Re: 'Invalid checksum on Page' error

Posted by Mike Matrigali <mi...@sbcglobal.net>.
The most common cause of a bad checksum error is a
hardware problem on the data disk.  I have also seen OS I/O issues
where for some reason other data has been written into the Derby
file.  Have you checked the OS log
to see if any errors are being generated?  Could you attach the
complete derby.log if it is not too big?  If not, could you at
least attach the complete error text for this particular failure?  Most
of the time the page dump won't help much, but sometimes it is
interesting if there is something like all 0's at the end of
the page.
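
To illustrate why a bad disk write shows up as a checksum error, here is
a generic sketch of page checksum validation. It is not Derby's code: the
page size, the use of CRC32, and the layout (checksum kept in the last 8
bytes of the page) are assumptions made for the example, and all names
are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class PageChecksumSketch {
    // Assumed layout: the trailing 8 bytes hold the checksum of everything before them.
    static final int CHECKSUM_SIZE = 8;

    /** Computes the checksum over the page body, excluding the checksum slot. */
    public static long compute(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length - CHECKSUM_SIZE);
        return crc.getValue();
    }

    /** Writes the current checksum into the trailing slot, as a writer would before flushing. */
    public static void store(byte[] page) {
        ByteBuffer.wrap(page).putLong(page.length - CHECKSUM_SIZE, compute(page));
    }

    /** Recomputes the checksum on read and compares it with the stored value. */
    public static boolean isValid(byte[] page) {
        long onDisk = ByteBuffer.wrap(page).getLong(page.length - CHECKSUM_SIZE);
        return onDisk == compute(page);
    }

    public static void main(String[] args) {
        byte[] page = new byte[4096];
        page[1] = 0x75;                 // some page content
        store(page);
        System.out.println("after write:      valid=" + isValid(page));

        page[100] ^= 0x01;              // simulate one bit damaged on disk
        System.out.println("after corruption: valid=" + isValid(page));
    }
}
```

Any change to the page body that happens after the checksum was stored,
whether from failing hardware or from another writer scribbling on the
file, makes the recomputed value differ from the stored one on the next
read.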

From your description, it sounds like this is a problem on the disk and
not a runtime error.  The current error is being reported
on page 10031; are all the errors you are seeing on the same page?
Running the following will check your table, and should report
the same error as encountered below if the problem is a persistent
on-disk error:
http://db.apache.org/derby/docs/10.1/ref/rrefsyscschecktablefunc.html
http://db.apache.org/derby/docs/10.1/adminguide/cadminconsist01.html
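
A sketch of running that check over JDBC. The table name and database
path are taken from the derby.log excerpt in this thread; the schema name
APP (Derby's default) is an assumption, as are the class and method names.
The code is written for a current JDK rather than the 1.4.2 runtime
mentioned in the report, and it needs the Derby embedded driver on the
classpath to actually connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CheckTableSketch {
    /** Builds the consistency-check query, passing the names as SQL string literals. */
    public static String checkTableSql(String schema, String table) {
        return "VALUES SYSCS_UTIL.SYSCS_CHECK_TABLE('"
                + escape(schema) + "', '" + escape(table) + "')";
    }

    // Doubles single quotes so a name cannot break out of the literal.
    private static String escape(String s) {
        return s.replace("'", "''");
    }

    /** Returns true if the check reports 1; an inconsistency surfaces as an SQLException. */
    public static boolean isConsistent(Connection conn, String schema, String table)
            throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(checkTableSql(schema, table))) {
            return rs.next() && rs.getInt(1) == 1;
        }
    }

    public static void main(String[] args) {
        // Assumptions: database path from the log excerpt, default schema APP.
        String url = "jdbc:derby:/xqjava/db/SAF";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("consistent: " + isConsistent(conn, "APP", "MESSAGES_1"));
        } catch (SQLException e) {
            // Walk the exception chain; the page-level error is often nested.
            for (SQLException x = e; x != null; x = x.getNextException()) {
                System.err.println(x.getSQLState() + ": " + x.getMessage());
            }
        }
    }
}
```

If the damage is persistent on disk, the expectation from the paragraph
above is that this fails with the same checksum error on every run,
rather than intermittently.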

The only supported way to recover from this error is to apply a backup
if you have one; if it was a roll-forward backup, it will bring
the database up to the current state.
