Posted to users@jena.apache.org by Emilio Miguelanez <em...@gmail.com> on 2013/01/27 00:29:38 UTC

TDB: records not strictly increasing

Hi,

I have built an application with TDB storage, using version 0.8.10, which has grown to a considerable size (>1 GB) after running for some time.

However, I have recently been getting errors such as:

com.hp.hpl.jena.tdb.base.StorageException: RecordRangeIterator: records not strictly increasing: 0000000006d00261000000000000021c0000000006cfff69 // 0000000006b861a3000000000005233d00000000015a78b5

I don't know what has caused this error, but I suspect that the storage (and data) is now corrupted somehow.
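
As the message suggests, the iterator expects each record it reads from an index to be strictly greater than the previous one. The following is a minimal sketch of such a check, applied to the two records from the exception. The class and method names are illustrative only and are not part of TDB; the sketch assumes records can be compared as equal-length lowercase hex strings.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of a "strictly increasing" check over fixed-width hex records (not TDB code). */
final class RecordOrderCheck {

    /**
     * Returns the position of the first record that is not strictly greater
     * than its predecessor, or -1 if the sequence is strictly increasing.
     * Equal-length lowercase hex strings sort the same way as the bytes they encode.
     */
    static int firstViolation(List<String> hexRecords) {
        for (int i = 1; i < hexRecords.size(); i++) {
            if (hexRecords.get(i - 1).compareTo(hexRecords.get(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The two records quoted in the exception message.
        List<String> records = Arrays.asList(
            "0000000006d00261000000000000021c0000000006cfff69",
            "0000000006b861a3000000000005233d00000000015a78b5");
        // prints: first violation at position 1
        System.out.println("first violation at position " + firstViolation(records));
    }
}
```

The second record starts with 0000000006b8..., which sorts before 0000000006d0..., so the pair is out of order.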

From the mailing lists, I see that this error has already been reported (https://issues.apache.org/jira/browse/JENA-301) and fixed in the latest version of TDB (0.9.4), which I'll start using right away.

However, I really need to fix my existing TDB storage. Any ideas how I can fix it?

Regards,
Emilio

--
Emilio Migueláñez Martín
emilio.miguelanez@gmail.com

Re: TDB: records not strictly increasing

Posted by Andy Seaborne <an...@apache.org>.

On 29/01/13 10:42, Emilio Miguelanez wrote:
> Andy,
>
> I have run a couple of test queries to see whether I get the same number of triples from both TDB instances:
>
> select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>}
> select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Job>}
> select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Symptom>}
>
> So far, same results and no loss of data.
>
> I have also run the query that triggered the StorageException (RecordRangeIterator) error, and it works fine on the new TDB.
>
> So, should I assume that the new tdb is fixed now? Should I run any other tests?

The rebuilt DB will be structurally OK and _should_ have all the data. 
This sort of recovery is not absolutely perfect ... and you're the first 
to use it.

It is well worth checking extensively.
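
One way to check more extensively than a few counts, assuming you have N-Triples for both the source data and the rebuilt database, is to compare the two files line by line as sets. This is only a sketch: the class name and file arguments are hypothetical, and a plain line comparison assumes both files serialise terms identically and contain no blank nodes (whose labels can differ between dumps).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: report N-Triples lines present in one file but absent from another. */
final class DumpCompare {

    /** Lines in 'expected' that do not occur in 'actual', ignoring order, blank lines and duplicates. */
    static Set<String> missing(Collection<String> expected, Collection<String> actual) {
        Set<String> result = normalise(expected);
        result.removeAll(normalise(actual));
        return result;
    }

    /** Trims each line and drops empty ones. */
    private static Set<String> normalise(Collection<String> lines) {
        Set<String> out = new TreeSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: DumpCompare <source.nt> <rebuilt.nt>");
            return;
        }
        // Both paths are placeholders supplied on the command line.
        List<String> source = Files.readAllLines(Paths.get(args[0]));
        List<String> rebuilt = Files.readAllLines(Paths.get(args[1]));
        Set<String> lost = missing(source, rebuilt);
        System.out.println(lost.size() + " line(s) missing from the rebuilt dump");
        for (String line : lost) {
            System.out.println(line);
        }
    }
}
```

Running it in both directions (swapping the two files) would also show any triples that appear only in the rebuilt database.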

>
> Would using transactions prevent this problem from happening again? (Remember, I was using version 0.8.10)

The file format has not changed 0.8 -> 0.9.  You can just start using 
0.9.X code and the transaction API - no need to rebuild the DB.

You can even go backwards with care (run tdbrecovery to flush the 
journal to main DB just in case).  But transactions are there to make 
the recovery and reliability better.  You can use 0.9 code in 0.8 
non-transactional style (with the same attendant risks, but it can ease 
migration).

	Andy

>
> Thanks for your help.
>
> Regards,
> Emilio
>
> On 29 Jan 2013, at 08:26, Andy Seaborne wrote:
>
>>
>>>> B/ A different, better approach is to build a special version of TDB. The changes needed are small but you need to build Jena.
>>>>
>>>> These instructions apply to code in SVN as it is now, today.  Not the last release, not last week.  It's just easier to setup and explain from the current code base as a small recent change centralised the point you need to change and also introduced an easy to use testing feature.
>>>>
>>>> 1/ svn co the Jena code from trunk.
>>>>
>>> Done
>>>> 2/ Build Jena
>>>>    mvn clean install
>>>>
>>> Done
>>>> It is easier to build and install than just package.
>>>>
>>>> You must use the development releases of the other modules.
>>>> I don't think you need to set up maven to use the snapshot builds on Apache but if you do:
>>>>
>>>> Set <repository>
>>>> http://jena.apache.org/download/maven.html
>>>>
>>>> 3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit the code.
>>> Didn't set up maven or use Eclipse.
>>>
>>>> 4/ Setup to use this build for tdbdump.  e.g. the apache-jena or fuseki.
>>>>
>>>> For added ease - use the Fuseki server jar which has everything in it
>>>>
>>>> java -cp fuseki-server.jar tdb.tdbdump --version
>>>
>>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --version
>>>
>>> Jena:       VERSION: 2.10.0-SNAPSHOT
>>> Jena:       BUILD_DATE: 2013-01-28T21:00:30+0000
>>> ARQ:        VERSION: 2.10.0-SNAPSHOT
>>> ARQ:        BUILD_DATE: 2013-01-28T21:00:30+0000
>>> TDB:        VERSION: 0.10.0-SNAPSHOT
>>> TDB:        BUILD_DATE: 2013-01-28T21:00:30+0000
>>>
>>>> Check timestamps/version numbers.
>>>>
>>>> 5/ Test create a small text file of a few triples.
>>>>
>>>> --- D.ttl
>>>> @prefix : <http://example/> .
>>>>
>>>> :s1 :p 1 .
>>>> :s2 :p 2 .
>>>> :s3 :q 3 .
>>>> :s2 :q 4 .
>>>> :s1 :q 5 .
>>>>
>>>> ---
>>>>
>>>> tdbdump --data D.ttl should dump the file with triples clustered by subject.
>>>>
>>>> (no - you do not need to load a database - --data is a recent feature for testing)
>>>
>>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --data D.ttl
>>> <http://example/s1> <http://example/p> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>> <http://example/s1> <http://example/q> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>> <http://example/s2> <http://example/p> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>> <http://example/s2> <http://example/q> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>> <http://example/s3> <http://example/q> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>>
>>>> 6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method "chooseScanAllIndex"
>>>>
>>>> Change:
>>>> -----
>>>>         if ( tupleLen != 4 )
>>>>             return indexes[0] ;
>>>> ==>
>>>>         if ( tupleLen != 4 )
>>>>         {
>>>>             if ( indexes.length == 3 )
>>>>                 return indexes[1] ;
>>>>             else
>>>>                 return indexes[0] ;
>>>>         }
>>>> -----
>>>>
>>>> 7/ Rebuild.
>>>>
>>>> Yes - the tests for TDB should pass!
>>>>
>>>> 8/ check the new version
>>>>
>>>> tdbdump --version
>>>>
>>>> check the change
>>>>
>>>> tdbdump --data D.ttl
>>>>
>>>> and it should be n-triples clustered by property, different to earlier on.
>>>
>>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --data D.ttl
>>> <http://example/s1> <http://example/p> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>> <http://example/s2> <http://example/p> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>> <http://example/s3> <http://example/q> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>> <http://example/s2> <http://example/q> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>> <http://example/s1> <http://example/q> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
>>>
>>> Is it what you expect?
>>
>> Yes.
>>
>>>
>>>>
>>>> 9/ Dump your database.
>>>>
>>>> Hope there is a good index.
>>>
>>> It works and no errors were reported, however the size of the dump file is just 84MB, which is considerably smaller than the actual tdb (~1GB)
>>
>> Quite possible - especially if you have also been deleting stuff in the database as well as adding.
>>
>>>
>>>> You can also try indexes[2] not indexes[1] to use the OSP index.
>>>> Each dumps the entire database, but in different triple orders.
>>>
>>> I also tried this change of index, and it gave me the same error:
>>>
>>> Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException: RecordRangeIterator: records not strictly increasing: 00000000021aa0a20000000006cffe6b000000000005233d // 00000000021a2c0a0000000006b85f9f000000000005233d
>>
>> The OSP index is also broken.
>>
>>>
>>>> 10/ Clean up maven to get rid of the temporary build.
>>>>
>>>> rm -r REPO/org/apache/jena/
>>>>
>>>> 11/ Rebuild the database with tdbloader/tdbloader2.
>>>
>>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbloader --loc=tdb tdb.dump
>>>
>>> but the size of the tdb is smaller than the original tdb
>>
>> The loader produces more compact indexes than if the data has been loaded incrementally.  This is even more the case for tdbloader2.
>>
>> Also if you have been deleting and adding, for 0.8, then the database can grow.  This is addressed, but not totally fixed, in 0.9.X.
>>
>>>> (the load is slower than if dumped in SPO order)
>>>>
>>>> I tested the change here on that test file - I don't have a large corrupt database to try it on.
>>>>
>>>>> Any ideas of how to get it fixed are more than welcome.
>>>>
>>>> Personally, I would adopt a 2 stream approach.
>>>>
>>>> Do approach above and also collect all the data together and start a fresh load of the database on another machine.
>>>
>>> Doing it already.
>>
>> 	Andy
>>
>>>
>>> Thanks,
>>> Emilio
>>>
>>>>
>>>> 	Good luck
>>>> 	Andy
>>>>
>>>>>
>>>>> Regards, Emilio
>>>>>
>>>>>
>>>>> -- Emilio Migueláñez Martín emilio.miguelanez@gmail.com
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Emilio Migueláñez Martín
>>> emilio.miguelanez@gmail.com
>>>
>>>
>>
>
> --
> Emilio Migueláñez Martín
> emilio.miguelanez@gmail.com
>
>

Re: TDB: records not strictly increasing

Posted by Emilio Miguelanez <em...@gmail.com>.

Andy,

I have run a couple of test queries to see whether I get the same number of triples from both TDB instances:

select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>}
select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Job>}
select count(*) { ?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Symptom>}

So far, same results and no loss of data.

I have also run the query that triggered the StorageException (RecordRangeIterator) error, and it works fine on the new TDB.

So, should I assume that the new tdb is fixed now? Should I run any other tests?

Would using transactions prevent this problem from happening again? (Remember, I was using version 0.8.10)

Thanks for your help.

Regards,
Emilio

On 29 Jan 2013, at 08:26, Andy Seaborne wrote:

> 
>>> B/ A different, better approach is to build a special version of TDB. The changes needed are small but you need to build Jena.
>>> 
>>> These instructions apply to code in SVN as it is now, today.  Not the last release, not last week.  It's just easier to setup and explain from the current code base as a small recent change centralised the point you need to change and also introduced an easy to use testing feature.
>>> 
>>> 1/ svn co the Jena code from trunk.
>>> 
>> Done
>>> 2/ Build Jena
>>>   mvn clean install
>>> 
>> Done
>>> It is easier to build and install than just package.
>>> 
>>> You must use the development releases of the other modules.
>>> I don't think you need to set up maven to use the snapshot builds on Apache but if you do:
>>> 
>>> Set <repository>
>>> http://jena.apache.org/download/maven.html
>>> 
>>> 3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit the code.
>> Didn't set up maven or use Eclipse.
>> 
>>> 4/ Setup to use this build for tdbdump.  e.g. the apache-jena or fuseki.
>>> 
>>> For added ease - use the Fuseki server jar which has everything in it
>>> 
>>> java -cp fuseki-server.jar tdb.tdbdump --version
>> 
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --version
>> 
>> Jena:       VERSION: 2.10.0-SNAPSHOT
>> Jena:       BUILD_DATE: 2013-01-28T21:00:30+0000
>> ARQ:        VERSION: 2.10.0-SNAPSHOT
>> ARQ:        BUILD_DATE: 2013-01-28T21:00:30+0000
>> TDB:        VERSION: 0.10.0-SNAPSHOT
>> TDB:        BUILD_DATE: 2013-01-28T21:00:30+0000
>> 
>>> Check timestamps/version numbers.
>>> 
>>> 5/ Test create a small text file of a few triples.
>>> 
>>> --- D.ttl
>>> @prefix : <http://example/> .
>>> 
>>> :s1 :p 1 .
>>> :s2 :p 2 .
>>> :s3 :q 3 .
>>> :s2 :q 4 .
>>> :s1 :q 5 .
>>> 
>>> ---
>>> 
>>> tdbdump --data D.ttl should dump the file with triples clustered by subject.
>>> 
>>> (no - you do not need to load a database - --data is a recent feature for testing)
>> 
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --data D.ttl
>> <http://example/s1> <http://example/p> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s1> <http://example/q> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/p> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/q> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s3> <http://example/q> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> 
>>> 6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method "chooseScanAllIndex"
>>> 
>>> Change:
>>> -----
>>>        if ( tupleLen != 4 )
>>>            return indexes[0] ;
>>> ==>
>>>        if ( tupleLen != 4 )
>>>        {
>>>            if ( indexes.length == 3 )
>>>                return indexes[1] ;
>>>            else
>>>                return indexes[0] ;
>>>        }
>>> -----
>>> 
>>> 7/ Rebuild.
>>> 
>>> Yes - the tests for TDB should pass!
>>> 
>>> 8/ check the new version
>>> 
>>> tdbdump --version
>>> 
>>> check the change
>>> 
>>> tdbdump --data D.ttl
>>> 
>>> and it should be n-triples clustered by property, different to earlier on.
>> 
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --data D.ttl
>> <http://example/s1> <http://example/p> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/p> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s3> <http://example/q> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s2> <http://example/q> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> <http://example/s1> <http://example/q> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
>> 
>> Is it what you expect?
> 
> Yes.
> 
>> 
>>> 
>>> 9/ Dump your database.
>>> 
>>> Hope there is a good index.
>> 
>> It works and no errors were reported, however the size of the dump file is just 84MB, which is considerably smaller than the actual tdb (~1GB)
> 
> Quite possible - especially if you have also been deleting stuff in the database as well as adding.
> 
>> 
>>> You can also try indexes[2] not indexes[1] to use the OSP index.
>>> Each dumps the entire database, but in different triple orders.
>> 
>> I also tried this change of index, and it gave me the same error:
>> 
>> Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException: RecordRangeIterator: records not strictly increasing: 00000000021aa0a20000000006cffe6b000000000005233d // 00000000021a2c0a0000000006b85f9f000000000005233d
> 
> The OSP index is also broken.
> 
>> 
>>> 10/ Clean up maven to get rid of the temporary build.
>>> 
>>> rm -r REPO/org/apache/jena/
>>> 
>>> 11/ Rebuild the database with tdbloader/tdbloader2.
>> 
>> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbloader --loc=tdb tdb.dump
>> 
>> but the size of the tdb is smaller than the original tdb
> 
> The loader produces more compact indexes than if the data has been loaded incrementally.  This is even more the case for tdbloader2.
> 
> Also if you have been deleting and adding, for 0.8, then the database can grow.  This is addressed, but not totally fixed, in 0.9.X.
> 
>>> (the load is slower than if dumped in SPO order)
>>> 
>>> I tested the change here on that test file - I don't have a large corrupt database to try it on.
>>> 
>>>> Any ideas of how to get it fixed are more than welcome.
>>> 
>>> Personally, I would adopt a 2 stream approach.
>>> 
>>> Do approach above and also collect all the data together and start a fresh load of the database on another machine.
>> 
>> Doing it already.
> 
> 	Andy
> 
>> 
>> Thanks,
>> Emilio
>> 
>>> 
>>> 	Good luck
>>> 	Andy
>>> 
>>>> 
>>>> Regards, Emilio
>>>> 
>>>> 
>>>> -- Emilio Migueláñez Martín emilio.miguelanez@gmail.com
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> --
>> Emilio Migueláñez Martín
>> emilio.miguelanez@gmail.com
>> 
>> 
> 

--
Emilio Migueláñez Martín
emilio.miguelanez@gmail.com

Re: TDB: records not strictly increasing

Posted by Andy Seaborne <an...@apache.org>.

>> B/ A different, better approach is to build a special version of TDB. The changes needed are small but you need to build Jena.
>>
>> These instructions apply to code in SVN as it is now, today.  Not the last release, not last week.  It's just easier to setup and explain from the current code base as a small recent change centralised the point you need to change and also introduced an easy to use testing feature.
>>
>> 1/ svn co the Jena code from trunk.
>>
> Done
>> 2/ Build Jena
>>    mvn clean install
>>
> Done
>> It is easier to build and install than just package.
>>
>> You must use the development releases of the other modules.
>> I don't think you need to set up maven to use the snapshot builds on Apache but if you do:
>>
>> Set <repository>
>> http://jena.apache.org/download/maven.html
>>
>> 3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit the code.
> Didn't set up maven or use Eclipse.
>
>> 4/ Setup to use this build for tdbdump.  e.g. the apache-jena or fuseki.
>>
>> For added ease - use the Fuseki server jar which has everything in it
>>
>> java -cp fuseki-server.jar tdb.tdbdump --version
>
> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --version
>
> Jena:       VERSION: 2.10.0-SNAPSHOT
> Jena:       BUILD_DATE: 2013-01-28T21:00:30+0000
> ARQ:        VERSION: 2.10.0-SNAPSHOT
> ARQ:        BUILD_DATE: 2013-01-28T21:00:30+0000
> TDB:        VERSION: 0.10.0-SNAPSHOT
> TDB:        BUILD_DATE: 2013-01-28T21:00:30+0000
>
>> Check timestamps/version numbers.
>>
>> 5/ Test create a small text file of a few triples.
>>
>> --- D.ttl
>> @prefix : <http://example/> .
>>
>> :s1 :p 1 .
>> :s2 :p 2 .
>> :s3 :q 3 .
>> :s2 :q 4 .
>> :s1 :q 5 .
>>
>> ---
>>
>> tdbdump --data D.ttl should dump the file with triples clustered by subject.
>>
>> (no - you do not need to load a database - --data is a recent feature for testing)
>
> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --data D.ttl
> <http://example/s1> <http://example/p> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
> <http://example/s1> <http://example/q> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
> <http://example/s2> <http://example/p> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
> <http://example/s2> <http://example/q> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
> <http://example/s3> <http://example/q> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
>
>> 6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method "chooseScanAllIndex"
>>
>> Change:
>> -----
>>         if ( tupleLen != 4 )
>>             return indexes[0] ;
>> ==>
>>         if ( tupleLen != 4 )
>>         {
>>             if ( indexes.length == 3 )
>>                 return indexes[1] ;
>>             else
>>                 return indexes[0] ;
>>         }
>> -----
>>
>> 7/ Rebuild.
>>
>> Yes - the tests for TDB should pass!
>>
>> 8/ check the new version
>>
>> tdbdump --version
>>
>> check the change
>>
>> tdbdump --data D.ttl
>>
>> and it should be n-triples clustered by property, different to earlier on.
>
> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --data D.ttl
> <http://example/s1> <http://example/p> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
> <http://example/s2> <http://example/p> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
> <http://example/s3> <http://example/q> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
> <http://example/s2> <http://example/q> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
> <http://example/s1> <http://example/q> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
>
> Is it what you expect?

Yes.

>
>>
>> 9/ Dump your database.
>>
>> Hope there is a good index.
>
> It works and no errors were reported, however the size of the dump file is just 84MB, which is considerably smaller than the actual tdb (~1GB)

Quite possible - especially if you have also been deleting stuff in the 
database as well as adding.

>
>> You can also try indexes[2] not indexes[1] to use the OSP index.
>> Each dumps the entire database, but in different triple orders.
>
> I also tried this change of index, and it gave me the same error:
>
> Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException: RecordRangeIterator: records not strictly increasing: 00000000021aa0a20000000006cffe6b000000000005233d // 00000000021a2c0a0000000006b85f9f000000000005233d

The OSP index is also broken.
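
To read these records, it can help to split them into their component ids. The sketch below assumes each record is a sequence of 8-byte ids (16 hex digits per slot) laid out in the order given by the index name, so an OSP record is object, subject, predicate. That layout is an assumption for illustration, and the class name is hypothetical.

```java
/** Sketch: split a hex-encoded index record into 64-bit slots (assumed layout, not TDB code). */
final class RecordDecode {

    /** Returns one long per 16 hex digits of the record. */
    static long[] slots(String hexRecord) {
        if (hexRecord.length() % 16 != 0) {
            throw new IllegalArgumentException("record length is not a multiple of 16 hex digits");
        }
        long[] out = new long[hexRecord.length() / 16];
        for (int i = 0; i < out.length; i++) {
            String slot = hexRecord.substring(i * 16, (i + 1) * 16);
            out[i] = Long.parseUnsignedLong(slot, 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // The two records from the OSP error quoted above.
        String[] records = {
            "00000000021aa0a20000000006cffe6b000000000005233d",
            "00000000021a2c0a0000000006b85f9f000000000005233d"
        };
        for (String record : records) {
            StringBuilder line = new StringBuilder();
            for (long id : slots(record)) {
                line.append(Long.toHexString(id)).append(' ');
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

Under that assumption, the first slot goes from 21aa0a2 down to 21a2c0a, which is the ordering violation. The value 5233d in the last slot of both records also appears as the middle slot of the second record in the original SPO error, which would be consistent with it being the same predicate.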

>
>> 10/ Clean up maven to get rid of the temporary build.
>>
>> rm -r REPO/org/apache/jena/
>>
>> 11/ Rebuild the database with tdbloader/tdbloader2.
>
> java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbloader --loc=tdb tdb.dump
>
> but the size of the tdb is smaller than the original tdb

The loader produces more compact indexes than if the data has been 
loaded incrementally.  This is even more the case for tdbloader2.

Also if you have been deleting and adding, for 0.8, then the database 
can grow.  This is addressed, but not totally fixed, in 0.9.X.

>> (the load is slower than if dumped in SPO order)
>>
>> I tested the change here on that test file - I don't have a large corrupt database to try it on.
>>
>>> Any ideas of how to get it fixed are more than welcome.
>>
>> Personally, I would adopt a 2 stream approach.
>>
>> Do approach above and also collect all the data together and start a fresh load of the database on another machine.
>
> Doing it already.

	Andy

>
> Thanks,
> Emilio
>
>>
>> 	Good luck
>> 	Andy
>>
>>>
>>> Regards, Emilio
>>>
>>>
>>> -- Emilio Migueláñez Martín emilio.miguelanez@gmail.com
>>>
>>>
>>>
>>
>
> --
> Emilio Migueláñez Martín
> emilio.miguelanez@gmail.com
>
>

Re: TDB: records not strictly increasing

Posted by Emilio Miguelanez <em...@gmail.com>.

Hi Andy,

I have done some testing.

> On 28/01/13 10:21, Emilio Miguelanez wrote:
>> 
>> On 27 Jan 2013, at 22:04, Andy Seaborne wrote:
>> 
>>> If select * { ?agent
>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
>>> }
>>> 
>>> works, it may be your lucky day.  The SPO index is intact so
>>> tdbdump will work.  Maybe.
>>> 
>>> If you have the original data, then rebuilding is much safer.
>>> There may be other problems not yet encountered.
>> 
>> 
>> This query works .... what should I do now?
>> 
>> If I run
>> 
>> tdbdump --loc=tdb > tdb.dump       (question: are tdbdump and tdbbackup
>> the same command?)
> 
> Almost.
> 
>> I get same error.
> 
> Not your lucky day I'm afraid.  The SPO index is damaged.  It does however look as if another index is intact.
> 
>> Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException:
>> RecordRangeIterator: records not strictly increasing:
>> 0000000006d00261000000000000021c0000000006cfff69 //
>> 0000000006b861a3000000000005233d00000000015a78b5
>> 
>> I would like to see whether the current TDB can be fixed, as rebuilding
>> could take a long time. The database was created with minimal data,
>> and it has been populated (dynamically) with data over a long period
>> of time (> 1 year).
> 
> SPO is the index used for iteration of the whole database.  This can be changed.
> 
> Is this a database of just triples? No named graphs?  So far, the corruption looks to be in SPO (an index on the default graph).

The database started with a named graph, and is being populated with triples over time. 

> It will take some programming to fix this.  No guarantees that it will work but I've experimented here.
> 
> Take a backup of the database.

Done

> 
> A/ (the second way is better)

I haven't tested this approach.

> If you know all the possible properties, then write code that loops on each of the properties and does
> 
>   defaultGraph.find(null, property, null)
> 
> This will use the POS index.
> 
> Print everything in N-Triples.
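
The idea behind approach A can be sketched with a small in-memory model. This is not Jena code: the find method below is a hypothetical stand-in for the graph's find call, with null acting as a wildcard, and the triples are printed in a Turtle-like short form rather than real N-Triples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of approach A on an in-memory list of triples (a model, not the Jena API). */
final class PropertyScan {

    /** Stand-in for a graph find: a null argument matches any term. */
    static List<String[]> find(List<String[]> graph, String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        for (String[] t : graph) {
            boolean match = (s == null || s.equals(t[0]))
                         && (p == null || p.equals(t[1]))
                         && (o == null || o.equals(t[2]));
            if (match) {
                out.add(t);
            }
        }
        return out;
    }

    /** Loops over the known properties and collects every matching triple as one line. */
    static List<String> dumpByProperty(List<String[]> graph, List<String> properties) {
        List<String> lines = new ArrayList<>();
        for (String property : properties) {
            for (String[] t : find(graph, null, property, null)) {
                lines.add(t[0] + " " + t[1] + " " + t[2] + " .");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String[]> graph = Arrays.asList(
            new String[] {":s1", ":p", "1"},
            new String[] {":s2", ":p", "2"},
            new String[] {":s3", ":q", "3"},
            new String[] {":s2", ":q", "4"},
            new String[] {":s1", ":q", "5"});
        for (String line : dumpByProperty(graph, Arrays.asList(":p", ":q"))) {
            System.out.println(line);
        }
    }
}
```

The sketch also shows the weakness of this approach: any property left out of the list silently drops its triples from the output.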
> 
> B/ A different, better approach is to build a special version of TDB. The changes needed are small but you need to build Jena.
> 
> These instructions apply to code in SVN as it is now, today.  Not the last release, not last week.  It's just easier to setup and explain from the current code base as a small recent change centralised the point you need to change and also introduced an easy to use testing feature.
> 
> 1/ svn co the Jena code from trunk.
> 
Done
> 2/ Build Jena
>   mvn clean install
> 
Done
> It is easier to build and install than just package.
> 
> You must use the development releases of the other modules.
> I don't think you need to set up maven to use the snapshot builds on Apache but if you do:
> 
> Set <repository>
> http://jena.apache.org/download/maven.html
> 
> 3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit the code.
Didn't set up maven or use Eclipse.

> 4/ Setup to use this build for tdbdump.  e.g. the apache-jena or fuseki.
> 
> For added ease - use the Fuseki server jar which has everything in it
> 
> java -cp fuseki-server.jar tdb.tdbdump --version

java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --version

Jena:       VERSION: 2.10.0-SNAPSHOT
Jena:       BUILD_DATE: 2013-01-28T21:00:30+0000
ARQ:        VERSION: 2.10.0-SNAPSHOT
ARQ:        BUILD_DATE: 2013-01-28T21:00:30+0000
TDB:        VERSION: 0.10.0-SNAPSHOT
TDB:        BUILD_DATE: 2013-01-28T21:00:30+0000
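
The version report above can also be checked mechanically. This sketch parses lines of the form shown and tests whether all modules carry the same build date; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: parse "Module:  KEY: value" lines from a --version report. */
final class VersionCheck {

    /** Maps e.g. "TDB:        VERSION: 0.10.0-SNAPSHOT" to "TDB VERSION" -> "0.10.0-SNAPSHOT". */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String line : lines) {
            // Split on a colon followed by whitespace, so the colons inside the timestamp survive.
            String[] parts = line.trim().split(":\\s+", 3);
            if (parts.length == 3) {
                out.put(parts[0] + " " + parts[1], parts[2]);
            }
        }
        return out;
    }

    /** True when there is at least one BUILD_DATE entry and all of them are equal. */
    static boolean sameBuildDate(Map<String, String> info) {
        Set<String> dates = new HashSet<>();
        for (Map.Entry<String, String> entry : info.entrySet()) {
            if (entry.getKey().endsWith("BUILD_DATE")) {
                dates.add(entry.getValue());
            }
        }
        return dates.size() == 1;
    }

    public static void main(String[] args) {
        Map<String, String> info = parse(Arrays.asList(
            "Jena:       VERSION: 2.10.0-SNAPSHOT",
            "Jena:       BUILD_DATE: 2013-01-28T21:00:30+0000",
            "ARQ:        VERSION: 2.10.0-SNAPSHOT",
            "ARQ:        BUILD_DATE: 2013-01-28T21:00:30+0000",
            "TDB:        VERSION: 0.10.0-SNAPSHOT",
            "TDB:        BUILD_DATE: 2013-01-28T21:00:30+0000"));
        System.out.println("same build date: " + sameBuildDate(info));
    }
}
```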

> Check timestamps/version numbers.
> 
> 5/ Test create a small text file of a few triples.
> 
> --- D.ttl
> @prefix : <http://example/> .
> 
> :s1 :p 1 .
> :s2 :p 2 .
> :s3 :q 3 .
> :s2 :q 4 .
> :s1 :q 5 .
> 
> ---
> 
> tdbdump --data D.ttl should dump the file with triples clustered by subject.
> 
> (no - you do not need to load a database - --data is a recent feature for testing)

java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --data D.ttl 
<http://example/s1> <http://example/p> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s1> <http://example/q> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s2> <http://example/p> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s2> <http://example/q> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s3> <http://example/q> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .

> 6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method "chooseScanAllIndex"
> 
> Change:
> -----
>        if ( tupleLen != 4 )
>            return indexes[0] ;
> ==>
>        if ( tupleLen != 4 )
>        {
>            if ( indexes.length == 3 )
>                return indexes[1] ;
>            else
>                return indexes[0] ;
>        }
> -----
> 
> 7/ Rebuild.
> 
> Yes - the tests for TDB should pass!
> 
> 8/ check the new version
> 
> tdbdump --version
> 
> check the change
> 
> tdbdump --data D.ttl
> 
> and it should be n-triples clustered by property, different to earlier on.

java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --data D.ttl 
<http://example/s1> <http://example/p> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s2> <http://example/p> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s3> <http://example/q> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s2> <http://example/q> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example/s1> <http://example/q> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .

Is it what you expect?
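
The two outputs above can be reproduced by sorting the five test triples in SPO order and in POS order. The sketch below sorts by name, which is an assumption standing in for however TDB actually orders its internal ids; for this tiny file the two happen to agree with the dumps shown. The class name is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: the same triples listed in two different index orders. */
final class ScanOrder {

    /** Sorts by the given column order, e.g. {0,1,2} for SPO or {1,2,0} for POS. */
    static List<String> scan(List<String[]> triples, int[] order) {
        List<String[]> sorted = new ArrayList<>(triples);
        sorted.sort((a, b) -> {
            for (int col : order) {
                int c = a[col].compareTo(b[col]);
                if (c != 0) {
                    return c;
                }
            }
            return 0;
        });
        List<String> lines = new ArrayList<>();
        for (String[] t : sorted) {
            lines.add(t[0] + " " + t[1] + " " + t[2]);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The contents of D.ttl, as subject, predicate, object.
        List<String[]> data = Arrays.asList(
            new String[] {"s1", "p", "1"},
            new String[] {"s2", "p", "2"},
            new String[] {"s3", "q", "3"},
            new String[] {"s2", "q", "4"},
            new String[] {"s1", "q", "5"});
        System.out.println("SPO: " + scan(data, new int[] {0, 1, 2}));
        System.out.println("POS: " + scan(data, new int[] {1, 2, 0}));
    }
}
```

In SPO order the triples cluster by subject (s1, s1, s2, s2, s3); in POS order they cluster by property (p, p, q, q, q), matching the second dump.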

> 
> 9/ Dump your database.
> 
> Hope there is a good index.

It works and no errors were reported, however the size of the dump file is just 84MB, which is considerably smaller than the actual tdb (~1GB)

> You can also try indexes[2] not indexes[1] to use the OSP index.
> Each dumps the entire database, but in different triple orders.

I also tried this change of index, and it gave me the same error:

Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException: RecordRangeIterator: records not strictly increasing: 00000000021aa0a20000000006cffe6b000000000005233d // 00000000021a2c0a0000000006b85f9f000000000005233d
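
For reference, the index choice discussed in steps 6 and 9 can be modelled as follows. This is a standalone model, not the TupleTable source: the index names are plain strings, the array order SPO, POS, OSP is taken from the thread, the preferred parameter is added here for illustration, and the quad case is left out because the thread does not show it.

```java
/** Model of the patched scan-index selection from step 6 (not the actual TDB class). */
final class ScanIndexChoice {

    /**
     * For a triple table with three indexes, scan via indexes[preferred];
     * otherwise fall back to the primary index, as in the patch.
     */
    static String chooseScanAllIndex(int tupleLen, String[] indexes, int preferred) {
        if (tupleLen != 4) {
            if (indexes.length == 3) {
                return indexes[preferred];
            }
            return indexes[0];
        }
        // The branch for quads is not shown in the thread, so it is not modelled here.
        throw new UnsupportedOperationException("quad tables are not modelled in this sketch");
    }

    public static void main(String[] args) {
        String[] tripleIndexes = {"SPO", "POS", "OSP"};
        System.out.println(chooseScanAllIndex(3, tripleIndexes, 0)); // unpatched behaviour
        System.out.println(chooseScanAllIndex(3, tripleIndexes, 1)); // patch from step 6
        System.out.println(chooseScanAllIndex(3, tripleIndexes, 2)); // the indexes[2] variant
    }
}
```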

> 10/ Clean up maven to get rid of the temporary build.
> 
> rm -r REPO/org/apache/jena/
> 
> 11/ Rebuild the database with tdbloader/tdbloader2.

java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbloader --loc=tdb tdb.dump

but the size of the tdb is smaller than the original tdb

> (the load is slower than if dumped in SPO order)
> 
> I tested the change here on that test file - I don't have a large corrupt database to try it on.
> 
>> Any ideas of how to get it fixed are more than welcome.
> 
> Personally, I would adopt a 2 stream approach.
> 
> Do approach above and also collect all the data together and start a fresh load of the database on another machine.

Doing it already.

Thanks,
Emilio

> 
> 	Good luck
> 	Andy
> 
>> 
>> Regards, Emilio
>> 
>> 
>> -- Emilio Migueláñez Martín emilio.miguelanez@gmail.com
>> 
>> 
>> 
> 

--
Emilio Migueláñez Martín
emilio.miguelanez@gmail.com

Re: TDB: records not strictly increasing

Posted by Emilio Miguelanez <em...@gmail.com>.

Andy,

Also, for reference, I have used the TDBVerifier* code developed by Paolo to check the validity of the indexes.

I got the following results:

---- Scanning node table ... ----
Found 890517 RDF nodes.
Nodes.dat file size is: 114295521
114295393 + 124 + 4 = 114295521
---- Scanning GSPO ... ----
Found 0 records.
---- Scanning GPOS ... ----
Found 0 records.
---- Scanning GOSP ... ----
Found 0 records.
---- Scanning POSG ... ----
Found 0 records.
---- Scanning OSPG ... ----
Found 0 records.
---- Scanning SPOG ... ----
Found 0 records.
---- Scanning SPO ... ----
00000000012b1b96000000000000005a00000000000001bf00000000012b1b96
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
	at org.openjena.atlas.lib.ColumnMap.fetchSlotIdx(ColumnMap.java:137)
	at com.hp.hpl.jena.tdb.lib.TupleLib.tuple(TupleLib.java:213)
	at FunctionalTests.TDBVerifier.verifyIndex(TDBVerifier.java:111)
	at FunctionalTests.TDBVerifier.main(TDBVerifier.java:57)
Java Result: 1


This seems to corroborate the corruption in the SPO index.

Regards,
Emilio


* https://github.com/castagna/tdbloader4/blob/f5363fa49d16a04a362898c1a5084ade620ee81b/src/test/java/dev/TDBVerifier.java

On 28 Jan 2013, at 11:23, Andy Seaborne wrote:

> On 28/01/13 10:21, Emilio Miguelanez wrote:
>> 
>> On 27 Jan 2013, at 22:04, Andy Seaborne wrote:
>> 
>>> If select * { ?agent
>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
>>> }
>>> 
>>> works, it may be your lucky day.  The SPO index is intact so
>>> tdbdump will work.  Maybe.
>>> 
>>> If you have the original data, then rebuilding is much safer.
>>> There may be other problems not yet encountered.
>> 
>> 
>> This query works .... what should I do now?
>> 
>> If I run
>> 
>> tdbdump --loc=tdb > tdb.dump       (question: are tdbdump and tdbbackup
>> the same command?)
> 
> Almost.
> 
>> I get same error.
> 
> Not your lucky day I'm afraid.  The SPO index is damaged.  It does however look as if another index is intact.
> 
>> Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException:
>> RecordRangeIterator: records not strictly increasing:
>> 0000000006d00261000000000000021c0000000006cfff69 //
>> 0000000006b861a3000000000005233d00000000015a78b5
>> 
>> I would like to see whether the current TDB can be fixed, as rebuilding
>> could take a long time. The database was created with minimal data,
>> and it has been populated (dynamically) with data over a long period
>> of time (> 1 year).
> 
> SPO is the index used for iteration of the whole database.  This can be changed.
> 
> Is this a database of just triples? No named graphs?  So far, the corruption looks to be in SPO (an index on the default graph).
> 
> It will take some programming to fix this.  No guarantees that it will work but I've experimented here.
> 
> Take a backup of the database.
> 
> A/ (the second way is better)
> If you know all the possible properties, then write code that loops on each of the properties and does
> 
>   defaultGraph.find(null, property, null)
> 
> This will use the POS index.
> 
> Print everything in N-Triples.
> 
> B/ A different, better approach is to build a special version of TDB. The changes needed are small but you need to build Jena.
> 
> These instructions apply to code in SVN as it is now, today.  Not the last release, not last week.  It's just easier to setup and explain from the current code base as a small recent change centralised the point you need to change and also introduced an easy to use testing feature.
> 
> 1/ svn co the Jena code from trunk.
> 
> 2/ Build Jena
>   mvn clean install
> 
> It is easier to build and install than just package.
> 
> You must use the development releases of the other modules.
> I don't think you need to set up maven to use the snapshot builds on Apache but if you do:
> 
> Set <repository>
> http://jena.apache.org/download/maven.html
> 
> 3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit the code.
> 
> 4/ Setup to use this build for tdbdump.  e.g. the apache-jena or fuseki.
> 
> For added ease - use the Fuseki server jar which has everything in it
> 
> java -cp fuseki-server.jar tdb.tdbdump --version
> 
> Check timestamps/version numbers.
> 
> 5/ Test create a small text file of a few triples.
> 
> --- D.ttl
> @prefix : <http://example/> .
> 
> :s1 :p 1 .
> :s2 :p 2 .
> :s3 :q 3 .
> :s2 :q 4 .
> :s1 :q 5 .
> 
> ---
> 
> tdbdump --data D.ttl should dump the file with triples clustered by subject.
> 
> (no - you do not need to load a database - --data is a recent feature for testing)
> 
> 6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method "chooseScanAllIndex"
> 
> Change:
> -----
>        if ( tupleLen != 4 )
>            return indexes[0] ;
> ==>
>        if ( tupleLen != 4 )
>        {
>            if ( indexes.length == 3 )
>                return indexes[1] ;
>            else
>                return indexes[0] ;
>        }
> -----
> 
> 7/ Rebuild.
> 
> Yes - the tests for TDB should pass!
> 
> 8/ check the new version
> 
> tdbdump --version
> 
> check the change
> 
> tdbdump --data D.ttl
> 
> and it should be n-triples clustered by property, different to earlier on.
> 
> 9/ Dump your database.
> 
> Hope there is a good index.
> 
> You can also try indexes[2] not indexes[1] to use the OSP index.
> Each dumps the entire database, but in different triple orders.
> 
> 10/ Clean up maven to get rid of the temporary build.
> 
> rm -r REPO/org/apache/jena/
> 
> 11/ Rebuild the database with tdbloader/tdbloader2.
> 
> (the load is slower than if dumped in SPO order)
> 
> I tested the change here on that test file - I don't have a large corrupt database to try it on.
> 
>> Any ideas of how to get it fixed are more than welcome.
> 
> Personally, I would adopt a 2 stream approach.
> 
> Do approach above and also collect all the data together and start a fresh load of the database on another machine.
> 
> 	Good luck
> 	Andy
> 
>> 
>> Regards, Emilio
>> 
>> 
>> -- Emilio Migueláñez Martín emilio.miguelanez@gmail.com
>> 
>> 
>> 
> 

--
Emilio Migueláñez Martín
emilio.miguelanez@gmail.com

Re: TDB: records not strictly increasing

Posted by Andy Seaborne <an...@apache.org>.

On 28/01/13 10:21, Emilio Miguelanez wrote:
>
> On 27 Jan 2013, at 22:04, Andy Seaborne wrote:
>
>> If select * { ?agent
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
>> }
>>
>> works, it may be your lucky day.  The SPO index is intact so
>> tdbdump will work.  Maybe.
>>
>> If you have the original data, then rebuilding is much safer.
>> There may be other problems not yet encountered.
>
>
> This query works .... what should I do now?
>
> If I run
>
> tdbdump --loc=tdb > tdb.dump       (question: tdbdump are tdbbackup
> are same commands?)

Almost.

> I get same error.

Not your lucky day I'm afraid.  The SPO index is damaged.  It does 
however look as if another index is intact.

> Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException:
> RecordRangeIterator: records not strictly increasing:
> 0000000006d00261000000000000021c0000000006cfff69 //
> 0000000006b861a3000000000005233d00000000015a78b5
>
> I would like to try if the current tdb can be fixed, as rebuilding
> could take long time. The database was created  with minimal data,
> and it is being populated (dynamically) with data over a long period
> of time (> 1 year)

SPO is the index used for iteration of the whole database.  This can be 
changed.

Is this a database of just triples? No named graphs?  So far, the 
corruption looks to be in SPO (an index on the default graph).

It will take some programming to fix this.  No guarantees that it will 
work but I've experimented here.

Take a backup of the database.

A/ (the second way is better)
If you know all the possible properties, then write code that loops on 
each of the properties and does

    defaultGraph.find(null, property, null)

This will use the POS index.

Print everything in N-Triples.
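
A minimal sketch of the loop in A/, written against an in-memory stand-in rather than the real Jena Graph API, so every name here (PerPropertyDumpSketch, find, dumpByProperty) is hypothetical.  It only illustrates the shape of the loop and why A/ depends on knowing every property: a triple whose property is not in the list is silently missed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PerPropertyDumpSketch {
    /** In-memory stand-in for a graph lookup; null is a wildcard, as in find(null, property, null). */
    static List<String[]> find(List<String[]> graph, String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        for (String[] t : graph) {
            if ((s == null || s.equals(t[0]))
                    && (p == null || p.equals(t[1]))
                    && (o == null || o.equals(t[2])))
                out.add(t);
        }
        return out;
    }

    /** Loops over the known properties and renders each match as an N-Triples line (IRIs only). */
    static List<String> dumpByProperty(List<String[]> graph, List<String> properties) {
        List<String> lines = new ArrayList<>();
        for (String property : properties)
            for (String[] t : find(graph, null, property, null))
                lines.add("<" + t[0] + "> <" + t[1] + "> <" + t[2] + "> .");
        return lines;
    }

    /** Small illustrative graph: two triples with property p, one with property q. */
    static List<String[]> sampleGraph() {
        return Arrays.asList(
            new String[] { "http://example/s1", "http://example/p", "http://example/o1" },
            new String[] { "http://example/s2", "http://example/p", "http://example/o2" },
            new String[] { "http://example/s3", "http://example/q", "http://example/o3" });
    }

    public static void main(String[] args) {
        // Complete property list: every triple is written out.
        List<String> all = Arrays.asList("http://example/p", "http://example/q");
        for (String line : dumpByProperty(sampleGraph(), all))
            System.out.println(line);

        // Incomplete property list: the triple using q is not recovered.
        List<String> partial = Arrays.asList("http://example/p");
        System.out.println(dumpByProperty(sampleGraph(), partial).size() + " of "
            + sampleGraph().size() + " triples recovered with a partial list");
    }
}
```

Against a real database the find call would go to the default graph of the TDB dataset, and literals and blank nodes would need proper N-Triples escaping, which this sketch does not attempt.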

B/ A different, better approach is to build a special version of TDB. 
The changes needed are small but you need to build Jena.

These instructions apply to the code in SVN as it is now, today.  Not the 
last release, not last week.  It's just easier to set up and explain from 
the current code base, as a small recent change centralised the point you 
need to change and also introduced an easy-to-use testing feature.

1/ svn co the Jena code from trunk.

2/ Build Jena
    mvn clean install

It is easier to build and install than just package.

You must use the development releases of the other modules.
I don't think you need to set up maven to use the snapshot builds on 
Apache but if you do:

Set <repository> as described at
http://jena.apache.org/download/maven.html

3/ mvn eclipse:eclipse to use Eclipse if you plan to use that to edit 
the code.

4/ Set up to use this build for tdbdump, e.g. apache-jena or Fuseki.

For added ease, use the Fuseki server jar, which has everything in it:

java -cp fuseki-server.jar tdb.tdbdump --version

Check timestamps/version numbers.

5/ Test: create a small text file of a few triples.

--- D.ttl
@prefix : <http://example/> .

:s1 :p 1 .
:s2 :p 2 .
:s3 :q 3 .
:s2 :q 4 .
:s1 :q 5 .

---

tdbdump --data D.ttl should dump the file with triples clustered by subject.

(no - you do not need to load a database - --data is a recent feature 
for testing)

6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method 
"chooseScanAllIndex"

Change:
-----
         if ( tupleLen != 4 )
             return indexes[0] ;
==>
         if ( tupleLen != 4 )
         {
             if ( indexes.length == 3 )
                 return indexes[1] ;
             else
                 return indexes[0] ;
         }
-----

7/ Rebuild.

Yes - the tests for TDB should pass!

8/ Check the new version

tdbdump --version

Check the change

tdbdump --data D.ttl

The output should be N-Triples clustered by property, unlike the earlier run.

9/ Dump your database.

Hope there is a good index.

You can also try indexes[2] instead of indexes[1] to use the OSP index.
Each dumps the entire database, but in different triple orders.

10/ Clean up maven to get rid of the temporary build.

rm -r REPO/org/apache/jena/

11/ Rebuild the database with tdbloader/tdbloader2.

(the load is slower than if dumped in SPO order)

I tested the change here on that test file - I don't have a large 
corrupt database to try it on.
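
A minimal sketch of what the change in step 6 does, modelled on plain strings rather than the real TDB classes.  The class and method names (ScanOrderSketch, chooseTripleScanIndex, scan) are hypothetical, only the triple branch is modelled, and terms are sorted as strings whereas TDB orders its indexes by internal node ids.  It assumes the three triple indexes are held in the order SPO, POS, OSP, as the steps above imply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ScanOrderSketch {
    // Stand-ins for the three triple indexes, in the assumed order.
    static final String[] TRIPLE_INDEXES = { "SPO", "POS", "OSP" };

    /** Mirrors the patched triple branch from step 6: prefer indexes[1] when there are three. */
    static String chooseTripleScanIndex(String[] indexes) {
        if (indexes.length == 3)
            return indexes[1];
        else
            return indexes[0];
    }

    /** Returns the triples in the order a full scan of the named index would yield them. */
    static List<String[]> scan(List<String[]> triples, String index) {
        // Map each letter of the index name to a slot in the triple: S=0, P=1, O=2.
        final int[] slots = new int[3];
        for (int i = 0; i < 3; i++)
            slots[i] = "SPO".indexOf(index.charAt(i));
        List<String[]> sorted = new ArrayList<>(triples);
        sorted.sort(Comparator
            .comparing((String[] t) -> t[slots[0]])
            .thenComparing(t -> t[slots[1]])
            .thenComparing(t -> t[slots[2]]));
        return sorted;
    }

    /** The five triples of the D.ttl test file from step 5. */
    static List<String[]> testData() {
        return Arrays.asList(
            new String[] { "s1", "p", "1" },
            new String[] { "s2", "p", "2" },
            new String[] { "s3", "q", "3" },
            new String[] { "s2", "q", "4" },
            new String[] { "s1", "q", "5" });
    }

    public static void main(String[] args) {
        String chosen = chooseTripleScanIndex(TRIPLE_INDEXES);
        System.out.println("Scanning with " + chosen);
        for (String[] t : scan(testData(), chosen))
            System.out.println(":" + t[0] + " :" + t[1] + " " + t[2] + " .");
    }
}
```

Calling scan(testData(), "SPO") instead groups the two :s1 triples together, then the two :s2 triples, which is the subject-clustered output expected from the unmodified build in step 5.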

> Any ideas of how to get it fixed are more than welcome.

Personally, I would adopt a two-stream approach.

Do the approach above and, in parallel, collect all the data together and 
start a fresh load of the database on another machine.

	Good luck
	Andy

>
> Regards, Emilio
>
>
> -- Emilio Migueláñez Martín emilio.miguelanez@gmail.com
>
>
>

Re: TDB: records not strictly increasing

Posted by Emilio Miguelanez <em...@gmail.com>.

On 27 Jan 2013, at 22:04, Andy Seaborne wrote:

> If select * {
>  ?agent
>  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
> }
> 
> works, it may be your lucky day.  The SPO index is intact so tdbdump will work.  Maybe.
> 
> If you have the original data, then rebuilding is much safer.  There may be other problems not yet encountered.


This query works .... what should I do now?

If I run

tdbdump --loc=tdb > tdb.dump       (question: are tdbdump and tdbbackup the same command?)

I get the same error.

Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException: RecordRangeIterator: records not strictly increasing: 0000000006d00261000000000000021c0000000006cfff69 // 0000000006b861a3000000000005233d00000000015a78b5

I would like to see whether the current TDB database can be fixed, as rebuilding could take a long time. The database was created with minimal data and has been populated dynamically over a long period of time (> 1 year).

Any ideas of how to get it fixed are more than welcome.

Regards,
Emilio


--
Emilio Migueláñez Martín
emilio.miguelanez@gmail.com

Re: TDB: records not strictly increasing

Posted by Andy Seaborne <an...@apache.org>.

On 27/01/13 12:40, Emilio Miguelanez wrote:
> Thanks Andy.
>
> On 27 Jan 2013, at 11:30, Andy Seaborne wrote:
>
>> On 26/01/13 23:29, Emilio Miguelanez wrote:
>>> Hi,
>>>
>>> I have build an application with a TDB storage, using version
>>> 0.8.10,which has now a considerable size (>1Gb) after it has been
>>> running for some time.
>>>
>>> However, recently I have facing some errors, detailed as
>>>
>>> com.hp.hpl.jena.tdb.base.StorageException: RecordRangeIterator:
>>> records not strictly increasing:
>>> 0000000006d00261000000000000021c0000000006cfff69 //
>>> 0000000006b861a3000000000005233d00000000015a78b5
>>>
>>> I don't know what has caused this error, but I suspect that the
>>> storage (and data) is now corrupted somehow.
>>
>> Correct.
>>
>> And it happened at some time in the past - this is the point of
>> detecting the situation, not the cause (e.g. abrupt shutdown).
>
> Yes, I suspect that it was sudden shutdown of the server, not
> allowing a safe shutdown of the server.
>
>>> After reading the mailing lists, this error has already been
>>> reported (https://issues.apache.org/jira/browse/JENA-301) and
>>> fixed in the latest version of the TDB (0.9.4), which I'll start
>>> using right away.
>>>
>>> However, I really need to fix my existing TDB storage. Any ideas
>>> how I can fix it?
>>
>> This is tricky.  You need to find a way to force it to use a
>> different index to get all the data out so you can rebuild the
>> database.  It would be better if you have the original data or a
>> backup.
>
> Would it help if I generate the database with the original data? Then
> could I merge the current bulk of the data into the new database
> (using the indexing)?
>
>> What is the query provoking this?
>
> select ?agent ?job where{?agent
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
> . OPTIONAL{ ?agent
> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#hasJob>
> ?job }}

If select * {
   ?agent
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
   <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent>
}

works, it may be your lucky day.  The SPO index is intact so tdbdump 
will work.  Maybe.

If you have the original data, then rebuilding is much safer.  There may 
be other problems not yet encountered.

	Andy
>
> Cheers, Emilio
>
>>
>> Andy
>>
>>>
>>> Regards, Emilio
>>>
>>> -- Emilio Migueláñez Martín emilio.miguelanez@gmail.com
>>>
>>>
>>>
>>
>
> -- Emilio Migueláñez Martín emilio.miguelanez@gmail.com
>
>

Re: TDB: records not strictly increasing

Posted by Emilio Miguelanez <em...@gmail.com>.

Thanks Andy.

On 27 Jan 2013, at 11:30, Andy Seaborne wrote:

> On 26/01/13 23:29, Emilio Miguelanez wrote:
>> Hi,
>> 
>> I have build an application with a TDB storage, using version
>> 0.8.10, which has now a considerable size (>1Gb) after it has
>> been running for some time.
>> 
>> However, recently I have facing some errors, detailed as
>> 
>> com.hp.hpl.jena.tdb.base.StorageException: RecordRangeIterator: records not strictly increasing: 0000000006d00261000000000000021c0000000006cfff69 // 0000000006b861a3000000000005233d00000000015a78b5
>> 
>> I don't know what has caused this error, but I suspect that the storage (and data) is now corrupted somehow.
> 
> Correct.
> 
> And it happened at some time in the past - this is the point of detecting the situation, not the cause (e.g. abrupt shutdown).

Yes, I suspect it was a sudden shutdown of the server, which did not allow it to shut down safely.

>> After reading the mailing lists, this error has already been reported (https://issues.apache.org/jira/browse/JENA-301) and fixed in the latest version of the TDB (0.9.4), which I'll start using right away.
>> 
>> However, I really need to fix my existing TDB storage. Any ideas how I can fix it?
> 
> This is tricky.  You need to find a way to force it to use a different index to get all the data out so you can rebuild the database.  It would be better if you have the original data or a backup.

Would it help if I regenerated the database from the original data? Could I then merge the bulk of the current data into the new database (using the indexing)?

> What is the query provoking this?

select ?agent ?job where{?agent <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#Agent> . OPTIONAL{ ?agent <file:///etc/recovery/models/flow/.node-server/model-turbine-e4.json/seed/core.owl#hasJob> ?job }}

Cheers,
Emilio

> 
> 	Andy
> 
>> 
>> Regards,
>> Emilio
>> 
>> --
>> Emilio Migueláñez Martín
>> emilio.miguelanez@gmail.com
>> 
>> 
>> 
> 

--
Emilio Migueláñez Martín
emilio.miguelanez@gmail.com

Re: TDB: records not strictly increasing

Posted by Andy Seaborne <an...@apache.org>.

On 26/01/13 23:29, Emilio Miguelanez wrote:
> Hi,
>
> I have build an application with a TDB storage, using version
> 0.8.10, which has now a considerable size (>1Gb) after it has
> been running for some time.
>
> However, recently I have facing some errors, detailed as
>
> com.hp.hpl.jena.tdb.base.StorageException: RecordRangeIterator: records not strictly increasing: 0000000006d00261000000000000021c0000000006cfff69 // 0000000006b861a3000000000005233d00000000015a78b5
>
> I don't know what has caused this error, but I suspect that the storage (and data) is now corrupted somehow.

Correct.

And it happened at some time in the past: this error is the point at 
which the damage was detected, not the point at which it was caused 
(e.g. by an abrupt shutdown).
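
To make the message concrete, here is a minimal sketch of the kind of ordering check that fails, applied to the two records from the error.  It is not the TDB implementation; the class and method names are hypothetical.  It assumes, as the 48 hex digits suggest, that each record is three 8-byte ids, and it relies on fixed-width lowercase hex strings comparing in the same order as the underlying bytes.

```java
import java.util.Arrays;
import java.util.List;

class RecordOrderCheck {
    /** Returns the index of the first record that is not greater than its predecessor, or -1. */
    static int firstOutOfOrder(List<String> hexRecords) {
        for (int i = 1; i < hexRecords.size(); i++) {
            // Equal-length lowercase hex: lexicographic order matches unsigned byte order.
            if (hexRecords.get(i - 1).compareTo(hexRecords.get(i)) >= 0)
                return i;
        }
        return -1;
    }

    /** Splits a 48-hex-digit record into three 16-digit (8-byte) ids. */
    static long[] splitIds(String hexRecord) {
        long[] ids = new long[3];
        for (int i = 0; i < 3; i++)
            ids[i] = Long.parseLong(hexRecord.substring(16 * i, 16 * (i + 1)), 16);
        return ids;
    }

    public static void main(String[] args) {
        // The two records quoted in the StorageException message.
        String first  = "0000000006d00261000000000000021c0000000006cfff69";
        String second = "0000000006b861a3000000000005233d00000000015a78b5";

        int bad = firstOutOfOrder(Arrays.asList(first, second));
        System.out.println("first out-of-order position: " + bad);

        // Show the three ids of each record in hex.
        for (String rec : Arrays.asList(first, second)) {
            long[] ids = splitIds(rec);
            System.out.println(Long.toHexString(ids[0]) + " "
                + Long.toHexString(ids[1]) + " " + Long.toHexString(ids[2]));
        }
    }
}
```

The second record starts with 06b861a3, which sorts before 06d00261, so a scan that expects ascending records stops there.  The check says where the disorder was seen, not when or why the index was damaged.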

> After reading the mailing lists, this error has already been reported (https://issues.apache.org/jira/browse/JENA-301) and fixed in the latest version of the TDB (0.9.4), which I'll start using right away.
>
> However, I really need to fix my existing TDB storage. Any ideas how I can fix it?

This is tricky.  You need to find a way to force it to use a different 
index to get all the data out so you can rebuild the database.  It would 
be better if you have the original data or a backup.

What is the query provoking this?

	Andy

>
> Regards,
> Emilio
>
> --
> Emilio Migueláñez Martín
> emilio.miguelanez@gmail.com
>
>
>