Posted to users@jena.apache.org by Simon Sperl <sp...@gmail.com> on 2013/04/19 12:55:19 UTC

Restoring a corrupted StoreConnection

Hi,

What I do:
open a dataset with TDBFactory.createDataset()
run a SPARQL query that runs for a long time
then the executing thread gets interrupted via Thread.interrupt().

This produces the following trace:

com.hp.hpl.jena.tdb.base.file.FileException: FileAccessDirect
    at com.hp.hpl.jena.tdb.base.file.BlockAccessDirect.readByteBuffer(BlockAccessDirect.java:74)
    at com.hp.hpl.jena.tdb.base.file.BlockAccessDirect.read(BlockAccessDirect.java:61)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrFileAccess.getBlock(BlockMgrFileAccess.java:81)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrFileAccess.getRead(BlockMgrFileAccess.java:70)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrWrapper.getRead(BlockMgrWrapper.java:52)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrSync.getRead(BlockMgrSync.java:48)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrCache.getRead(BlockMgrCache.java:128)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrCache.getReadIterator(BlockMgrCache.java:138)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrWrapper.getReadIterator(BlockMgrWrapper.java:58)
    at com.hp.hpl.jena.tdb.base.recordbuffer.RecordBufferPageMgr.getReadIterator(RecordBufferPageMgr.java:53)
    at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.<init>(RecordRangeIterator.java:82)
    at com.hp.hpl.jena.tdb.base.recordbuffer.RecordRangeIterator.iterator(RecordRangeIterator.java:40)
    at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.iterator(BPlusTree.java:383)
    at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.iterator(BPlusTree.java:366)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.findWorker(TupleIndexRecord.java:164)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.findOrScan(TupleIndexRecord.java:84)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performFind(TupleIndexRecord.java:78)
    at com.hp.hpl.jena.tdb.index.TupleIndexBase.find(TupleIndexBase.java:91)
    at com.hp.hpl.jena.tdb.index.TupleTable.find(TupleTable.java:197)
    at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.find(NodeTupleTableConcrete.java:169)
    at com.hp.hpl.jena.tdb.solver.StageMatchTuple.makeNextStage(StageMatchTuple.java:91)
    at com.hp.hpl.jena.tdb.solver.StageMatchTuple.makeNextStage(StageMatchTuple.java:37)
    at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:49)
    at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:195)
    at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:295)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConcat.hasNextBinding(QueryIterConcat.java:83)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:81)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:60)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterGroup$1.initializeIterator(QueryIterGroup.java:85)
    at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.init(IteratorDelayedInitialization.java:37)
    at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.hasNext(IteratorDelayedInitialization.java:47)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:60)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterDistinctReduced.hasNextBinding(QueryIterDistinctReduced.java:54)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:72)
    ...
Caused by: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:627)
    at com.hp.hpl.jena.tdb.base.file.BlockAccessDirect.readByteBuffer(BlockAccessDirect.java:70)
    ... 62 more



After this, the StoreConnection/Dataset is left in a broken state because the
FileChannel has been closed.

Running another query on this Dataset now causes:



161137 [ProcessThread] ERROR com.hp.hpl.jena.tdb.solver.BindingTDB - get1(?v)
org.apache.jena.atlas.AtlasException: java.nio.channels.ClosedChannelException
    at org.apache.jena.atlas.io.IO.exception(IO.java:154)
    at com.hp.hpl.jena.tdb.base.file.BufferChannelFile.read(BufferChannelFile.java:113)
    at com.hp.hpl.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:337)
    at com.hp.hpl.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:178)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:103)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:74)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:103)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:74)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:55)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
    at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:55)
    at com.hp.hpl.jena.tdb.solver.BindingTDB.get1(BindingTDB.java:123)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.get(BindingBase.java:123)
    at com.hp.hpl.jena.sparql.engine.binding.BindingBase.get(BindingBase.java:131)
    at com.hp.hpl.jena.sparql.expr.ExprVar.eval(ExprVar.java:61)
    at com.hp.hpl.jena.sparql.expr.ExprVar.eval(ExprVar.java:54)
    at com.hp.hpl.jena.sparql.expr.aggregate.AccumulatorExpr.accumulate(AccumulatorExpr.java:43)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterGroup$1.initializeIterator(QueryIterGroup.java:112)
    at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.init(IteratorDelayedInitialization.java:37)
    at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.hasNext(IteratorDelayedInitialization.java:47)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:60)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:59)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIterDistinctReduced.hasNextBinding(QueryIterDistinctReduced.java:54)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
    at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
    at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:72)
    ...



So I tried to remove the instance from the StoreConnection cache via
StoreConnection.release(), which causes:


Caused by: com.hp.hpl.jena.tdb.base.file.FileException: FileBase.sync
    at com.hp.hpl.jena.tdb.base.file.FileBase.sync(FileBase.java:110)
    at com.hp.hpl.jena.tdb.base.file.BlockAccessBase.force(BlockAccessBase.java:135)
    at com.hp.hpl.jena.tdb.base.file.BlockAccessDirect._close(BlockAccessDirect.java:116)
    at com.hp.hpl.jena.tdb.base.file.BlockAccessBase.close(BlockAccessBase.java:152)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrFileAccess.close(BlockMgrFileAccess.java:142)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrWrapper.close(BlockMgrWrapper.java:132)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrSync.close(BlockMgrSync.java:119)
    at com.hp.hpl.jena.tdb.base.block.BlockMgrCache.close(BlockMgrCache.java:263)
    at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.close(BPlusTree.java:446)
    at com.hp.hpl.jena.tdb.index.TupleIndexRecord.close(TupleIndexRecord.java:225)
    at com.hp.hpl.jena.tdb.index.TupleTable.close(TupleTable.java:206)
    at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.close(NodeTupleTableConcrete.java:244)
    at com.hp.hpl.jena.tdb.store.TableBase.close(TableBase.java:57)
    at com.hp.hpl.jena.tdb.store.DatasetGraphTDB._close(DatasetGraphTDB.java:174)
    at com.hp.hpl.jena.sparql.core.DatasetGraphCaching.close(DatasetGraphCaching.java:135)
    at com.hp.hpl.jena.tdb.StoreConnection.expel(StoreConnection.java:202)
    at com.hp.hpl.jena.tdb.StoreConnection.release(StoreConnection.java:187)
    ...
Caused by: java.nio.channels.ClosedChannelException
    at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:88)
    at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:349)
    at com.hp.hpl.jena.tdb.base.file.FileBase.sync(FileBase.java:108)
    ... 42 more

So, can I fix this on my end?

I use apache-jena-2.10.0 on Windows XP Professional SP 3 and jdk 1.6.0_022.

lG,
 Simon

Re: Restoring a corrupted StoreConnection

Posted by Simon Sperl <sp...@gmail.com>.
Starting the Query Execution in its own thread did the trick.
Thanks a lot for your help & patience :)

lg,
 Simon


On Fri, Apr 26, 2013 at 12:10 PM, Andy Seaborne <an...@apache.org> wrote:

> On 25/04/13 18:48, Simon Sperl wrote:
>
>> Hi,
>>
>> @Andy
>> You running a 32 bit JVM? yes
>> You are trying to stop a long running query from another thread? yes
>>
>> My usecase is that I am trying to write a jena plugin for Rapidminer.
>> What that means is that I have gui components representing "sparql query",
>> "sparql service query", "open tdb", "constant model",.. and stick their
>> inputs/outputs together to form a rapidminer-process.
>> And these processes can be halted (from the gui), which Rapidminer does by
>> calling Thread.interrupt(), so in essence I don't have full control over
>> the execution/interruptions.
>>
>>
>> I can understand/empathize not supporting Thread.interrupt(), but once the
>> system is inconsistent can I recover somehow?
>>
>
> There's nothing wrong with Thread.interrupt (unlike Thread.stop) except
> that Jena doesn't do it that way.
>
> It looks like the thread interrupt has caused a java.nio.channels.ClosedByInterruptException
> exception so any I/O operation is open to an exception.  Sometimes the
> system is performing two or more I/O operations in a coordinated way and
> coping with an interrupt at any point would be tedious and hard to get
> right.
>
> Transactions don't help - the idea of transactions is that the system is
> either working or not.  This is a half-way partial death where the internal
> state of TDB is unknown.
>
> I daresay it could have been written to use Thread.interrupt but would be
> hard when any I/O operation can return incomplete.  At least the
> cancellation flag is only tested at convenient (but quite fine grained)
> points in query execution.
>
> The other point is that it relies on query execution being
> single-threaded/same-thread.  That is not guaranteed and indeed hasn't
> always been true (RDQL used to be two-threaded).  ARQ (and TDB) uses a lot
> of iterators - running them across threads is quite natural to do (caution
> on granularity and overheads costing more than any gains).
>
> Systems need to support multiple independent requests.  There are only so
> many real threads (although it's going up quite rapidly these days) so
> splitting the workload to make request fairer makes sense currently.
>
>
> What to do about it:
>
> If you need a design for rapidminer where it can use Thread.interrupt, then
> maybe this will work:
>
> On the thread where rapidminer thinks the request is, have an
> ExecutorService to fork query execution and return the result set in a
> Future<ResultSet> (e.g. FutureTask<>).
>
> It must be a copy of the ResultSet, not the ResultSet returned by
> execSelect because that is tied to query execution.
>
> Wait on the Future to get the results.
>
> If the Future.get receives InterruptedException, call QueryExecution.abort
> to kill the query on the second thread.
>
>         Andy
>
>
>> -Simon
>>
>
>

Re: Restoring a corrupted StoreConnection

Posted by Andy Seaborne <an...@apache.org>.
On 25/04/13 18:48, Simon Sperl wrote:
> Hi,
>
> @Andy
> You running a 32 bit JVM? yes
> You are trying to stop a long running query from another thread? yes
>
> My usecase is that I am trying to write a jena plugin for Rapidminer.
> What that means is that I have gui components representing "sparql query",
> "sparql service query", "open tdb", "constant model",.. and stick their
> inputs/outputs together to form a rapidminer-process.
> And these processes can be halted (from the gui), which Rapidminer does by
> calling Thread.interrupt(), so in essence I don't have full control over
> the execution/interruptions.
>
>
> I can understand/empathize not supporting Thread.interrupt(), but once the
> system is inconsistent can I recover somehow?

There's nothing wrong with Thread.interrupt (unlike Thread.stop) except 
that Jena doesn't do it that way.

It looks like the thread interrupt has caused a 
java.nio.channels.ClosedByInterruptException exception so any I/O 
operation is open to an exception.  Sometimes the system is performing 
two or more I/O operations in a coordinated way and coping with an 
interrupt at any point would be tedious and hard to get right.
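
The java.nio behaviour described above can be reproduced with plain JDK classes, 
without Jena. The following is a minimal sketch, not TDB code: the class name 
InterruptClosesChannel and the temp file are made up for the illustration. One 
thread reads from a FileChannel with its interrupt flag set, then a thread that 
was never interrupted reads from the same channel.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class InterruptClosesChannel {

    /** Returns the exception seen by the interrupted read [0] and by a later read [1]. */
    public static String[] demo() throws Exception {
        // Throwaway file, only so there is something to read.
        File file = File.createTempFile("channel-demo", ".bin");
        FileOutputStream out = new FileOutputStream(file);
        out.write(new byte[4096]);
        out.close();

        final FileChannel channel = new RandomAccessFile(file, "r").getChannel();
        final String[] seen = new String[2];

        Thread reader = new Thread(new Runnable() {
            public void run() {
                // With the interrupt flag set, the read fails and the channel gets closed.
                Thread.currentThread().interrupt();
                try {
                    channel.read(ByteBuffer.allocate(16), 0);
                } catch (IOException e) {
                    seen[0] = e.getClass().getSimpleName();
                }
            }
        });
        reader.start();
        reader.join();

        // This thread was never interrupted, but the shared channel is already closed.
        try {
            channel.read(ByteBuffer.allocate(16), 0);
        } catch (IOException e) {
            seen[1] = e.getClass().getSimpleName();
        }
        file.delete();
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String[] seen = demo();
        System.out.println("interrupted read: " + seen[0]);
        System.out.println("later read:       " + seen[1]);
    }
}
```

The first read should report ClosedByInterruptException and the later one 
ClosedChannelException, which is the same pair of exceptions as in the first 
two traces of this thread.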

Transactions don't help - the idea of transactions is that the system 
is either working or not.  This is a half-way partial death where the 
internal state of TDB is unknown.

I daresay it could have been written to use Thread.interrupt but that would 
be hard when any I/O operation can return incomplete.  At least the 
cancellation flag is only tested at convenient (but quite fine-grained) 
points in query execution.
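
As an illustration of a flag tested at convenient points, here is a minimal 
sketch of an iterator wrapper that checks a shared flag between elements. The 
SolverLib$IterAbortable frame in the first trace suggests TDB has something of 
this kind, but the code below is not Jena's implementation: 
AbortableIteratorSketch, AbortableIterator and CancelledException are 
hypothetical names used only here.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

public class AbortableIteratorSketch {

    /** Hypothetical exception for this sketch; stands in for a "query cancelled" signal. */
    public static class CancelledException extends RuntimeException {
        public CancelledException() { super("cancelled"); }
    }

    /** Wraps an iterator and checks a shared flag between elements, never in the middle of I/O. */
    public static class AbortableIterator<T> implements Iterator<T> {
        private final Iterator<T> inner;
        private final AtomicBoolean cancel;

        public AbortableIterator(Iterator<T> inner, AtomicBoolean cancel) {
            this.inner = inner;
            this.cancel = cancel;
        }

        public boolean hasNext() {
            if (cancel.get()) throw new CancelledException();
            return inner.hasNext();
        }

        public T next() {
            if (cancel.get()) throw new CancelledException();
            return inner.next();
        }

        public void remove() { throw new UnsupportedOperationException(); }
    }

    /** Reads from a five-element iterator, setting the flag after 'stopAfter' elements. */
    public static int consume(int stopAfter) {
        AtomicBoolean cancel = new AtomicBoolean(false);
        Iterator<Integer> it = new AbortableIterator<Integer>(
                Arrays.asList(1, 2, 3, 4, 5).iterator(), cancel);
        int count = 0;
        try {
            while (it.hasNext()) {
                it.next();
                count++;
                // In a real system another thread would set the flag, e.g. from an abort call.
                if (count == stopAfter) cancel.set(true);
            }
        } catch (CancelledException e) {
            // Stopped at a safe point; the cancellation itself touched no file channel.
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(consume(2));
    }
}
```

Because the flag is only read between elements, a cancellation never lands in 
the middle of a read, so the underlying files are not left half-used.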

The other point is that Thread.interrupt relies on query execution being 
single-threaded/same-thread.  That is not guaranteed and indeed hasn't 
always been true (RDQL used to be two-threaded).  ARQ (and TDB) use a 
lot of iterators - running them across threads is quite natural to do 
(caution on granularity and overheads costing more than any gains).

Systems need to support multiple independent requests.  There are only 
so many real threads (although the number is going up quite rapidly these 
days) so splitting the workload to make requests fairer makes sense currently.


What to do about it:

If you need a design for rapidminer where it can use Thread.interrupt, 
then maybe this will work:

On the thread where rapidminer thinks the request is, have an 
ExecutorService to fork query execution and return the result set in a 
Future<ResultSet> (e.g. FutureTask<>).

It must be a copy of the ResultSet, not the ResultSet returned by 
execSelect because that is tied to query execution.

Wait on the Future to get the results.

If the Future.get receives InterruptedException, call 
QueryExecution.abort to kill the query on the second thread.
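
The steps above can be sketched with plain java.util.concurrent classes. This 
is a rough outline under stated assumptions, not tested against Jena: SlowQuery 
is a hypothetical stand-in for a QueryExecution, its abort() plays the role of 
QueryExecution.abort(), and the List it returns plays the role of a copied 
ResultSet (in Jena something like ResultSetFactory.copyResults could produce 
the copy).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

public class ForkedQuerySketch {

    /** Hypothetical stand-in for a query execution that can be aborted from another thread. */
    public static class SlowQuery {
        private final AtomicBoolean aborted = new AtomicBoolean(false);
        private final int rows;

        public SlowQuery(int rows) { this.rows = rows; }

        public void abort() { aborted.set(true); }
        public boolean wasAborted() { return aborted.get(); }

        /** Produces a detached copy of the results, checking the abort flag between rows. */
        public List<Integer> execAndCopy() {
            List<Integer> copy = new ArrayList<Integer>();
            for (int i = 0; i < rows; i++) {
                if (aborted.get()) throw new IllegalStateException("query aborted");
                try {
                    Thread.sleep(10);   // simulate work per row
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("worker interrupted", e);
                }
                copy.add(i);
            }
            return copy;
        }
    }

    /**
     * Runs the query on a worker thread and waits for the copied results.
     * If the waiting (caller) thread is interrupted, the query is aborted through its own API.
     */
    public static List<Integer> runForked(final SlowQuery query, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        Future<List<Integer>> future = pool.submit(new Callable<List<Integer>>() {
            public List<Integer> call() { return query.execAndCopy(); }
        });
        try {
            return future.get();
        } catch (InterruptedException e) {
            // Cooperative stop. Deliberately not future.cancel(true), which would
            // interrupt the worker thread and bring back the original problem.
            query.abort();
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(runForked(new SlowQuery(3), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the design is that the interrupt only ever reaches the thread 
waiting in Future.get; the thread doing the I/O is stopped through the abort 
flag instead.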

	Andy

>
> -Simon


Re: Restoring a corrupted StoreConnection

Posted by Simon Sperl <sp...@gmail.com>.
Hi,

@Andy
You running a 32 bit JVM? yes
You are trying to stop a long running query from another thread? yes

My use case is that I am trying to write a Jena plugin for RapidMiner.
That means I have GUI components representing "sparql query",
"sparql service query", "open tdb", "constant model", etc., and I connect their
inputs/outputs to form a RapidMiner process.
These processes can be halted from the GUI, which RapidMiner does by
calling Thread.interrupt(), so in essence I don't have full control over
the execution/interruptions.


I can understand/empathize not supporting Thread.interrupt(), but once the
system is inconsistent can I recover somehow?

-Simon


On Thu, Apr 25, 2013 at 1:13 PM, Andy Seaborne <an...@apache.org> wrote:

> Simon,
>
> You running a 32 bit JVM?
> You are trying to stop a long running query from another thread?
>
>
> On 19/04/13 11:55, Simon Sperl wrote:
>
>> Hi,
>>
>> what I do:
>> open a TDBFactory.createDataset()
>> run a sparql query that runs a long time
>> then the execution gets interrupted by a Thread.Interrupt exception.
>>
>
> Why is the thread being killed with Thread.Interrupt?  Killing threads
> this way is not supported - it is likely to put the internal state of the
> system into an inconsistent state.
>
> Jena ARQ supports a clean way to stop queries via QueryExecution.abort() and
> also via query timeouts.
>
> /** Stop in mid execution.
>  * This method can be called in parallel with other methods on the
>  *  QueryExecution object.
>  *  There is no guarantee that the concrete implementation actually
>  *  will stop or that it will do so immediately.
>  *  No operations on the query execution or any associated
>  *  result set are permitted after this call and may cause exceptions
>  *  to be thrown.
>  */
>
> This works by calling down the stack of operations currently executing the
> query and allows them to exit at the first moment they are able to.
>  Interrupting OS system calls is not supported.
>
>         Andy
>
>

Re: Restoring a corrupted StoreConnection

Posted by Andy Seaborne <an...@apache.org>.
Simon,

Are you running a 32-bit JVM?
Are you trying to stop a long-running query from another thread?

On 19/04/13 11:55, Simon Sperl wrote:
> Hi,
>
> what I do:
> open a TDBFactory.createDataset()
> run a sparql query that runs a long time
> then the execution gets interrupted by a Thread.Interrupt exception.

Why is the thread being killed with Thread.interrupt?  Killing threads 
this way is not supported - it is likely to leave the internal state of 
the system inconsistent.

Jena ARQ supports a clean way to stop queries via QueryExecution.abort() and 
also via query timeouts.

/** Stop in mid execution.
  * This method can be called in parallel with other methods on the
  *  QueryExecution object.
  *  There is no guarantee that the concrete implementation actually
  *  will stop or that it will do so immediately.
  *  No operations on the query execution or any associated
  *  result set are permitted after this call and may cause exceptions
  *  to be thrown.
  */

This works by calling down the stack of operations currently executing 
the query and allows them to exit at the first moment they are able to. 
  Interrupting OS system calls is not supported.

	Andy