Posted to users@jena.apache.org by Chris Dollin <ch...@epimorphics.com> on 2016/01/08 16:34:35 UTC
transactions and docProducers
Dear All
(Not sure if this is really an @dev or @users question)
When Fuseki handles a query (or update), is that query
(or update) handled by a single thread or might it
be handled by multiple threads over the lifetime of
the query (or update)?
I ask because
* we have a TextDocProducer implementation called
TextDocProducerBatch. It (hence) follows the
DatasetChanges interface, tracking adds and
removes and updating a Lucene index.
* The "Batch" part is because it accumulates
quads with the same subject and, when the subject
changes, makes a single Entity for the subject
rather than entities for each quad.
* The accumulating quads are held in a data structure
* It's possible that read queries are running in
parallel with updates. The read queries also
go through the TextDocProducerBatch. To prevent
the read query performing operations on the update
state [1] we're holding the state as a thread-local
variable.
* This is only sound if all the TextDocProducer(Batch)
operations for a given query (or update) are handled
by a single transaction. Which seems plausible but I
can't point to anything that actually says so.
* So: is it the case?
* An alternative I considered was, given that there can
be at most one concurrent write transaction, to only
perform the batch-and-update-index operations when
inside a write transaction. However, starting from
a TextDocProducerBatch, which is initialised with just
a TextIndex[Lucene] and a DatasetGraph[Transaction],
there doesn't seem any way to find out what the current
transaction is; you can find out that you are (or are
not) *in* a transaction but not whether it's a read
or write [2].
* Have I missed something?
Chris
[1] An actual problem that happened
[2] Yes, we could have a divergent version of Jena with
patches to access the transaction, but then we end
up using SNAPSHOT versions of Jena and gnashing teeth.
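The batching scheme described above could be sketched roughly as follows. This is not the actual TextDocProducerBatch: the class name SubjectBatcher, its methods, the use of plain strings in place of Jena nodes, and the in-memory list standing in for the Lucene index are all assumptions made for illustration. It only shows the idea of accumulating quads per subject in thread-local state and emitting one entity when the subject changes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Jena code): batch quads by subject, one state per thread. */
class SubjectBatcher {
    /** Accumulation state owned by a single thread. */
    private static final class State {
        String subject;                                   // subject currently being accumulated
        final List<String> pending = new ArrayList<>();   // its predicate=object pairs
    }

    // Each thread sees its own State, so a reader thread cannot disturb an updater's batch.
    private final ThreadLocal<State> state = ThreadLocal.withInitial(State::new);

    // Stand-in for the text index: one entry per flushed subject.
    private final List<String> indexed = new ArrayList<>();

    /** Record one added quad; flush the previous batch if the subject has changed. */
    void add(String subject, String predicate, String object) {
        State s = state.get();
        if (s.subject != null && !s.subject.equals(subject)) {
            flush(s);
        }
        s.subject = subject;
        s.pending.add(predicate + "=" + object);
    }

    /** End of the request or transaction: flush what is left and drop the thread-local. */
    void finish() {
        flush(state.get());
        state.remove();
    }

    /** Emit a single "entity" for the accumulated subject, then reset the batch. */
    private void flush(State s) {
        if (s.subject == null) {
            return;
        }
        synchronized (indexed) {
            indexed.add(s.subject + " " + s.pending);
        }
        s.subject = null;
        s.pending.clear();
    }

    /** Snapshot of what has been "indexed" so far. */
    List<String> indexed() {
        synchronized (indexed) {
            return new ArrayList<>(indexed);
        }
    }
}
```

The sketch is only sound under the same condition raised in the message: every call for one query or update must arrive on the same thread.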
Re: transactions and docProducers
Posted by Chris Dollin <ch...@epimorphics.com>.
Thanks to Andy for his reply.
Chris "over-eager on the DELETE button" Dollin
Re: transactions and docProducers
Posted by Andy Seaborne <an...@apache.org>.
On 08/01/16 15:34, Chris Dollin wrote:
> Dear All
>
> (Not sure if this is really an @dev or @users question)
>
> When Fuseki handles a query (or update), is that query
> (or update) handled by a single thread or might it
> be handled by multiple threads over the lifetime of
> the query (or update)?
Single thread per Fuseki request.
What you seem to be relying on is that the update changes are all
handled by a single thread per transaction, which is true, although for
any part that will touch the text index, query and update are both
single-threaded.
From experience, just remember to remove the thread local (as well as
nulling it out) at the end of each transaction, otherwise there is memory
growth. It's not bad in Fuseki, where threads come from a Jetty-managed
pool, but the pool does not seem to guarantee that it only reuses a fixed
number of threads, or that it isn't deleting and creating new threads,
especially under load. That makes the number of ThreadLocals grow.
Andy
[1] You are using TDB for the triplestore.
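The advice about removing the thread local at the end of each transaction can be illustrated with a minimal sketch. The class TxnScopedState and its begin/record/end methods are hypothetical names, not part of Jena or Fuseki; the only real API relied on is java.lang.ThreadLocal, whose remove() method drops the current thread's entry.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: per-transaction state held in a ThreadLocal and always removed. */
class TxnScopedState {
    private static final ThreadLocal<List<String>> PENDING = new ThreadLocal<>();

    /** Start of a transaction on this thread: install fresh state. */
    static void begin() {
        PENDING.set(new ArrayList<>());
    }

    /** Record one change; fails if begin() was not called on this thread. */
    static void record(String change) {
        List<String> pending = PENDING.get();
        if (pending == null) {
            throw new IllegalStateException("record() called outside begin()/end()");
        }
        pending.add(change);
    }

    /**
     * End of the transaction: report how many changes were recorded, then
     * remove the thread-local entry so a pooled thread does not keep it alive.
     */
    static int end() {
        try {
            List<String> pending = PENDING.get();
            return pending == null ? 0 : pending.size();
        } finally {
            PENDING.remove(); // runs even if the body above throws
        }
    }
}
```

Putting remove() in a finally block means the entry is dropped on both commit and abort paths, which is what keeps the growth described above in check when the pool creates and discards threads.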