Posted to users@jena.apache.org by Brian McBride <br...@epimorphics.com> on 2016/01/18 15:09:42 UTC
Re: fuseki: 2 services sharing a dataset with text index
On 22/12/15 18:22, Andy Seaborne wrote:
> JENA-1104 suggests there is an ordering/timing issue, and that it is not
> a Fuseki1/Fuseki2 difference as such: I expect that things happen in a different order.
I have investigated this further and I think I understand what is
happening.
If we have a configuration with the same dataset+text-index shared
between two services, then when the first service is built,
TextIndexLuceneAssembler is called to create a TextIndexLucene object.
When the second service is built, TextIndexLuceneAssembler is called
again and creates another TextIndexLucene object.
Both of these TextIndexLucene objects create a Lucene IndexWriter object
on the same directory. That doesn't work because they both try to grab
the same lock and one fails.
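A rough, self-contained illustration of that failure mode, using plain java.nio file locks rather than Lucene. This is not Lucene's lock implementation, and `WriteLockDemo`, `tryAcquire` and `release` are hypothetical names. Whoever takes `write.lock` first holds it, and a second writer on the same directory is refused, which is what the `LockObtainFailedException` in the log reports.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustration only: two "writers" competing for one write.lock file.
// Not Lucene code; WriteLockDemo is a hypothetical name.
final class WriteLockDemo {
    // Lock files this JVM already holds. Checked first because, on some
    // platforms, closing any channel on a locked file drops the JVM's lock on it.
    private static final Set<Path> HELD = ConcurrentHashMap.newKeySet();

    private static Path lockFile(Path dir) {
        return dir.resolve("write.lock").toAbsolutePath().normalize();
    }

    /** Returns an exclusive lock on dir/write.lock, or null if it is already held. */
    static FileLock tryAcquire(Path dir) throws IOException {
        Path file = lockFile(dir);
        if (!HELD.add(file)) {
            return null;                        // already held elsewhere in this JVM
        }
        FileChannel ch = null;
        try {
            ch = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock = ch.tryLock();       // null if another process holds it
            if (lock != null) {
                return lock;
            }
            ch.close();
        } catch (IOException | RuntimeException e) {
            if (ch != null) {
                ch.close();
            }
            HELD.remove(file);
            throw e;
        }
        HELD.remove(file);
        return null;
    }

    /** Releases a lock obtained from tryAcquire. */
    static void release(Path dir, FileLock lock) throws IOException {
        lock.channel().close();                 // closing the channel releases the OS lock
        HELD.remove(lockFile(dir));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ds-lucene");
        FileLock first = tryAcquire(dir);       // e.g. the index built for service "ds"
        FileLock second = tryAcquire(dir);      // e.g. the index built for service "ds_to"
        System.out.println("first acquired:  " + (first != null));
        System.out.println("second acquired: " + (second != null));
        if (first != null) {
            release(dir, first);
        }
    }
}
```

Run as-is, `main` should report that the first acquisition succeeds and the second does not; in the configuration above, the second attempt corresponds to the TextIndexLucene built for the second service.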
I am happy to offer a pull request to change this behaviour. There are
broadly two strategies that I can see, and I'm wondering if there is a
preferred approach from the Jena team.
The first approach is to change the way the assemblers work so that they
only create one TextIndexLucene object per node in the configuration graph.
A second approach is to modify TextIndexLucene so that two or more
objects can operate on the same directory.
My default approach would be to make the change in the assembler code.
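A minimal sketch of the first approach, assuming a configuration node can be modelled as a string such as `"<#indexLucene>"` (in Jena it would be an RDF resource). `PerNodeCache` is a hypothetical helper, not part of the Jena assembler API. The second service asking for the same node gets the object built for the first, so only one writer is ever opened.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: build at most one object per configuration node.
// Config nodes are modelled as strings such as "<#indexLucene>".
final class PerNodeCache<T> {
    private final Map<String, T> built = new ConcurrentHashMap<>();
    private final Function<String, T> builder;

    PerNodeCache(Function<String, T> builder) {
        this.builder = builder;
    }

    /** Builds the object for this node on first request, then returns that same instance. */
    T open(String configNode) {
        return built.computeIfAbsent(configNode, builder);
    }

    public static void main(String[] args) {
        AtomicInteger builds = new AtomicInteger();
        PerNodeCache<Object> indexes = new PerNodeCache<>(node -> {
            builds.incrementAndGet();           // stands in for "open an IndexWriter"
            return new Object();
        });
        Object forDs = indexes.open("<#indexLucene>");    // service "ds"
        Object forDsTo = indexes.open("<#indexLucene>");  // service "ds_to"
        System.out.println("same instance: " + (forDs == forDsTo));
        System.out.println("builds: " + builds.get());
    }
}
```

A per-node cache only covers one configuration graph: two separate assembler files naming the same index directory would still each build their own object.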
Brian
>
> I'm not sure that a shared index across two different datasets will
> work if updates are involved. Maybe someone else can help with that.
The configuration I'm looking at does not share an index across two
datasets - there is one index+TDB-dataset pair in the configuration.
>
> What's fuseki:allowTimeoutOverride? Is this a local build with the
> code for that uncommented out?
>
> Andy
>
> On 21/12/15 14:53, Brian McBride wrote:
>> The Fuseki configuration below sets up two services with a shared
>> dataset. The dataset has a Lucene text index.
>>
>> This configuration works on Fuseki 1.3.1, but Fuseki 2.3.1 fails to start.
>> The log output is shown below. It looks like the Lucene index may be
>> trying to grab a lock for the dataset twice.
>>
>> If I change the second fuseki:dataset line to:
>>
>> [[
>> fuseki:dataset <#ds> ;
>> ]]
>>
>> then it works on Fuseki 2.3.1 and, unexpectedly, both services have
>> access to the text index. That doesn't seem right, though it suits me for
>> the moment as I need both services to have access to the index.
>>
>> Is there some configuration change I need to make between Fuseki 1 and
>> Fuseki 2?
>>
>> Brian
>>
>>
>>
>> Fuseki 2.3.1 log output
>>
>> [[
>> 2015-12-21 14:42:20.940 WARN Config :: Fuseki v2: Management functions are always on the same port as the server. --mgtPort ignored.
>> 2015-12-21 14:42:21.062 INFO Server :: Fuseki 2.3.1 2015-12-08T09:24:07+0000
>> 2015-12-21 14:42:21.229 INFO Config :: FUSEKI_HOME=/usr/share/fuseki
>> 2015-12-21 14:42:21.230 INFO Config :: FUSEKI_BASE=/etc/fuseki
>> 2015-12-21 14:42:21.233 INFO Servlet :: Initializing Shiro environment
>> 2015-12-21 14:42:21.233 INFO EnvironmentLoader :: Starting Shiro environment initialization.
>> 2015-12-21 14:42:21.242 INFO Config :: Shiro file: file:///etc/fuseki/shiro.ini
>> 2015-12-21 14:42:21.415 INFO EnvironmentLoader :: Shiro environment initialized in 181 ms.
>> 2015-12-21 14:42:21.415 INFO Config :: Configuration file: /etc/fuseki/config.ttl
>> 2015-12-21 14:42:22.193 WARN AssemblerHelp :: ja:loadClass: Migration to Jena3: Converting com.hp.hpl.jena.tdb.TDB to org.apache.jena.tdb.TDB
>> 2015-12-21 14:42:23.557 ERROR Server :: Exception in initialization: caught: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/fuseki/databases/ds-lucene/write.lock
>> 2015-12-21 14:42:23.577 INFO Server :: Started 2015/12/21 14:42:23 UTC on port 3030
>>
>> ]]
>>
>>
>>
>> Fuseki configuration.
>>
>> [[
>>
>> # Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
>>
>> @prefix : <#> .
>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>
>> [] rdf:type fuseki:Server ;
>>
>> fuseki:services (
>> <#service_ds>
>> <#service_ds_timeout_override>
>> ) .
>>
>> # TDB
>> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>> tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
>> tdb:GraphTDB rdfs:subClassOf ja:Model .
>>
>>
>>
>> <#service_ds> rdf:type fuseki:Service ;
>> rdfs:label "TDB Service (RW)" ;
>> fuseki:name "ds" ;
>> fuseki:serviceQuery "query" ;
>> fuseki:dataset <#ds-with-lucene> ;
>> .
>>
>> <#service_ds_timeout_override>
>> rdfs:label "TDB Service Query with timeout override" ;
>> fuseki:name "ds_to" ;
>> fuseki:allowTimeoutOverride true;
>> fuseki:serviceQuery "query" ;
>> fuseki:dataset <#ds-with-lucene> ;
>> .
>>
>> <#ds> rdf:type tdb:DatasetTDB ;
>> tdb:location "/var/lib/fuseki/databases/ds" ;
>> .
>>
>>
>> @prefix text: <http://jena.apache.org/text#> .
>>
>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>> text:TextDataset rdfs:subClassOf ja:RDFDataset .
>> text:TextIndexLucene rdfs:subClassOf text:TextIndex .
>>
>>
>> <#ds-with-lucene>
>> rdf:type text:TextDataset;
>> text:dataset <#ds> ;
>> text:index <#indexLucene> ;
>> .
>>
>> <#indexLucene> a text:TextIndexLucene ;
>> text:directory <file:///var/lib/fuseki/databases/ds-lucene>;
>> text:entityMap <#entMap> ;
>> .
>>
>> <#entMap> a text:EntityMap ;
>> text:entityField "uri" ;
>> text:defaultField "text" ;
>> text:map (
>> [
>> text:field "text" ;
>> text:predicate rdfs:label ;
>> ]
>> ) .
>> ]]
>>
>
--
Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)
Re: fuseki: 2 services sharing a dataset with text index
Posted by Brian McBride <br...@epimorphics.com>.
On 18/01/16 17:21, Andy Seaborne wrote:
> rdfs:seeAlso JENA-1104
>
> Can I suggest a 3rd option?
>
> A static cache in TextDatasetFactory remembers text datasets created
> and returns the same one on each call for the same location/text
> index, cf. string interning.
That looks elegant to me.
[...]
> (Fuseki can have multiple separate configurations). The entry key can
> include the Lucene Directory - not sure what else is needed.
It gets a little complicated.
The Lucene Directory object can't be used in the key because there can
be multiple Lucene Directory objects pointing to the same file system
directory, and Directory does not provide an equals method that compares
the underlying location. So if a Lucene Directory object is used in the
key, there could still be locking problems.
The key cannot consist of more than something that identifies the
underlying directory, because that might cause locking problems as well.
So it won't be possible to have different indexes with, say, different
entity definitions or default analyzers on the same underlying index
directory. It would be nice to detect if the user has done that, but
that could be a bit tricky.
Moreover, the current code supports the notion of an in-memory index,
though I think that if you have an in-memory index shared between two
datasets, the current code will create two indexes where one would
expect one.
One problem is that the current interface to TextDatasetFactory makes
use of the Lucene Directory class, which isn't the right abstraction for
this because it does not offer an equals method that "does the right
thing" for this use.
I think it can be made to work, though it may involve some ugly "if
instanceof FSDirectory"-like code. An alternative would be to change the
interface to TextDatasetFactory.
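One way the key could be derived without relying on `Directory.equals`, sketched under stated assumptions: file-based indexes are keyed on the normalised absolute path, in-memory indexes on an identifier for the configuration node that declares them, and a flattened summary of the index configuration (entity map, analyzer) is recorded per key so that a second, differently configured index on the same storage is rejected rather than silently competing for the lock. `IndexKeys` and its methods are hypothetical.

```java
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helpers: key shared indexes on the underlying storage rather than on a
// Lucene Directory object, and reject a second, differently configured index over it.
final class IndexKeys {
    private static final Map<String, String> CONFIG_BY_KEY = new ConcurrentHashMap<>();

    private IndexKeys() {}

    /** Key for a file-system index: its normalised absolute path. */
    static String fileKey(String directory) {
        return "file:" + Paths.get(directory).toAbsolutePath().normalize();
    }

    /** Key for an in-memory index: an identifier for the config node that declares it (assumed). */
    static String memKey(String configNodeId) {
        return "mem:" + configNodeId;
    }

    /**
     * Records the configuration summary (e.g. entity map and analyzer, flattened to a string)
     * first used for this key, and throws if a later caller asks for a different one.
     */
    static void checkConsistent(String key, String configSummary) {
        String first = CONFIG_BY_KEY.putIfAbsent(key, configSummary);
        if (first != null && !first.equals(configSummary)) {
            throw new IllegalStateException(
                    "Index " + key + " is already open with configuration: " + first);
        }
    }

    public static void main(String[] args) {
        String k1 = fileKey("/var/lib/fuseki/databases/ds-lucene");
        String k2 = fileKey("/var/lib/fuseki/databases/../databases/ds-lucene");
        System.out.println("same key: " + k1.equals(k2));
        checkConsistent(k1, "entityField=uri; defaultField=text");
        try {
            checkConsistent(k2, "entityField=uri; defaultField=label");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Path normalisation does not resolve symbolic links, so two different link paths to one directory would still produce two keys; that is one of the tricky corners of detecting misconfiguration.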
Shall I create a new Jira for this (as I'm not sure this is the same
issue as JENA-1104) and see what the code looks like when I try it?
Brian
>
> There is one issue which can't be solved: two differently
> configured text indexes over one Directory (I have never found a way
> to get the full configuration back out of a Lucene index).
>
> The second option might work for some cases; one case (not the one
> here) is two different datasets trying to share one text index. Update
> will break, though: the same text index is inside two transaction regimes.
>
> Andy
>
>
> On 18/01/16 14:45, Rob Vesse wrote:
>> I would prefer the assembler option, as it fixes only the cause of
>> the specific bug, and in my mind it fixes the assembler API semantics
>> to match what I expect.
>>
>> Rob
Re: fuseki: 2 services sharing a dataset with text index
Posted by Andy Seaborne <an...@apache.org>.
rdfs:seeAlso JENA-1104
Can I suggest a 3rd option?
A static cache in TextDatasetFactory remembers text datasets created and
returns the same one on each call for the same location/text index, cf.
string interning.
TDB does this in StoreConnection where there must be one DatasetGraphTDB
per storage location or else chaos results.
A single intern table has the same effect as assembler caching, but it
also applies to Java code and across multiple assembler files
(Fuseki can have multiple separate configurations). The entry key can
include the Lucene Directory - not sure what else is needed.
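A minimal sketch of such an intern table, assuming the key is the normalised absolute path of the index directory. `TextIndexRegistry` and `SharedIndex` are hypothetical names; a real version would hold the TextIndexLucene (and its single IndexWriter) rather than just the path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical JVM-wide intern table: one shared index object per on-disk location,
// whichever configuration file or Java code asks for it.
final class TextIndexRegistry {
    /** Stand-in for the object that would own the single IndexWriter for a directory. */
    static final class SharedIndex {
        final Path location;

        SharedIndex(Path location) {
            this.location = location;
        }
    }

    private static final Map<Path, SharedIndex> INDEXES = new ConcurrentHashMap<>();

    private TextIndexRegistry() {}

    /** Returns the single SharedIndex for this directory, creating it on first use. */
    static SharedIndex intern(String directory) {
        // Normalise so ".../databases/./ds-lucene" and ".../databases/ds-lucene" collide.
        // Symbolic links are not resolved; toRealPath() would, but needs the directory to exist.
        Path key = Paths.get(directory).toAbsolutePath().normalize();
        return INDEXES.computeIfAbsent(key, SharedIndex::new);
    }

    public static void main(String[] args) {
        SharedIndex a = intern("/var/lib/fuseki/databases/ds-lucene");
        SharedIndex b = intern("/var/lib/fuseki/databases/./ds-lucene");
        System.out.println("same index: " + (a == b));
        System.out.println("location:   " + a.location);
    }
}
```

`ConcurrentHashMap.computeIfAbsent` runs the constructor at most once per key even under concurrent calls, which is the property an intern table needs here.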
There is one issue which can't be solved: two differently
configured text indexes over one Directory (I have never found a way to
get the full configuration back out of a Lucene index).
The second option might work for some cases; one case (not the one here)
is two different datasets trying to share one text index. Update will
break, though: the same text index is inside two transaction regimes.
Andy
On 18/01/16 14:45, Rob Vesse wrote:
> I would prefer the assembler option, as it fixes only the cause of
> the specific bug, and in my mind it fixes the assembler API semantics
> to match what I expect.
>
> Rob
Re: fuseki: 2 services sharing a dataset with text index
Posted by Andy Seaborne <an...@apache.org>.
On 18/01/16 14:45, Rob Vesse wrote:
> I would prefer the assembler option, as it fixes only the cause of
> the specific bug, and in my mind it fixes the assembler API semantics
> to match what I expect.
>
> Rob
After the discussions on PR#123 and JENA-1122, I think that only an
assembler-based fix will work.
Fuseki handles datasets which are not fully transactional in a special way.
It devolves transaction handling to the wrapped dataset if possible.
That was put in specifically for text and spatial, but it is a "best
effort" mechanism. (It backs off to locking if that does not work.)
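A rough sketch of the "best effort" pattern described here, assuming a hypothetical `WrappedDataset` interface (not Jena's Transactional API): delegate to the wrapped dataset's own transactions when it has them, otherwise back off to a reader/writer lock. It illustrates the pattern only and is not Fuseki's actual code.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Best-effort wrapper: use the dataset's own transactions when it has them,
// otherwise fall back to a multiple-reader / single-writer lock.
final class BestEffortTxn {
    /** Hypothetical minimal view of a wrapped dataset; not Jena's Transactional interface. */
    interface WrappedDataset {
        boolean supportsTransactions();
        void begin(boolean write);
        void commit();
        void end();             // assumed to abandon a write that was not committed
    }

    private final WrappedDataset base;
    private final ReadWriteLock fallback = new ReentrantReadWriteLock();

    BestEffortTxn(WrappedDataset base) {
        this.base = base;
    }

    <T> T execute(boolean write, Supplier<T> action) {
        if (base.supportsTransactions()) {
            base.begin(write);
            try {
                T result = action.get();
                if (write) {
                    base.commit();
                }
                return result;
            } finally {
                base.end();
            }
        }
        // No real transactions: readers share the lock, a writer excludes everyone.
        // This gives isolation only; a failed action is not rolled back.
        Lock lock = write ? fallback.writeLock() : fallback.readLock();
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        WrappedDataset plain = new WrappedDataset() {
            public boolean supportsTransactions() { return false; }
            public void begin(boolean write) { throw new UnsupportedOperationException(); }
            public void commit() { throw new UnsupportedOperationException(); }
            public void end() { throw new UnsupportedOperationException(); }
        };
        String answer = new BestEffortTxn(plain).execute(true, () -> "updated under the lock");
        System.out.println(answer);
    }
}
```

The lock-based fallback serialises writers but cannot undo a half-applied update, which is one limitation of backing off to locking.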
Fixing the API calls is going to be fragile at best, and will introduce
concurrency problems for, e.g., in-memory or SDB datasets with text
indexes, and also where two TDB datasets share an index.
Andy
Re: fuseki: 2 services sharing a dataset with text index
Posted by Rob Vesse <rv...@dotnetrdf.org>.
I would prefer the assembler option, as it fixes only the cause of
the specific bug, and in my mind it fixes the assembler API semantics
to match what I expect.
Rob
On 18/01/2016 14:09, "Brian McBride" <br...@epimorphics.com> wrote:
>
>
>On 22/12/15 18:22, Andy Seaborne wrote:
>> JENA-1104 suggests there is a ordering/timing issue and that it is not
>> Fuseki1/Fuseki2 expect that things happen in a different order.
>I have investigated this further and I think I understand what is
>happening.
>
>If we have a configuration with the same dataset+text-index shared
>between two services, then when the first service is built,
>TextIndexLuceneAssembler is called to create TextIndexLucene object.
>When the second service is built, TextIndexLuceneAssembler is called
>again and creates another TextIndexLucene object.
>
>Both of these TextIndexLucene objects create a Lucene IndexWriter object
>on the same directory. That doesn't work because they both try to grab
>the same lock and one fails.
>
>I am happy to offer pull request to change this behaviour. There are
>broadly two strategies that I can see, and I'm wondering if there is a
>preferred approach from the Jena team.
>
>The first approach is to make a change the way the assemblers work to
>only create one TextIndexLucene object per node in the configuration
>graph.
>
>A second approach is to modify the TextIndexLucene so that two or more
>objects can operate on the same directory.
>
>My default approach would be to make the change in the assembler code.
>
>Brian
>>
>> I'm not sure that a shared index across two different datasets will
>> work if updates are involved. Maybe someone else can help with that.
>The configuration I'm looking at is not an index shared across two data
>sets - there is one index+tdb-dataset pair in the configuration.
>>
>> What's fuseki:allowTimeoutOverride? Is this a local build with the
>> code for that uncommented out?
>>
>> Andy
>>
>> On 21/12/15 14:53, Brian McBride wrote:
>>> The fuseki configuration below sets up two services with a shared
>>> dataset. The dataset has a Lucene text index.
>>>
>>> This configuration works on Fuseki 1.3.1. Fuseki 2.3.1 fails to start.
>>> The log output is shown below. It looks like the Lucene index may be
>>> trying to grab a lock for the dataset twice.
>>>
>>> If I change the second fuseki:dataset line to:
>>>
>>> [[
>>> fuseki:dataset <#ds> ;
>>> ]]
>>>
>>> then it works on Fuseki 2.3.1 and, unexpectedly, both services have
>>> access to the text index. That doesn't seem right, though it suits me
>>> for the moment as I need both services to have access to the index.
>>>
>>> Is there some configuration change I need to make between Fuseki 1 and
>>> Fuseki 2?
>>>
>>> Brian
>>>
>>>
>>>
>>> Fuseki 2.3.1 log output
>>>
>>> [[
>>> 2015-12-21 14:42:20.940 WARN Config :: Fuseki v2:
>>> Management functions are always on the same port as the server.
>>> --mgtPort ignored.
>>> 2015-12-21 14:42:21.062 INFO Server :: Fuseki 2.3.1
>>> 2015-12-08T09:24:07+0000
>>> 2015-12-21 14:42:21.229 INFO Config ::
>>> FUSEKI_HOME=/usr/share/fuseki
>>> 2015-12-21 14:42:21.230 INFO Config ::
>>> FUSEKI_BASE=/etc/fuseki
>>> 2015-12-21 14:42:21.233 INFO Servlet :: Initializing Shiro
>>> environment
>>> 2015-12-21 14:42:21.233 INFO EnvironmentLoader :: Starting Shiro
>>> environment initialization.
>>> 2015-12-21 14:42:21.242 INFO Config :: Shiro file:
>>> file:///etc/fuseki/shiro.ini
>>> 2015-12-21 14:42:21.415 INFO EnvironmentLoader :: Shiro environment
>>> initialized in 181 ms.
>>> 2015-12-21 14:42:21.415 INFO Config :: Configuration
>>> file: /etc/fuseki/config.ttl
>>> 2015-12-21 14:42:22.193 WARN AssemblerHelp :: ja:loadClass:
>>> Migration to Jena3: Converting com.hp.hpl.jena.tdb.TDB to
>>> org.apache.jena.tdb.TDB
>>> 2015-12-21 14:42:23.557 ERROR Server :: Exception in
>>> initialization: caught:
>>> org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
>>> out: NativeFSLock@/var/lib/fuseki/databases/ds-lucene/write.lock
>>> 2015-12-21 14:42:23.577 INFO Server :: Started 2015/12/21
>>> 14:42:23 UTC on port 3030
>>>
>>> ]]
>>>
>>>
>>>
>>> Fuseki configuration.
>>>
>>> [[
>>>
>>> # Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
>>>
>>> @prefix : <#> .
>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>
>>> [] rdf:type fuseki:Server ;
>>>
>>> fuseki:services (
>>> <#service_ds>
>>> <#service_ds_timeout_override>
>>> ) .
>>>
>>> # TDB
>>> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>>> tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
>>> tdb:GraphTDB rdfs:subClassOf ja:Model .
>>>
>>>
>>>
>>> <#service_ds> rdf:type fuseki:Service ;
>>> rdfs:label "TDB Service (RW)" ;
>>> fuseki:name "ds" ;
>>> fuseki:serviceQuery "query" ;
>>> fuseki:dataset <#ds-with-lucene> ;
>>> .
>>>
>>> <#service_ds_timeout_override>
>>> rdfs:label "TDB Service Query with timeout override" ;
>>> fuseki:name "ds_to" ;
>>> fuseki:allowTimeoutOverride true;
>>> fuseki:serviceQuery "query" ;
>>> fuseki:dataset <#ds-with-lucene> ;
>>> .
>>>
>>> <#ds> rdf:type tdb:DatasetTDB ;
>>> tdb:location "/var/lib/fuseki/databases/ds" ;
>>> .
>>>
>>>
>>> @prefix text: <http://jena.apache.org/text#> .
>>>
>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>>> text:TextDataset rdfs:subClassOf ja:RDFDataset .
>>> text:TextIndexLucene rdfs:subClassOf text:TextIndex .
>>>
>>>
>>> <#ds-with-lucene>
>>> rdf:type text:TextDataset;
>>> text:dataset <#ds> ;
>>> text:index <#indexLucene> ;
>>> .
>>>
>>> <#indexLucene> a text:TextIndexLucene ;
>>> text:directory <file:///var/lib/fuseki/databases/ds-lucene>;
>>> text:entityMap <#entMap> ;
>>> .
>>>
>>> <#entMap> a text:EntityMap ;
>>> text:entityField "uri" ;
>>> text:defaultField "text" ;
>>> text:map (
>>> [
>>> text:field "text" ;
>>> text:predicate rdfs:label ;
>>> ]
>>> ) .
>>> ]]
>>>
>>
>
>