Posted to users@jena.apache.org by Brian McBride <br...@epimorphics.com> on 2016/01/18 15:09:42 UTC

Re: fuseki: 2 services sharing a dataset with text index


On 22/12/15 18:22, Andy Seaborne wrote:
> JENA-1104 suggests there is an ordering/timing issue and that it is not 
> Fuseki1/Fuseki2 except that things happen in a different order.
I have investigated this further and I think I understand what is 
happening.

If we have a configuration with the same dataset+text-index shared 
between two services, then when the first service is built, 
TextIndexLuceneAssembler is called to create a TextIndexLucene object.  
When the second service is built, TextIndexLuceneAssembler is called 
again and creates another TextIndexLucene object.

Both of these TextIndexLucene objects create a Lucene IndexWriter object 
on the same directory.  That doesn't work because they both try to grab 
the same lock and one fails.
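
The clash can be reproduced outside Jena. The following is a minimal sketch in 
plain Java, not Lucene code: it assumes the lock behaves like an ordinary 
java.nio file lock on a write.lock file, and the class and method names 
(WriteLockClash, secondLockIsRefused) are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; it only mimics two writers competing for one lock file.
final class WriteLockClash {

    /** Returns true if a second attempt to lock the same file in this JVM is refused. */
    static boolean secondLockIsRefused() {
        try {
            Path dir = Files.createTempDirectory("ds-lucene");
            Path lockFile = dir.resolve("write.lock");
            try (FileChannel first = FileChannel.open(lockFile,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileChannel second = FileChannel.open(lockFile,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                // The "first writer" takes the lock.
                FileLock held = first.tryLock();
                if (held == null) {
                    throw new IllegalStateException("could not take the first lock");
                }
                try {
                    // The "second writer" asks for the same lock through its own channel.
                    FileLock again = second.tryLock();
                    if (again != null) {
                        again.release();
                        return false;
                    }
                    return true;
                } catch (OverlappingFileLockException e) {
                    // The JVM already holds a lock on this file.
                    return true;
                } finally {
                    held.release();
                }
            } finally {
                Files.deleteIfExists(lockFile);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second lock refused: " + secondLockIsRefused());
    }
}
```

Lucene's own lock handling differs in detail (the log shows a timeout rather 
than an immediate refusal), so treat this only as an analogy for why two 
writers on one directory cannot both succeed.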

I am happy to offer a pull request to change this behaviour.  There are 
broadly two strategies that I can see, and I'm wondering if there is a 
preferred approach from the Jena team.

The first approach is to change the way the assemblers work so that 
only one TextIndexLucene object is created per node in the configuration graph.

A second approach is to modify TextIndexLucene so that two or more 
objects can operate on the same directory.

My default approach would be to make the change in the assembler code.
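
As a rough sketch of what "one object per node" could mean, assuming the 
assembler can be wrapped with a cache keyed by the node's URI: 
MemoizingAssembler and its methods are hypothetical names, not part of the 
Jena assembler API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for an assembler; not the Jena Assembler interface.
final class MemoizingAssembler<T> {
    private final Map<String, T> builtByNode = new HashMap<>();
    private final Function<String, T> builder;
    private int buildCount = 0;

    MemoizingAssembler(Function<String, T> builder) {
        this.builder = builder;
    }

    /** Builds the object for nodeUri on the first request; later requests reuse it. */
    synchronized T open(String nodeUri) {
        T existing = builtByNode.get(nodeUri);
        if (existing != null) {
            return existing;
        }
        T built = builder.apply(nodeUri);
        buildCount++;
        builtByNode.put(nodeUri, built);
        return built;
    }

    /** Number of times the underlying builder was actually invoked. */
    synchronized int buildCount() {
        return buildCount;
    }

    public static void main(String[] args) {
        MemoizingAssembler<Object> assembler = new MemoizingAssembler<>(uri -> new Object());
        Object forFirstService = assembler.open("#indexLucene");
        Object forSecondService = assembler.open("#indexLucene");
        System.out.println("same instance: " + (forFirstService == forSecondService));
        System.out.println("objects built: " + assembler.buildCount());
    }
}
```

With a cache like this, both services would receive the same index object, so 
only one writer would ever be opened on the directory.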

Brian
>
> I'm not sure that a shared index across two different datasets will 
> work if updates are involved.  Maybe someone else can help with that.
The configuration I'm looking at is not an index shared across two data 
sets - there is one index+tdb-dataset pair in the configuration.
>
> What's fuseki:allowTimeoutOverride?  Is this a local build with the 
> code for that uncommented out?
>
>     Andy
>
> On 21/12/15 14:53, Brian McBride wrote:
>> The fuseki configuration below sets up two services with a shared
>> dataset.  The dataset has a lucene text index.
>>
>> This configuration works on Fuseki 1.3.1.  Fuseki 2.3.1 fails to start.
>> The log output is shown below.  Looks like the lucene index may be
>> trying to grab a lock for the dataset twice.
>>
>> If I change the second fuseki:dataset line to:
>>
>> [[
>>      fuseki:dataset                        <#ds> ;
>> ]]
>>
>> then it works on Fuseki 2.3.1.  Unexpectedly, both services have
>> access to the text index, which doesn't seem right, though it suits me for
>> the moment as I need both services to have access to the index.
>>
>> Is there some configuration change I need to make between Fuseki 1 and
>> Fuseki 2?
>>
>> Brian
>>
>>
>>
>> Fuseki 2.3.1 log output
>>
>> [[
>> 2015-12-21 14:42:20.940 WARN  Config               :: Fuseki v2: Management functions are always on the same port as the server. --mgtPort ignored.
>> 2015-12-21 14:42:21.062 INFO  Server               :: Fuseki 2.3.1 2015-12-08T09:24:07+0000
>> 2015-12-21 14:42:21.229 INFO  Config               :: FUSEKI_HOME=/usr/share/fuseki
>> 2015-12-21 14:42:21.230 INFO  Config               :: FUSEKI_BASE=/etc/fuseki
>> 2015-12-21 14:42:21.233 INFO  Servlet              :: Initializing Shiro environment
>> 2015-12-21 14:42:21.233 INFO  EnvironmentLoader    :: Starting Shiro environment initialization.
>> 2015-12-21 14:42:21.242 INFO  Config               :: Shiro file: file:///etc/fuseki/shiro.ini
>> 2015-12-21 14:42:21.415 INFO  EnvironmentLoader    :: Shiro environment initialized in 181 ms.
>> 2015-12-21 14:42:21.415 INFO  Config               :: Configuration file: /etc/fuseki/config.ttl
>> 2015-12-21 14:42:22.193 WARN  AssemblerHelp        :: ja:loadClass: Migration to Jena3: Converting com.hp.hpl.jena.tdb.TDB to org.apache.jena.tdb.TDB
>> 2015-12-21 14:42:23.557 ERROR Server               :: Exception in initialization: caught: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/fuseki/databases/ds-lucene/write.lock
>> 2015-12-21 14:42:23.577 INFO  Server               :: Started 2015/12/21 14:42:23 UTC on port 3030
>>
>> ]]
>>
>>
>>
>> Fuseki configuration.
>>
>> [[
>>
>> # Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
>>
>> @prefix :        <#> .
>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>
>> [] rdf:type fuseki:Server ;
>>
>>     fuseki:services (
>>       <#service_ds>
>>       <#service_ds_timeout_override>
>>     ) .
>>
>> # TDB
>> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
>> tdb:GraphTDB    rdfs:subClassOf  ja:Model .
>>
>>
>>
>> <#service_ds> rdf:type fuseki:Service ;
>>      rdfs:label                             "TDB Service (RW)" ;
>>      fuseki:name                            "ds" ;
>>      fuseki:serviceQuery                    "query" ;
>>      fuseki:dataset <#ds-with-lucene> ;
>>      .
>>
>> <#service_ds_timeout_override>
>>      rdfs:label                            "TDB Service Query with timeout override" ;
>>      fuseki:name                           "ds_to" ;
>>      fuseki:allowTimeoutOverride           true;
>>      fuseki:serviceQuery                   "query" ;
>>      fuseki:dataset <#ds-with-lucene> ;
>>      .
>>
>> <#ds> rdf:type      tdb:DatasetTDB ;
>>                        tdb:location "/var/lib/fuseki/databases/ds" ;
>>       .
>>
>>
>> @prefix text:    <http://jena.apache.org/text#> .
>>
>> [] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>
>>
>> <#ds-with-lucene>
>>      rdf:type     text:TextDataset;
>>      text:dataset   <#ds> ;
>>      text:index     <#indexLucene> ;
>>      .
>>
>> <#indexLucene> a text:TextIndexLucene ;
>>      text:directory <file:///var/lib/fuseki/databases/ds-lucene>;
>>      text:entityMap <#entMap> ;
>>      .
>>
>> <#entMap> a text:EntityMap ;
>>      text:entityField      "uri" ;
>>      text:defaultField     "text" ;
>>      text:map (
>>           [
>>             text:field "text" ;
>>             text:predicate rdfs:label ;
>>           ]
>>           ) .
>> ]]
>>
>

-- 
Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)


Re: fuseki: 2 services sharing a dataset with text index

Posted by Brian McBride <br...@epimorphics.com>.

On 18/01/16 17:21, Andy Seaborne wrote:
> rdfs:seeAlso JENA-1104
>
> Can I suggest a 3rd option?
>
> A static cache in TextDatasetFactory remembers text datasets created 
> and returns the same one on each call for the same location/text 
> index. c.f. string interns.
That looks elegant to me.

[...]

> (Fuseki can have multiple separate configurations).  The entry key can 
> include the Lucene Directory - not sure what else is needed.
It gets a little complicated.

The Lucene Directory object can't be used in the key because there can 
be multiple Lucene Directory objects pointing to the same file system 
directory and there is no equals method on the Directory object.  So if 
a Lucene Directory object is used in the key there could still be 
locking problems.
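
To make the key problem concrete, here is a small sketch under stated 
assumptions: DirHandle is a hypothetical stand-in for a directory object that 
does not override equals(), and keyFor is an illustrative helper, not an 
existing Jena or Lucene method.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo; DirHandle only imitates an object with identity-based equals().
final class DirectoryKeyDemo {

    /** Stand-in for a directory handle that inherits Object.equals(). */
    static final class DirHandle {
        final Path location;

        DirHandle(String location) {
            this.location = Paths.get(location);
        }
    }

    /**
     * Derives a key from the file system location rather than from the handle.
     * normalize() removes "." and ".." segments but does not resolve symlinks.
     */
    static Path keyFor(DirHandle handle) {
        return handle.location.toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        DirHandle a = new DirHandle("/var/lib/fuseki/databases/ds-lucene");
        DirHandle b = new DirHandle("/var/lib/fuseki/databases/../databases/ds-lucene");
        System.out.println("handles equal: " + a.equals(b));
        System.out.println("keys equal:    " + keyFor(a).equals(keyFor(b)));
    }
}
```

Two handles for the same location compare as different objects, while the 
normalized paths compare as equal, which is the property a cache key needs.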

The key cannot include anything beyond what identifies the underlying 
directory, because a richer key would allow two entries for the same 
directory and so cause locking problems as well.  So it won't be possible 
to have different indexes with, say, different entity definitions or a 
different default analyzer on the same underlying index directory.  It 
would be nice to detect when the user has done that, but that could be a 
bit tricky.
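
One possible way to detect the conflict within a single running server, 
sketched with hypothetical names (IndexConfigRegistry, IndexConfig): remember 
the configuration first used for each directory key and reject a later, 
different one. It only sees configurations registered in the same JVM; it 
cannot read a configuration back out of an index on disk.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry; none of these names exist in jena-text.
final class IndexConfigRegistry {

    /** Illustrative summary of an index configuration, compared by value. */
    static final class IndexConfig {
        final String entityField;
        final String defaultField;

        IndexConfig(String entityField, String defaultField) {
            this.entityField = entityField;
            this.defaultField = defaultField;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof IndexConfig)) {
                return false;
            }
            IndexConfig other = (IndexConfig) o;
            return entityField.equals(other.entityField)
                    && defaultField.equals(other.defaultField);
        }

        @Override
        public int hashCode() {
            return Objects.hash(entityField, defaultField);
        }
    }

    private final Map<String, IndexConfig> configByDirectory = new ConcurrentHashMap<>();

    /** Records the configuration for a directory; rejects a different one for the same key. */
    void register(String directoryKey, IndexConfig config) {
        IndexConfig previous = configByDirectory.putIfAbsent(directoryKey, config);
        if (previous != null && !previous.equals(config)) {
            throw new IllegalStateException(
                    "Directory " + directoryKey + " is already in use with a different configuration");
        }
    }

    public static void main(String[] args) {
        IndexConfigRegistry registry = new IndexConfigRegistry();
        registry.register("/var/lib/fuseki/databases/ds-lucene", new IndexConfig("uri", "text"));
        try {
            registry.register("/var/lib/fuseki/databases/ds-lucene", new IndexConfig("uri", "label"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```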

Moreover, the current code supports the notion of an in-memory index, 
though I think that if an in-memory index is shared between two datasets 
the current code will create two indexes where one would expect one.

One problem is that the current interface to TextDatasetFactory makes 
use of the Lucene directory class which isn't the right abstraction for 
this because it does not offer an equals method that "does the right 
thing" for this use.

I think it can be made to work though it may involve some ugly "if 
instanceof FSDirectory" like code.  An alternative would be to change the 
interface to TextDatasetFactory.

Shall I create a new Jira for this (as I'm not sure this is the same 
issue as JENA-1104) and see what the code looks like when I try it?

Brian






>
> There is an issue which can't be solved which is two differently 
> configured text indexes over one Directory (I have never found a way 
> to get the full configuration back out of a lucene index).
>
> The second option might work for some cases - one case (not here) is 
> two different datasets trying to share one text index.  Update will 
> break - the same text index is inside two transaction regimes.
>
>     Andy
>
>



Re: fuseki: 2 services sharing a dataset with text index

Posted by Andy Seaborne <an...@apache.org>.
rdfs:seeAlso JENA-1104

Can I suggest a 3rd option?

A static cache in TextDatasetFactory remembers text datasets created and 
returns the same one on each call for the same location/text index. c.f. 
string interns.

TDB does this in StoreConnection where there must be one DatasetGraphTDB 
per storage location or else chaos results.

A single intern table has the same effect as assembler caching but also 
applies to Java code and across multiple assembler files
(Fuseki can have multiple separate configurations).  The entry key can 
include the Lucene Directory - not sure what else is needed.
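
A minimal sketch of such an intern table, assuming a plain string location is 
an adequate key: TextIndexInternTable and OpenIndex are hypothetical names and 
this is not the TextDatasetFactory or StoreConnection code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical process-wide intern table for opened indexes.
final class TextIndexInternTable {

    /** Stand-in for an opened text index. */
    static final class OpenIndex {
        final String location;

        OpenIndex(String location) {
            this.location = location;
        }
    }

    // Static, so it is shared by assembler-built and directly constructed callers alike.
    private static final ConcurrentMap<String, OpenIndex> OPEN = new ConcurrentHashMap<>();

    /** Returns the already opened index for this location, creating it only once. */
    static OpenIndex open(String location) {
        return OPEN.computeIfAbsent(location, OpenIndex::new);
    }

    /** Forgets a location, for example after the index has been closed. */
    static void release(String location) {
        OPEN.remove(location);
    }

    public static void main(String[] args) {
        OpenIndex first = open("/var/lib/fuseki/databases/ds-lucene");
        OpenIndex second = open("/var/lib/fuseki/databases/ds-lucene");
        System.out.println("same instance: " + (first == second));
    }
}
```

Because the table is static, it is shared by every caller in the process, 
which is what lets it cover several configuration files at once.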

There is an issue which can't be solved which is two differently 
configured text indexes over one Directory (I have never found a way to 
get the full configuration back out of a lucene index).

The second option might work for some cases - one case (not here) is two 
different datasets trying to share one text index.  Update will break - 
the same text index is inside two transaction regimes.

	Andy


On 18/01/16 14:45, Rob Vesse wrote:
> I would prefer the assembler option as that is only fixing the cause of
> the specific bug and it in my mind fixes the assembler API semantics to
> what I mentally expect
>
> Rob
>


Re: fuseki: 2 services sharing a dataset with text index

Posted by Andy Seaborne <an...@apache.org>.
On 18/01/16 14:45, Rob Vesse wrote:
> I would prefer the assembler option as that is only fixing the cause of
> the specific bug and it in my mind fixes the assembler API semantics to
> what I mentally expect
>
> Rob

After the discussions on PR#123 and JENA-1122, I think that only an 
assembler based fix will work.

Fuseki handles datasets which are not fully transactional in a special way.

It devolves transaction handling to the wrapped dataset if possible. 
That was put in specifically for text and spatial but it is a "best 
effort" mechanism. (It backs off to locking if that does not work.)
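
The general shape of that "delegate if possible, otherwise lock" idea can be 
sketched as follows. This is an illustration only: the Store and 
TransactionalStore interfaces and the BestEffortTransactions class are 
hypothetical and are not the Fuseki or Jena types.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper; it illustrates the fallback pattern, not Fuseki's code.
final class BestEffortTransactions {

    /** Minimal stand-in for a dataset with no transaction support. */
    interface Store { }

    /** Minimal stand-in for a dataset that can run its own transactions. */
    interface TransactionalStore extends Store {
        void begin(boolean write);
        void commit();
        void end();
    }

    private final Store wrapped;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    BestEffortTransactions(Store wrapped) {
        this.wrapped = wrapped;
    }

    /** True when transaction handling is passed to the wrapped store. */
    boolean delegatesTransactions() {
        return wrapped instanceof TransactionalStore;
    }

    /** Runs a write action in a delegated transaction, or under a write lock as a fallback. */
    void runWrite(Runnable action) {
        if (wrapped instanceof TransactionalStore) {
            TransactionalStore txn = (TransactionalStore) wrapped;
            txn.begin(true);
            try {
                action.run();
                txn.commit();
            } finally {
                // end() is always called; commit() is skipped if the action throws.
                txn.end();
            }
        } else {
            lock.writeLock().lock();
            try {
                action.run();
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        BestEffortTransactions plain = new BestEffortTransactions(new Store() { });
        plain.runWrite(() -> System.out.println("ran under the fallback lock"));
        System.out.println("delegates: " + plain.delegatesTransactions());
    }
}
```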

Fixing the API calls is going to be fragile at best, and will introduce 
concurrency problems for e.g. in-mem or SDB + text indexes, and also 
where two TDB datasets share an index.

	Andy


>


Re: fuseki: 2 services sharing a dataset with text index

Posted by Rob Vesse <rv...@dotnetrdf.org>.
I would prefer the assembler option, as it fixes only the cause of this
specific bug and, to my mind, brings the assembler API semantics in line
with what I would expect.
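
To illustrate what "one object per node in the configuration graph" could
look like, here is a minimal sketch. It is not Jena's assembler code:
SharedNodeCache and getOrBuild are hypothetical names, and a plain string
key stands in for the configuration node (for example <#indexLucene>).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper, not part of Jena: remembers the object built for each
// configuration node so that a second request for the same node reuses it.
final class SharedNodeCache<T> {
    private final Map<String, T> built = new HashMap<>();

    // nodeKey stands in for the URI of a node in the configuration graph.
    synchronized T getOrBuild(String nodeKey, Supplier<T> builder) {
        T existing = built.get(nodeKey);
        if (existing == null) {
            existing = builder.get();   // runs only for the first request
            built.put(nodeKey, existing);
        }
        return existing;
    }

    public static void main(String[] args) {
        SharedNodeCache<Object> cache = new SharedNodeCache<>();
        // Two services referring to the same node get the same instance.
        Object first = cache.getOrBuild("#indexLucene", Object::new);
        Object second = cache.getOrBuild("#indexLucene", Object::new);
        System.out.println(first == second);
    }
}
```

With something along these lines, the second service would receive the
index object already built for the first, so only one IndexWriter would
try to take the lock on the directory.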

Rob

On 18/01/2016 14:09, "Brian McBride" <br...@epimorphics.com> wrote:

>
>
>On 22/12/15 18:22, Andy Seaborne wrote:
>> JENA-1104 suggests there is a ordering/timing issue and that it is not
>> Fuseki1/Fuseki2 expect that things happen in a different order.
>I have investigated this further and I think I understand what is
>happening.
>
>If we have a configuration with the same dataset+text-index shared
>between two services, then when the first service is built,
>TextIndexLuceneAssembler is called to create a TextIndexLucene object.
>When the second service is built, TextIndexLuceneAssembler is called
>again and creates another TextIndexLucene object.
>
>Both of these TextIndexLucene objects create a Lucene IndexWriter object
>on the same directory.  That doesn't work because they both try to grab
>the same lock and one fails.
>
>I am happy to offer a pull request to change this behaviour.  There are
>broadly two strategies that I can see, and I'm wondering if there is a
>preferred approach from the Jena team.
>
>The first approach is to change the way the assemblers work to
>only create one TextIndexLucene object per node in the configuration
>graph.
>
>A second approach is to modify the TextIndexLucene so that two or more
>objects can operate on the same directory.
>
>My default approach would be to make the change in the assembler code.
>
>Brian
>>
>> I'm not sure that a shared index across two different datasets will
>> work if updates are involved.  Maybe someone else can help with that.
>The configuration I'm looking at is not an index shared across two data
>sets - there is one index+tdb-dataset pair in the configuration.
>>
>> What's fuseki:allowTimeoutOverride?  Is this a local build with the
>> code for that uncommented out?
>>
>>     Andy
>>
>> On 21/12/15 14:53, Brian McBride wrote:
>>> The fuseki configuration below sets up two services with a shared
>>> dataset.  The dataset has a lucene text index.
>>>
>>> This configuration works on Fuseki 1.3.1.  Fuseki 2.3.1 fails to start.
>>> The log output is shown below.  Looks like the lucene index may be
>>> trying to grab a lock for the dataset twice.
>>>
>>> If I change the second fuseki:dataset line to:
>>>
>>> [[
>>>      fuseki:dataset                        <#ds> ;
>>> ]]
>>>
>>> then it works on Fuseki 2.3.1 and, unexpectedly, both services have
>>> access to the text index, which doesn't seem right, though it suits me
>>> for the moment as I need both services to have access to the index.
>>>
>>> Is there some configuration change I need to make between Fuseki 1 and
>>> Fuseki 2?
>>>
>>> Brian
>>>
>>>
>>>
>>> Fuseki 2.3.1 log output
>>>
>>> [[
>>> 2015-12-21 14:42:20.940 WARN  Config               :: Fuseki v2:
>>> Management functions are always on the same port as the server.
>>> --mgtPort ignored.
>>> 2015-12-21 14:42:21.062 INFO  Server               :: Fuseki 2.3.1
>>> 2015-12-08T09:24:07+0000
>>> 2015-12-21 14:42:21.229 INFO  Config               ::
>>> FUSEKI_HOME=/usr/share/fuseki
>>> 2015-12-21 14:42:21.230 INFO  Config               ::
>>> FUSEKI_BASE=/etc/fuseki
>>> 2015-12-21 14:42:21.233 INFO  Servlet              :: Initializing
>>> Shiro environment
>>> 2015-12-21 14:42:21.233 INFO  EnvironmentLoader    :: Starting Shiro
>>> environment initialization.
>>> 2015-12-21 14:42:21.242 INFO  Config               :: Shiro file:
>>> file:///etc/fuseki/shiro.ini
>>> 2015-12-21 14:42:21.415 INFO  EnvironmentLoader    :: Shiro environment
>>> initialized in 181 ms.
>>> 2015-12-21 14:42:21.415 INFO  Config               :: Configuration
>>> file: /etc/fuseki/config.ttl
>>> 2015-12-21 14:42:22.193 WARN  AssemblerHelp        :: ja:loadClass:
>>> Migration to Jena3: Converting com.hp.hpl.jena.tdb.TDB to
>>> org.apache.jena.tdb.TDB
>>> 2015-12-21 14:42:23.557 ERROR Server               :: Exception in
>>> initialization: caught:
>>> org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
>>> out: NativeFSLock@/var/lib/fuseki/databases/ds-lucene/write.lock
>>> 2015-12-21 14:42:23.577 INFO  Server               :: Started
>>> 2015/12/21 14:42:23 UTC on port 3030
>>>
>>> ]]
>>>
>>>
>>>
>>> Fuseki configuration.
>>>
>>> [[
>>>
>>> # Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
>>>
>>> @prefix :        <#> .
>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>
>>> [] rdf:type fuseki:Server ;
>>>
>>>     fuseki:services (
>>>       <#service_ds>
>>>       <#service_ds_timeout_override>
>>>     ) .
>>>
>>> # TDB
>>> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>>> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
>>> tdb:GraphTDB    rdfs:subClassOf  ja:Model .
>>>
>>>
>>>
>>> <#service_ds> rdf:type fuseki:Service ;
>>>      rdfs:label                             "TDB Service (RW)" ;
>>>      fuseki:name                            "ds" ;
>>>      fuseki:serviceQuery                    "query" ;
>>>      fuseki:dataset <#ds-with-lucene> ;
>>>      .
>>>
>>> <#service_ds_timeout_override>
>>>      rdfs:label       "TDB Service Query with timeout override" ;
>>>      fuseki:name                           "ds_to" ;
>>>      fuseki:allowTimeoutOverride           true;
>>>      fuseki:serviceQuery                   "query" ;
>>>      fuseki:dataset <#ds-with-lucene> ;
>>>      .
>>>
>>> <#ds> rdf:type      tdb:DatasetTDB ;
>>>                        tdb:location "/var/lib/fuseki/databases/ds" ;
>>>       .
>>>
>>>
>>> @prefix text:    <http://jena.apache.org/text#> .
>>>
>>> [] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>>
>>>
>>> <#ds-with-lucene>
>>>      rdf:type     text:TextDataset;
>>>      text:dataset   <#ds> ;
>>>      text:index     <#indexLucene> ;
>>>      .
>>>
>>> <#indexLucene> a text:TextIndexLucene ;
>>>      text:directory <file:///var/lib/fuseki/databases/ds-lucene>;
>>>      text:entityMap <#entMap> ;
>>>      .
>>>
>>> <#entMap> a text:EntityMap ;
>>>      text:entityField      "uri" ;
>>>      text:defaultField     "text" ;
>>>      text:map (
>>>           [
>>>             text:field "text" ;
>>>             text:predicate rdfs:label ;
>>>           ]
>>>           ) .
>>> ]]
>>>
>>
>
>-- 
>Epimorphics Ltd, http://www.epimorphics.com
>Registered address: Court Lodge, 105 High Street, Portishead, Bristol
>BS20 6PT
>Epimorphics Ltd. is a limited company registered in England (number
>7016688)
>