Posted to users@jena.apache.org by Yang Yuanzhe <ya...@proxml.be> on 2015/04/14 19:51:55 UTC

Unable to enable text search in Fuseki 2 for in-memory datasets

Hi there,

Sorry to trouble you again. Last month I wrote to you to figure out the 
bug in text search for TDB. Given the following configuration, text 
search works with TDB:

> @prefix :        <#> .
> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text:    <http://jena.apache.org/text#> .
>
> [] rdf:type fuseki:Server ;
>    fuseki:services (
>      <#service_text_tdb>
>    ) .
>
> # TDB
> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
> tdb:GraphTDB    rdfs:subClassOf  ja:Model .
>
> # Text
> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>
> <#service_text_tdb> rdf:type fuseki:Service ;
>     rdfs:label                      "TDB/text service" ;
>     fuseki:name                     "ds" ;
>     fuseki:serviceQuery             "query" ;
>     fuseki:serviceQuery             "sparql" ;
>     fuseki:serviceUpdate            "update" ;
>     fuseki:serviceUpload            "upload" ;
>     fuseki:serviceReadGraphStore    "get" ;
>     fuseki:serviceReadWriteGraphStore    "data" ;
>     fuseki:dataset                  <#text_dataset> ;
>     .
>
> <#text_dataset> rdf:type     text:TextDataset ;
>     text:dataset   <#dataset> ;
>     text:index     <#indexLucene> ;
>     .
>
> <#dataset> rdf:type      tdb:DatasetTDB ;
>     tdb:location "DB" ;
>     ##tdb:unionDefaultGraph true ;
>     .
>
> <#indexLucene> a text:TextIndexLucene ;
>     text:directory <file:Lucene> ;
>     ##text:directory "mem" ;
>     text:entityMap <#entMap> ;
>     .
>
> <#entMap> a text:EntityMap ;
>     text:entityField      "uri" ;
>     text:defaultField     "text" ;        ## Should be defined in the text:map.
>     text:map (
>          # rdfs:label
>          [ text:field "text" ; text:predicate rdfs:label ]
>          ) .

Now we want to use text search for in-memory datasets, but we failed 
after several attempts. The configuration file we use is as follows:

> @prefix :        <#> .
> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text:    <http://jena.apache.org/text#> .
> @prefix spatial:    <http://jena.apache.org/spatial#> .
>
> [] a fuseki:Server ;
>    fuseki:services (
>      <#memory>
>    ) .
>
> <#memory> a fuseki:Service ;
>     fuseki:name                     "memory" ;
>     fuseki:serviceQuery             "sparql" ;
>     fuseki:serviceQuery             "query" ;
>     fuseki:serviceUpdate            "update" ;   # SPARQL query service -- /memory/update
>     fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
>     fuseki:serviceReadWriteGraphStore      "data" ;
>     fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
>     fuseki:dataset           :text_dataset ;
>     .
>
> <#dataset> rdf:type ja:RDFDataset ;
>     ja:defaultGraph
>           [
>             a ja:MemoryModel ;
>             ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
>           ] .
>
> # Text
> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>
> :text_dataset a text:TextDataset ;
>     text:dataset   <#dataset> ;
>     text:index     <#textIndexLucene> ;
>     .
>
> # Text index description
> <#textIndexLucene> a text:TextIndexLucene ;
>     text:directory <file:Lucene> ;
>     ##text:directory "mem" ;
>     text:entityMap <#entMap> ;
>     .
>
> <#entMap> a text:EntityMap ;
>     text:entityField      "uri" ;
>     text:defaultField     "text" ;
>     text:map (
>          [ text:field "text" ; text:predicate rdfs:label ]
>          ) .
>

Additionally, if we enable RDFS reasoning for TDB, text search does not 
work any more, given the following configuration:

> @prefix :        <#> .
> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text:    <http://jena.apache.org/text#> .
> @prefix spatial:    <http://jena.apache.org/spatial#> .
> @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
>
> [] a fuseki:Server ;
>    fuseki:services (
>      <#tdb>
>    ) .
>
> # Custom code.
> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>
> # TDB
> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
> tdb:GraphTDB    rdfs:subClassOf  ja:Model .
>
> <#tdb> a fuseki:Service ;
>     fuseki:name              "tdb" ;             # http://host/tdb
>     fuseki:serviceQuery      "sparql" ;          # SPARQL query service
>     fuseki:serviceQuery      "query" ;
>     fuseki:serviceUpdate     "update" ;
>     fuseki:serviceUpload     "upload" ;          # Non-SPARQL upload service
>     fuseki:serviceReadGraphStore    "get" ;
>     fuseki:serviceReadWriteGraphStore    "data" ;
>     fuseki:dataset           :text_dataset ;
>     .
>
> <#tdb_inf_ds> a ja:RDFDataset ;
>     ja:defaultGraph       <#tdb_inf> ;
>     .
>
> <#tdb_inf> a ja:InfModel ;
>     rdfs:label "RDFS Inference Model" ;
>     ja:baseModel <#tdb_graph> ;
>     ja:reasoner
>          [ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ]
>     .
>
> <#tdb_graph> a tdb:GraphTDB ;
>     tdb:dataset <#tdb_ds> .
>
> # A TDB dataset used for RDF storage
> <#tdb_ds> a tdb:DatasetTDB;
>     tdb:location "Data";
>     .
>
> [] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>
> :text_dataset a text:TextDataset ;
>     text:dataset   <#tdb_inf_ds> ;
>     text:index     <#textIndexLucene> ;
>     .
>
> <#textIndexLucene> a text:TextIndexLucene ;
>     text:directory <file:Text> ;
>     ##text:directory "mem" ;
>     text:entityMap <#entMap> ;
>     .
>
> <#entMap> a text:EntityMap ;
>     text:entityField      "uri" ;
>     text:defaultField     "text" ;
>     text:map (
>          [ text:field "text" ; text:predicate rdfs:label ]
>          ) .

All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any 
clue or any suggestion for this issue? Thank you very much and have a 
nice day.

Regards,
Yang

Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Posted by Yang Yuanzhe <ya...@proxml.be>.

Hi Andy,

Thank you for reminding me about the mailing issue. I am very sorry for 
the inconvenience I am causing. I haven't found out why this happens: I 
tested it and the sending address seems correct. Anyway, I have sent 
another subscription request from the other address, as you suggested.

Thank you again and have a nice day.

Regards,
Yang

On 06/08/2015 04:46 PM, Andy Seaborne wrote:
> On 08/06/15 14:34, Yang wrote:
>> Hi Andy,
>>
>> Thank you very much for the suggestion. In-memory TDB dataset works
>> properly.
>>
>> As for the 500 error in loading, maybe you didn't notice my
>> explanation about it. It emerges on 2.0.0 only when an in-memory
>> dataset is used with text search enabled. I reported this error to
>> you in March and it is fixed later on in a snapshot. Now in the
>> latest snapshot loading is working, but Lucene does not index any
>> more.
>
> Something different is happening because the text indexing code was 
> made more integrated with transactions and a general purpose dataset 
> is not properly transactional - it can combine graphs with different 
> storage.
>
>> Anyway, while using in-memory TDB for the moment, we are looking
>> forward to your solution (or even a new release) for it. Thank you in
>> advance for your efforts and have a nice day.
>
> JENA-956 has already been fixed.
> https://issues.apache.org/jira/browse/JENA-956
>
>>
>> Regards, Yang
>>
>> PS: I am working behind some firewalls so sometimes I can't send out
>> emails. :D
>
> So far, I've had to manually let through your emails.  Please could you
> register properly.
>
> You are registered as yang@proxml.be but sending from 
> yang@mail.proxml.be.
>
> To subscribe a specific address use "users-subscribe-ID=HOST@..."
>
> users-subscribe-yang=mail.proxml.be@jena.apache.org
>
>     Andy
>
>
>>
>>
>> On 06/05/2015 12:32 PM, Andy Seaborne wrote:
>>> I've logged this as JENA-956 (with details).  The work-round is to
>>> use an in-memory TDB dataset.
>>>
>>> tdb:location "--mem--" ;
>>>
>>>
>>>
>>>> [2015-06-03 12:10:47] HttpAction WARN Exception during abort
>>>> (operation attempts to continue): Can't abort a write
>>>> lock-transaction [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server
>>>> Error (523 ms)
>>>
>>> You loaded the data twice I guess.
>>>
>>> Andy
>>>
>>>
>>> PS Your email address yang@mail.proxml.be does not always work.
>>>
>>> Reporting-MTA: dns; mailrelay118.isp.belgacom.be
>>>
>>> Final-Recipient: rfc822;yang@mail.proxml.be
>>> Action: failed
>>> Status: 5.0.0 (permanent failure)
>>> Remote-MTA: dns; [91.183.52.144]
>>> Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-'5.4.0
>>> Error: too many hops' (delivery attempts: 0)
>>>
>>>
>>>
>>> On 05/06/15 09:17, Yang wrote:
>>>> Hi Andy,
>>>>
>>>> I am sorry for such a late response. We were busy with another
>>>> project during this period. I will now explain step by step how
>>>> I reproduce the error. I did send an email to the mailing list
>>>> yesterday, but it never showed up, so I am trying again today.
>>>> My apologies for possible duplicates.
>>>>
>>>> The problem is that something is wrong with the text indexing
>>>> for in-memory datasets. Here is the configuration file I used. It
>>>> should be basic enough: a server description, a service
>>>> description, and a text index attached to the dataset that
>>>> indexes "rdfs:label".
>>>>
>>>>> @prefix :        <#> .
>>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>>>> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>>>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>> @prefix text:    <http://jena.apache.org/text#> .
>>>>> @prefix spatial: <http://jena.apache.org/spatial#> .
>>>>>
>>>>> [] a fuseki:Server ;
>>>>>    fuseki:services ( <#memory> ) .
>>>>>
>>>>> <#memory> a fuseki:Service ;
>>>>>     fuseki:name                        "memory" ;
>>>>>     fuseki:serviceQuery                "sparql" ;
>>>>>     fuseki:serviceQuery                "query" ;
>>>>>     fuseki:serviceUpdate               "update" ;   # SPARQL query service -- /memory/update
>>>>>     fuseki:serviceUpload               "upload" ;   # Non-SPARQL upload service
>>>>>     fuseki:serviceReadWriteGraphStore  "data" ;
>>>>>     fuseki:serviceReadGraphStore       "get" ;      # Graph store protocol (read only) -- /memory/get
>>>>>     fuseki:dataset                     :text_dataset ;
>>>>>     .
>>>>>
>>>>> <#dataset> rdf:type ja:RDFDataset ;
>>>>>     ja:defaultGraph [ a ja:MemoryModel ; ] .
>>>>>
>>>>> # Text
>>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>>>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>>>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>>>>
>>>>> :text_dataset a text:TextDataset ;
>>>>>     text:dataset   <#dataset> ;
>>>>>     text:index     <#textIndexLucene> ;
>>>>>     .
>>>>>
>>>>> # Text index description
>>>>> <#textIndexLucene> a text:TextIndexLucene ;
>>>>>     text:directory <file:Lucene> ;
>>>>>     ##text:directory "mem" ;
>>>>>     text:entityMap <#entMap> ;
>>>>>     .
>>>>>
>>>>> <#entMap> a text:EntityMap ;
>>>>>     text:entityField      "uri" ;
>>>>>     text:defaultField     "text" ;
>>>>>     text:map (
>>>>>          [ text:field "text" ; text:predicate rdfs:label ]
>>>>>          ) .
>>>>
>>>> The server is started with
>>>>> "./fuseki-server --config=config-memory-text.ttl"
>>>>
>>>> and console says it starts properly:
>>>>> [2015-06-03 12:13:09] Server INFO Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000
>>>>> [2015-06-03 12:13:09] Config INFO FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT
>>>>> [2015-06-03 12:13:09] Config INFO FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run
>>>>> [2015-06-03 12:13:09] Servlet INFO Initializing Shiro environment
>>>>> [2015-06-03 12:13:09] Config INFO Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
>>>>> [2015-06-03 12:13:09] Config INFO Configuration file: config-memory-text.ttl
>>>>> [2015-06-03 12:13:10] Builder INFO Service: :memory
>>>>> [2015-06-03 12:13:11] Config INFO Register: /memory
>>>>> [2015-06-03 12:13:11] Server INFO Started 2015/06/03 12:13:11 CEST on port 3030
>>>>
>>>> I tested it in two versions: the official release 2.0.0 and the
>>>> latest snapshot 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The
>>>> observations are as follows:
>>>>
>>>> In 2.0.0: If I load triples that do not contain "rdfs:label",
>>>> everything works properly. However, in this case the index engine
>>>> is not exercised. As soon as I add one "rdfs:label" triple to
>>>> the file I am loading into Fuseki, an error occurs:
>>>>> [2015-06-03 12:10:47] Fuseki INFO [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0
>>>>> [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
>>>>> [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)
>>>>
>>>> I remember that a few months ago, when 2.0.0 was first released,
>>>> I discovered this issue and reported it to you. At that time I
>>>> didn't realize that the root cause was the indexing. You fixed
>>>> it in a later snapshot, but my test wasn't proper, so I thought
>>>> the problem was solved and gave you wrong feedback. My sincere
>>>> apologies.
>>>>
>>>> In 2.0.1 SNAPSHOT: The latest snapshot contains the patch I
>>>> mentioned above, so the triples can be loaded successfully.
>>>> However, they are not indexed at all: queries with keyword
>>>> search do not return any results. Following your advice, I
>>>> tested loading and querying from both the Web UI and the
>>>> s-post/s-query tools; unfortunately (or fortunately?) the
>>>> results are the same.
>>>>
>>>> TDB: Meanwhile, I ran a similar experiment on Fuseki with TDB in
>>>> 2.0.0 and 2.0.1 SNAPSHOT, and both work properly. Loading
>>>> succeeds and queries return search results. The only difference
>>>> is that in the configuration file the in-memory dataset is
>>>> replaced with TDB.
>>>>> @prefix :        <#> .
>>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>>>> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>>>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>> @prefix text:    <http://jena.apache.org/text#> .
>>>>>
>>>>> [] rdf:type fuseki:Server ;
>>>>>    fuseki:services ( <#service_text_tdb> ) .
>>>>>
>>>>> # TDB
>>>>> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>>>>> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
>>>>> tdb:GraphTDB    rdfs:subClassOf  ja:Model .
>>>>>
>>>>> # Text
>>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>>>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>>>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>>>>
>>>>> <#service_text_tdb> a fuseki:Service ;
>>>>>     rdfs:label                         "TDB/text service" ;
>>>>>     fuseki:name                        "tdb" ;
>>>>>     fuseki:serviceQuery                "query" ;
>>>>>     fuseki:serviceQuery                "sparql" ;
>>>>>     fuseki:serviceUpdate               "update" ;
>>>>>     fuseki:serviceUpload               "upload" ;
>>>>>     fuseki:serviceReadGraphStore       "get" ;
>>>>>     fuseki:serviceReadWriteGraphStore  "data" ;
>>>>>     fuseki:dataset                     <#text_dataset> ;
>>>>>     .
>>>>>
>>>>> <#text_dataset> a text:TextDataset ;
>>>>>     text:dataset   <#dataset> ;
>>>>>     text:index     <#indexLucene> ;
>>>>>     .
>>>>>
>>>>> <#dataset> a tdb:DatasetTDB ;
>>>>>     tdb:location "DB" ;
>>>>>     ##tdb:unionDefaultGraph true ;
>>>>>     .
>>>>>
>>>>> <#indexLucene> a text:TextIndexLucene ;
>>>>>     text:directory <file:Lucene> ;
>>>>>     ##text:directory "mem" ;
>>>>>     text:entityMap <#entMap> ;
>>>>>     .
>>>>>
>>>>> <#entMap> a text:EntityMap ;
>>>>>     text:entityField      "uri" ;
>>>>>     text:defaultField     "text" ;
>>>>>     text:map (
>>>>>          [ text:field "text" ; text:predicate rdfs:label ]
>>>>>          ) .
>>>>
>>>> Any advice for it now? Thank you very much for your efforts in
>>>> advance.
>>>>
>>>> Regards, Yang
>>>>
>>>> PS: I discovered that there is a SNAPSHOT for 2.3.0. I planned to
>>>> test on it as well. However I wasn't able to run it.
>>>>
>>>> On 04/17/2015 05:29 PM, Yang Yuanzhe wrote:
>>>>> Hi Andy,
>>>>>
>>>>> Thank you very much for your reply.
>>>>>
>>>>> In fact the problem is unrelated to the preloaded triples. It
>>>>> does not work whether we start with an empty dataset or a
>>>>> preloaded one. Moreover, it takes around 1 minute to load 38k
>>>>> triples, while TDB needs only 6 seconds. If we turn off text
>>>>> search for an in-memory dataset, the loading time drops to about
>>>>> 1 second. That's why I thought the problem is on the Fuseki side.
>>>>>
>>>>> As for TDB with reasoning, I don't agree with your opinion that
>>>>> the dataset is not attached to a text index. We have defined
>>>>> the dataset:
>>>>>> <#tdb_inf_ds> a ja:RDFDataset ;
>>>>>>     ja:defaultGraph <#tdb_inf> ;
>>>>>>     .
>>>>> We tell Lucene to index it:
>>>>>> :text_dataset a text:TextDataset ;
>>>>>>     text:dataset   <#tdb_inf_ds> ;
>>>>>>     text:index     <#textIndexLucene> ;
>>>>>>     .
>>>>> And we assert that the dataset includes an RDFS inference
>>>>> model:
>>>>>> <#tdb_inf> a ja:InfModel ;
>>>>>>     rdfs:label "RDFS Inference Model" ;
>>>>>>     ja:baseModel <#tdb_graph> ;
>>>>>>     ja:reasoner
>>>>>>          [ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ]
>>>>>>     .
>>>>>
>>>>> Then both text search and RDFS reasoning should work. Such
>>>>> configuration works properly in Fuseki 1.1.1. However things
>>>>> changed in 1.1.2 and 2.0.x. I don't know what I should do to
>>>>> adjust to the new system.
>>>>>
>>>>> Thank you very much for your efforts again and have a nice
>>>>> day.
>>>>>
>>>>> Regards, Yang
>>>>>
>>>>>
>>>>> On 04/17/2015 02:53 PM, Andy Seaborne wrote:
>>>>>> On 14/04/15 18:51, Yang Yuanzhe wrote:
>>>>>>> Hi there,
>>>>>>>
>>>>>>> Sorry to trouble you again. Last month I wrote to you to
>>>>>>> figure out the bug in text search for TDB. Given the
>>>>>>> following configuration, text search works with TDB:
>>>>>>>
>>>>>> ...
>>>>>>
>>>>>> Comments inline:
>>>>>>
>>>>>>> Now we want to use text search for in-memory datasets, but
>>>>>>> we failed after some trials, the configuration file we use
>>>>>>> is as follows:
>>>>>>>
>>>>>>>> @prefix :        <#> .
>>>>>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>>>>>>> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>>>>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>>>>>>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>>>>> @prefix text:    <http://jena.apache.org/text#> .
>>>>>>>> @prefix spatial: <http://jena.apache.org/spatial#> .
>>>>>>>>
>>>>>>>> [] a fuseki:Server ;
>>>>>>>>    fuseki:services ( <#memory> ) .
>>>>>>>>
>>>>>>>> <#memory> a fuseki:Service ;
>>>>>>>>     fuseki:name                        "memory" ;
>>>>>>>>     fuseki:serviceQuery                "sparql" ;
>>>>>>>>     fuseki:serviceQuery                "query" ;
>>>>>>>>     fuseki:serviceUpdate               "update" ;   # SPARQL query service -- /memory/update
>>>>>>>>     fuseki:serviceUpload               "upload" ;   # Non-SPARQL upload service
>>>>>>>>     fuseki:serviceReadWriteGraphStore  "data" ;
>>>>>>>>     fuseki:serviceReadGraphStore       "get" ;      # Graph store protocol (read only) -- /memory/get
>>>>>>>>     fuseki:dataset                     :text_dataset ;
>>>>>>>>     .
>>>>>>>>
>>>>>>>> <#dataset> rdf:type ja:RDFDataset ;
>>>>>>>>     ja:defaultGraph
>>>>>>>>           [
>>>>>>>>             a ja:MemoryModel ;
>>>>>>>>             ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
>>>>>>>>           ] .
>>>>>>
>>>>>> That is going to load the data each time the server starts
>>>>>> but does not attach it in any way to the text index.
>>>>>>
>>>>>> Is it the same data as is loaded (separately) into the text
>>>>>> index?
>>>>>>
>>>>>> Similarly for the inference setup (which is in a different
>>>>>> Lucene index file:Text) ...
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>>>>
>>>>>>>> # Text
>>>>>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>>>>>>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>>>>>>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>>>>>>>
>>>>>>>> :text_dataset a text:TextDataset ;
>>>>>>>>     text:dataset   <#dataset> ;
>>>>>>>>     text:index     <#textIndexLucene> ;
>>>>>>>>     .
>>>>>>>>
>>>>>>>> # Text index description
>>>>>>>> <#textIndexLucene> a text:TextIndexLucene ;
>>>>>>>>     text:directory <file:Lucene> ;
>>>>>>>>     ##text:directory "mem" ;
>>>>>>>>     text:entityMap <#entMap> ;
>>>>>>>>     .
>>>>>>>>
>>>>>>>> <#entMap> a text:EntityMap ;
>>>>>>>>     text:entityField      "uri" ;
>>>>>>>>     text:defaultField     "text" ;
>>>>>>>>     text:map (
>>>>>>>>          [ text:field "text" ; text:predicate rdfs:label ]
>>>>>>>>          ) .
>>>>>>>>
>>>>>> ...
>>>>>>
>>>>>>>
>>>>>>> All the tests are based on the 2.0.1 SNAPSHOT built on
>>>>>>> April 8th. Any clue or any suggestion for this issue? Thank
>>>>>>> you very much and have a nice day.
>>>>>>>
>>>>>>> Regards, Yang
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>

Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Posted by Andy Seaborne <an...@apache.org>.

On 08/06/15 14:34, Yang wrote:
> Hi Andy,
>
> Thank you very much for the suggestion. In-memory TDB dataset works
> properly.
>
> As for the 500 error in loading, maybe you didn't notice my
> explanation about it. It emerges on 2.0.0 only when an in-memory
> dataset is used with text search enabled. I reported this error to
> you in March and it is fixed later on in a snapshot. Now in the
> latest snapshot loading is working, but Lucene does not index any
> more.

Something different is happening because the text indexing code was made 
more integrated with transactions and a general purpose dataset is not 
properly transactional - it can combine graphs with different storage.
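
As a sketch of the workaround mentioned in this thread (an in-memory TDB
dataset, see JENA-956), the general purpose <#dataset> of the in-memory
configuration could be replaced by a TDB dataset whose location is
"--mem--". Only the tdb:location value comes from the thread; the
assumption here is that the rest of the configuration (service,
text:TextDataset, index and entity map) stays unchanged, and this layout
is an illustration rather than a tested setup:

```turtle
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .

# Load the TDB assembler support, as in the TDB configurations above.
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

# Replaces the ja:RDFDataset / ja:MemoryModel definition of <#dataset>.
# "--mem--" asks TDB for an in-memory dataset instead of a directory.
<#dataset> a tdb:DatasetTDB ;
    tdb:location "--mem--" ;
    .
```

Data that was preloaded through ja:content in the ja:MemoryModel version
would presumably have to be loaded after startup instead, for example
through the upload service.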

> Anyway, while using in-memory TDB for the moment, we are looking
> forward to your solution (or even a new release) for it. Thank you in
> advance for your efforts and have a nice day.

JENA-956 has already been fixed.
https://issues.apache.org/jira/browse/JENA-956

>
> Regards, Yang
>
> PS: I am working behind some firewalls so sometimes I can't send out
> emails. :D

So far, I've had to manually let through your emails.  Please could you
register properly.

You are registered as yang@proxml.be but sending from  yang@mail.proxml.be.

To subscribe a specific address use "users-subscribe-ID=HOST@..."

users-subscribe-yang=mail.proxml.be@jena.apache.org

	Andy


>
>
> On 06/05/2015 12:32 PM, Andy Seaborne wrote:
>> I've logged this as JENA-956 (with details).  The work-round is to
>> use an in-memory TDB dataset.
>>
>> tdb:location "--mem--" ;
>>
>>
>>
>>> [2015-06-03 12:10:47] HttpAction WARN Exception during abort
>>> (operation attempts to continue): Can't abort a write
>>> lock-transaction [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server
>>> Error (523 ms)
>>
>> You loaded the data twice I guess.
>>
>> Andy
>>
>>
>> PS Your email address yang@mail.proxml.be does not always work.
>>
>> Reporting-MTA: dns; mailrelay118.isp.belgacom.be
>>
>> Final-Recipient: rfc822;yang@mail.proxml.be
>> Action: failed
>> Status: 5.0.0 (permanent failure)
>> Remote-MTA: dns; [91.183.52.144]
>> Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-'5.4.0
>> Error: too many hops' (delivery attempts: 0)
>>
>>
>>
>> On 05/06/15 09:17, Yang wrote:
>>> Hi Andy,
>>>
>>> I am sorry for such a late response. We were busy with another
>>> project during this period. I will now explain step by step how
>>> I reproduce the error. I did send an email to the mailing list
>>> yesterday, but it never showed up, so I am trying again today.
>>> My apologies for possible duplicates.
>>>
>>> The problem is that something is wrong with the text indexing
>>> for in-memory datasets. Here is the configuration file I used. It
>>> should be basic enough: a server description, a service
>>> description, and a text index attached to the dataset that
>>> indexes "rdfs:label".
>>>
>>>> @prefix :        <#> .
>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>>> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>> @prefix text:    <http://jena.apache.org/text#> .
>>>> @prefix spatial: <http://jena.apache.org/spatial#> .
>>>>
>>>> [] a fuseki:Server ;
>>>>    fuseki:services ( <#memory> ) .
>>>>
>>>> <#memory> a fuseki:Service ;
>>>>     fuseki:name                        "memory" ;
>>>>     fuseki:serviceQuery                "sparql" ;
>>>>     fuseki:serviceQuery                "query" ;
>>>>     fuseki:serviceUpdate               "update" ;   # SPARQL query service -- /memory/update
>>>>     fuseki:serviceUpload               "upload" ;   # Non-SPARQL upload service
>>>>     fuseki:serviceReadWriteGraphStore  "data" ;
>>>>     fuseki:serviceReadGraphStore       "get" ;      # Graph store protocol (read only) -- /memory/get
>>>>     fuseki:dataset                     :text_dataset ;
>>>>     .
>>>>
>>>> <#dataset> rdf:type ja:RDFDataset ;
>>>>     ja:defaultGraph [ a ja:MemoryModel ; ] .
>>>>
>>>> # Text
>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>>>
>>>> :text_dataset a text:TextDataset ;
>>>>     text:dataset   <#dataset> ;
>>>>     text:index     <#textIndexLucene> ;
>>>>     .
>>>>
>>>> # Text index description
>>>> <#textIndexLucene> a text:TextIndexLucene ;
>>>>     text:directory <file:Lucene> ;
>>>>     ##text:directory "mem" ;
>>>>     text:entityMap <#entMap> ;
>>>>     .
>>>>
>>>> <#entMap> a text:EntityMap ;
>>>>     text:entityField      "uri" ;
>>>>     text:defaultField     "text" ;
>>>>     text:map (
>>>>          [ text:field "text" ; text:predicate rdfs:label ]
>>>>          ) .
>>>
>>> The server is started with
>>>> "./fuseki-server --config=config-memory-text.ttl"
>>>
>>> and console says it starts properly:
>>>> [2015-06-03 12:13:09] Server INFO Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000
>>>> [2015-06-03 12:13:09] Config INFO FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT
>>>> [2015-06-03 12:13:09] Config INFO FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run
>>>> [2015-06-03 12:13:09] Servlet INFO Initializing Shiro environment
>>>> [2015-06-03 12:13:09] Config INFO Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
>>>> [2015-06-03 12:13:09] Config INFO Configuration file: config-memory-text.ttl
>>>> [2015-06-03 12:13:10] Builder INFO Service: :memory
>>>> [2015-06-03 12:13:11] Config INFO Register: /memory
>>>> [2015-06-03 12:13:11] Server INFO Started 2015/06/03 12:13:11 CEST on port 3030
>>>
>>> I tested it in two versions: the official release 2.0.0 and the
>>> latest snapshot 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The
>>> observed behaviour is as follows:
>>>
>>> In 2.0.0: If I load triples that do not contain "rdfs:label",
>>> loading works, but in that case the index engine is not
>>> exercised. As soon as I add a single "rdfs:label" triple to the
>>> file I am loading into Fuseki, an error occurs:
>>>> [2015-06-03 12:10:47] Fuseki INFO [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0
>>>> [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
>>>> [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)
>>>
>>> I remember that a few months ago, when 2.0.0 was first released,
>>> I discovered this issue and reported it to you. At that time I
>>> didn't realize that the root cause was indexing. You fixed it in
>>> a later snapshot, but my test wasn't done properly, so I thought
>>> the problem was solved and gave you incorrect feedback. My
>>> sincere apologies.
>>>
>>> In 2.0.1 SNAPSHOT: The latest snapshot contains the patch I
>>> mentioned above, so the triples can be loaded successfully.
>>> However, they are not indexed at all: queries with keyword search
>>> do not return any results. Following your advice, I tested
>>> loading and querying from both the Web UI and the s-post/s-query
>>> tools; unfortunately (or fortunately?) the results are the same.
>>>
>>> TDB: Meanwhile, I also performed a similar experiment on Fuseki
>>> with TDB in 2.0.0 and 2.0.1 SNAPSHOT, and both work properly.
>>> Loading succeeds and queries return search results. The only
>>> difference is that in the configuration file the in-memory
>>> dataset is replaced with TDB.
>>>> @prefix : <#> .
>>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
>>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>> @prefix text: <http://jena.apache.org/text#> .
>>>> [] rdf:type fuseki:Server ;
>>>> fuseki:services (
>>>> <#service_text_tdb>
>>>> ) .
>>>> # TDB
>>>> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>>>> tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
>>>> tdb:GraphTDB rdfs:subClassOf ja:Model .
>>>> # Text
>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>>>> text:TextDataset rdfs:subClassOf ja:RDFDataset .
>>>> text:TextIndexLucene rdfs:subClassOf text:TextIndex .
>>>> <#service_text_tdb> a fuseki:Service ;
>>>> rdfs:label "TDB/text service" ;
>>>> fuseki:name "tdb" ;
>>>> fuseki:serviceQuery "query" ;
>>>> fuseki:serviceQuery "sparql" ;
>>>> fuseki:serviceUpdate "update" ;
>>>> fuseki:serviceUpload "upload" ;
>>>> fuseki:serviceReadGraphStore "get" ;
>>>> fuseki:serviceReadWriteGraphStore "data" ;
>>>> fuseki:dataset <#text_dataset> ;
>>>> .
>>>> <#text_dataset> a text:TextDataset ;
>>>> text:dataset <#dataset> ;
>>>> text:index <#indexLucene> ;
>>>> .
>>>> <#dataset> a tdb:DatasetTDB ;
>>>> tdb:location "DB" ;
>>>> ##tdb:unionDefaultGraph true ;
>>>> .
>>>> <#indexLucene> a text:TextIndexLucene ;
>>>> text:directory <file:Lucene> ;
>>>> ##text:directory "mem" ;
>>>> text:entityMap <#entMap> ;
>>>> .
>>>> <#entMap> a text:EntityMap ;
>>>> text:entityField "uri" ;
>>>> text:defaultField "text" ;
>>>> text:map (
>>>> [ text:field "text" ; text:predicate rdfs:label ]
>>>> ) .
>>>
>>> Any advice for it now? Thank you very much for your efforts in
>>> advance.
>>>
>>> Regards, Yang
>>>
>>> PS: I discovered that there is a SNAPSHOT for 2.3.0. I planned to
>>> test on it as well. However I wasn't able to run it.
>>>
>>> On 04/17/2015 05:29 PM, Yang Yuanzhe wrote:
>>>> Hi Andy,
>>>>
>>>> Thank you very much for your reply.
>>>>
>>>> In fact the problem is unrelated to the preloaded triples. It
>>>> fails whether we start with an empty dataset or a preloaded one.
>>>> Moreover, it takes around 1 minute to load 38k triples, while
>>>> TDB needs only 6 seconds. If we turn off text search for the
>>>> in-memory dataset, the loading time drops to only 1 second.
>>>> That's why I thought the problem is on the Fuseki side.
>>>>
>>>> As for TDB with reasoning, I don't agree with your opinion that
>>>> the dataset is not attached to a text index. We have defined
>>>> the dataset:
>>>>> <#tdb_inf_ds> a ja:RDFDataset ;
>>>>>      ja:defaultGraph       <#tdb_inf> ;
>>>>>      .
>>>> We tell Lucene to index it:
>>>>> :text_dataset a text:TextDataset ;
>>>>>      text:dataset   <#tdb_inf_ds> ;
>>>>>      text:index     <#textIndexLucene> ;
>>>>>      .
>>>> And we assert that the dataset includes an RDFS inference
>>>> model:
>>>>> <#tdb_inf> a ja:InfModel ;
>>>>>      rdfs:label "RDFS Inference Model" ;
>>>>>      ja:baseModel <#tdb_graph> ;
>>>>>      ja:reasoner
>>>>>           [ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ]
>>>>>      .
>>>>
>>>> Then both text search and RDFS reasoning should work. Such
>>>> configuration works properly in Fuseki 1.1.1. However things
>>>> changed in 1.1.2 and 2.0.x. I don't know what I should do to
>>>> adjust to the new system.
>>>>
>>>> Thank you very much for your efforts again and have a nice
>>>> day.
>>>>
>>>> Regards, Yang
>>>>
>>>>
>>>> On 04/17/2015 02:53 PM, Andy Seaborne wrote:
>>>>> On 14/04/15 18:51, Yang Yuanzhe wrote:
>>>>>> Hi there,
>>>>>>
>>>>>> Sorry to trouble you again. Last month I wrote to you to
>>>>>> figure out the bug in text search for TDB. Given the
>>>>>> following configuration, text search works with TDB:
>>>>>>
>>>>> ...
>>>>>
>>>>> Comments inline:
>>>>>
>>>>>> Now we want to use text search for in-memory datasets, but
>>>>>> we failed after some trials, the configuration file we use
>>>>>> is as follows:
>>>>>>
>>>>>>> @prefix :        <#> .
>>>>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>>>>>> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>>>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>>>>>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>>>> @prefix text:    <http://jena.apache.org/text#> .
>>>>>>> @prefix spatial: <http://jena.apache.org/spatial#> .
>>>>>>>
>>>>>>> [] a fuseki:Server ;
>>>>>>>     fuseki:services (
>>>>>>>       <#memory>
>>>>>>>     ) .
>>>>>>>
>>>>>>> <#memory> a fuseki:Service ;
>>>>>>>      fuseki:name                     "memory" ;
>>>>>>>      fuseki:serviceQuery             "sparql" ;
>>>>>>>      fuseki:serviceQuery             "query" ;
>>>>>>>      fuseki:serviceUpdate            "update" ;   # SPARQL query service -- /memory/update
>>>>>>>      fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
>>>>>>>      fuseki:serviceReadWriteGraphStore      "data" ;
>>>>>>>      fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
>>>>>>>      fuseki:dataset           :text_dataset ;
>>>>>>>      .
>>>>>>>
>>>>>>> <#dataset> rdf:type ja:RDFDataset ;
>>>>>>>      ja:defaultGraph
>>>>>>>            [
>>>>>>>              a ja:MemoryModel ;
>>>>>>>              ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
>>>>>>>            ] .
>>>>>
>>>>> That is going to load the data each time the server starts
>>>>> but does not attach it in any way to the text index.
>>>>>
>>>>> Is it the same data as is loaded (separately) into the text
>>>>> index?
>>>>>
>>>>> Similarly for the inference setup (which is in a different
>>>>> Lucene index file:Text) ...
>>>>>
>>>>> Andy
>>>>>
>>>>>>>
>>>>>>> # Text
>>>>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>>>>>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>>>>>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>>>>>>
>>>>>>> :text_dataset a text:TextDataset ;
>>>>>>>      text:dataset   <#dataset> ;
>>>>>>>      text:index     <#textIndexLucene> ;
>>>>>>>      .
>>>>>>>
>>>>>>> # Text index description
>>>>>>> <#textIndexLucene> a text:TextIndexLucene ;
>>>>>>>      text:directory <file:Lucene> ;
>>>>>>>      ##text:directory "mem" ;
>>>>>>>      text:entityMap <#entMap> ;
>>>>>>>      .
>>>>>>>
>>>>>>> <#entMap> a text:EntityMap ;
>>>>>>>      text:entityField      "uri" ;
>>>>>>>      text:defaultField     "text" ;
>>>>>>>      text:map (
>>>>>>>           [ text:field "text" ; text:predicate rdfs:label ]
>>>>>>>           ) .
>>>>>>>
>>>>> ...
>>>>>
>>>>>>
>>>>>> All the tests are based on the 2.0.1 SNAPSHOT built on
>>>>>> April 8th. Any clue or any suggestion for this issue? Thank
>>>>>> you very much and have a nice day.
>>>>>>
>>>>>> Regards, Yang
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>

Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Posted by Yang <ya...@mail.proxml.be>.

Hi Andy,

Thank you very much for the suggestion. The in-memory TDB dataset works properly.

As for the 500 error during loading, maybe you didn't notice my explanation of it. It occurs on 2.0.0 only when an in-memory dataset is used with text search enabled. I reported this error to you in March and it was fixed later in a snapshot. Now, in the latest snapshot, loading works, but Lucene no longer indexes anything.

Anyway, while using in-memory TDB for the moment, we are looking forward to your solution (or even a new release) for it. Thank you in advance for your efforts and have a nice day.

Regards,
Yang

PS: I am working behind some firewalls so sometimes I can't send out emails. :D


On 06/05/2015 12:32 PM, Andy Seaborne wrote:
> I've logged this as JENA-956 (with details).  The work-round is to use an in-memory TDB dataset. 
> 
>      tdb:location "--mem--" ; 
> 
> 
> 
> > [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction 
> > [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms) 
> 
> You loaded the data twice I guess. 
> 
>     Andy 
> 
> 
> PS Your email address yang@mail.proxml.be does not always work.
> 
> Reporting-MTA: dns; mailrelay118.isp.belgacom.be 
> 
> Final-Recipient: rfc822;yang@mail.proxml.be
> Action: failed 
> Status: 5.0.0 (permanent failure) 
> Remote-MTA: dns; [91.183.52.144] 
> Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-'5.4.0 Error: too many hops' (delivery attempts: 0) 
> 
> 
> 
> On 05/06/15 09:17, Yang wrote: 
>> Hi Andy, 
>> 
>> I am sorry for such a late response. We were busy with another project during this period. I will now try to explain, step by step, how I reproduce the error. I sent an email to the mailing list yesterday, but it never showed up, so I am trying again today. My apologies for possible duplicates.
>>
>> So the problem is that something is wrong with search indexing for in-memory datasets.
>> Here is the configuration file I used. It should be basic enough: a server description, a service description and a text index attached to the dataset to index "rdfs:label".
>> 
>>> @prefix : <#> . 
>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>> @prefix text: <http://jena.apache.org/text#> .
>>> @prefix spatial: <http://jena.apache.org/spatial#> .
>>> [] a fuseki:Server ; 
>>> fuseki:services ( 
>>> <#memory> 
>>> ) . 
>>> <#memory> a fuseki:Service ; 
>>> fuseki:name "memory" ; 
>>> fuseki:serviceQuery "sparql" ; 
>>> fuseki:serviceQuery "query" ; 
>>> fuseki:serviceUpdate "update" ; # SPARQL query service – /memory/update 
>>> fuseki:serviceUpload "upload" ; # Non-SPARQL upload service 
>>> fuseki:serviceReadWriteGraphStore "data" ; 
>>> fuseki:serviceReadGraphStore "get" ; # Graph store protocol (read only) – /memory/get 
>>> fuseki:dataset :text_dataset ; 
>>> . 
>>> <#dataset> rdf:type ja:RDFDataset ; 
>>> ja:defaultGraph 
>>> [ 
>>> a ja:MemoryModel ; 
>>> ] . 
>>> # Text
>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" . 
>>> text:TextDataset rdfs:subClassOf ja:RDFDataset . 
>>> text:TextIndexLucene rdfs:subClassOf text:TextIndex . 
>>> :text_dataset a text:TextDataset ; 
>>> text:dataset <#dataset> ; 
>>> text:index <#textIndexLucene> ; 
>>> . 
>>> # Text index description
>>> <#textIndexLucene> a text:TextIndexLucene ;
>>> text:directory <file:Lucene> ;
>>> ##text:directory "mem" ; 
>>> text:entityMap <#entMap> ; 
>>> . 
>>> <#entMap> a text:EntityMap ; 
>>> text:entityField "uri" ; 
>>> text:defaultField "text" ; 
>>> text:map ( 
>>> [ text:field "text" ; text:predicate rdfs:label ] 
>>> ) . 
>> 
>> The server is started with 
>>> "./fuseki-server --config=config-memory-text.ttl" 
>> 
>> and console says it starts properly: 
>>> [2015-06-03 12:13:09] Server INFO Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000 
>>> [2015-06-03 12:13:09] Config INFO FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT 
>>> [2015-06-03 12:13:09] Config INFO FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run 
>>> [2015-06-03 12:13:09] Servlet INFO Initializing Shiro environment 
>>> [2015-06-03 12:13:09] Config INFO Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
>>> [2015-06-03 12:13:09] Config INFO Configuration file: config-memory-text.ttl 
>>> [2015-06-03 12:13:10] Builder INFO Service: :memory 
>>> [2015-06-03 12:13:11] Config INFO Register: /memory 
>>> [2015-06-03 12:13:11] Server INFO Started 2015/06/03 12:13:11 CEST on port 3030 
>> 
>> I tested it in two versions: the official release 2.0.0 and the latest snapshot 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The observed behaviour is as follows:
>>
>> In 2.0.0:
>> If I load triples that do not contain "rdfs:label", loading works, but in that case the index engine is not exercised. As soon as I add a single "rdfs:label" triple to the file I am loading into Fuseki, an error occurs:
>>> [2015-06-03 12:10:47] Fuseki INFO [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0 
>>> [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction 
>>> [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms) 
>> 
>> I remember that a few months ago, when 2.0.0 was first released, I discovered this issue and reported it to you. At that time I didn't realize that the root cause was indexing. You fixed it in a later snapshot, but my test wasn't done properly, so I thought the problem was solved and gave you incorrect feedback. My sincere apologies.
>> 
>> In 2.0.1 SNAPSHOT: 
>> The latest snapshot contains the patch I mentioned above, so the triples can be loaded successfully. However, they are not indexed at all: queries with keyword search do not return any results.
>> Following your advice, I tested loading and querying from both the Web UI and the s-post/s-query tools; unfortunately (or fortunately?) the results are the same.
>>
>> TDB:
>> Meanwhile, I also performed a similar experiment on Fuseki with TDB in 2.0.0 and 2.0.1 SNAPSHOT, and both work properly. Loading succeeds and queries return search results. The only difference is that in the configuration file the in-memory dataset is replaced with TDB.
>>> @prefix : <#> . 
>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>> @prefix text: <http://jena.apache.org/text#> .
>>> [] rdf:type fuseki:Server ; 
>>> fuseki:services ( 
>>> <#service_text_tdb> 
>>> ) . 
>>> # TDB
>>> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>>> tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
>>> tdb:GraphTDB rdfs:subClassOf ja:Model .
>>> # Text
>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" . 
>>> text:TextDataset rdfs:subClassOf ja:RDFDataset . 
>>> text:TextIndexLucene rdfs:subClassOf text:TextIndex . 
>>> <#service_text_tdb> a fuseki:Service ; 
>>> rdfs:label "TDB/text service" ; 
>>> fuseki:name "tdb" ; 
>>> fuseki:serviceQuery "query" ; 
>>> fuseki:serviceQuery "sparql" ; 
>>> fuseki:serviceUpdate "update" ; 
>>> fuseki:serviceUpload "upload" ; 
>>> fuseki:serviceReadGraphStore "get" ; 
>>> fuseki:serviceReadWriteGraphStore "data" ; 
>>> fuseki:dataset <#text_dataset> ; 
>>> . 
>>> <#text_dataset> a text:TextDataset ; 
>>> text:dataset <#dataset> ; 
>>> text:index <#indexLucene> ; 
>>> . 
>>> <#dataset> a tdb:DatasetTDB ; 
>>> tdb:location "DB" ; 
>>> ##tdb:unionDefaultGraph true ; 
>>> . 
>>> <#indexLucene> a text:TextIndexLucene ; 
>>> text:directory <file:Lucene> ;
>>> ##text:directory "mem" ; 
>>> text:entityMap <#entMap> ; 
>>> . 
>>> <#entMap> a text:EntityMap ; 
>>> text:entityField "uri" ; 
>>> text:defaultField "text" ; 
>>> text:map ( 
>>> [ text:field "text" ; text:predicate rdfs:label ] 
>>> ) . 
>> 
>> Any advice for it now? Thank you very much for your efforts in advance. 
>> 
>> Regards, 
>> Yang 
>> 
>> PS: I discovered that there is a SNAPSHOT for 2.3.0. I planned to test on it as well. However I wasn't able to run it. 
>> 
>> On 04/17/2015 05:29 PM, Yang Yuanzhe wrote: 
>>> Hi Andy, 
>>> 
>>> Thank you very much for your reply. 
>>> 
>>> In fact the problem is unrelated to the preloaded triples. It fails whether we start with an empty dataset or a preloaded one. Moreover, it takes around 1 minute to load 38k triples, while TDB needs only 6 seconds. If we turn off text search for the in-memory dataset, the loading time drops to only 1 second. That's why I thought the problem is on the Fuseki side.
>>> 
>>> As for TDB with reasoning, I don't agree with your opinion that the dataset is not attached to a text index. We have defined the dataset: 
>>>> <#tdb_inf_ds> a ja:RDFDataset ; 
>>>>      ja:defaultGraph       <#tdb_inf> ; 
>>>>      . 
>>> We tell Lucene to index it: 
>>>> :text_dataset a text:TextDataset ; 
>>>>      text:dataset   <#tdb_inf_ds> ; 
>>>>      text:index     <#textIndexLucene> ; 
>>>>      . 
>>> And we assert that the dataset includes an RDFS inference model: 
>>>> <#tdb_inf> a ja:InfModel ; 
>>>>      rdfs:label "RDFS Inference Model" ; 
>>>>      ja:baseModel <#tdb_graph> ; 
>>>>      ja:reasoner 
>>>>           [ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ]
>>>>      . 
>>> 
>>> Then both text search and RDFS reasoning should work. Such configuration works properly in Fuseki 1.1.1. However things changed in 1.1.2 and 2.0.x. I don't know what I should do to adjust to the new system. 
>>> 
>>> Thank you very much for your efforts again and have a nice day. 
>>> 
>>> Regards, 
>>> Yang 
>>> 
>>> 
>>> On 04/17/2015 02:53 PM, Andy Seaborne wrote: 
>>>> On 14/04/15 18:51, Yang Yuanzhe wrote: 
>>>>> Hi there, 
>>>>> 
>>>>> Sorry to trouble you again. Last month I wrote to you to figure out the 
>>>>> bug in text search for TDB. Given the following configuration, text 
>>>>> search works with TDB: 
>>>>> 
>>>> ... 
>>>> 
>>>> Comments inline: 
>>>> 
>>>>> Now we want to use text search for in-memory datasets, but we failed 
>>>>> after some trials, the configuration file we use is as follows: 
>>>>> 
>>>>>> @prefix :        <#> . 
>>>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>>>>> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>>>>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>>> @prefix text:    <http://jena.apache.org/text#> .
>>>>>> @prefix spatial: <http://jena.apache.org/spatial#> .
>>>>>> 
>>>>>> [] a fuseki:Server ; 
>>>>>>     fuseki:services ( 
>>>>>>       <#memory> 
>>>>>>     ) . 
>>>>>> 
>>>>>> <#memory> a fuseki:Service ; 
>>>>>>      fuseki:name                     "memory" ; 
>>>>>>      fuseki:serviceQuery             "sparql" ; 
>>>>>>      fuseki:serviceQuery             "query" ; 
>>>>>>      fuseki:serviceUpdate            "update" ;   # SPARQL query service -- /memory/update
>>>>>>      fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
>>>>>>      fuseki:serviceReadWriteGraphStore      "data" ;
>>>>>>      fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
>>>>>>      fuseki:dataset           :text_dataset ; 
>>>>>>      . 
>>>>>> 
>>>>>> <#dataset> rdf:type ja:RDFDataset ; 
>>>>>>      ja:defaultGraph 
>>>>>>            [ 
>>>>>>              a ja:MemoryModel ; 
>>>>>>              ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
>>>>>>            ] . 
>>>> 
>>>> That is going to load the data each time the server starts but does not attach it in any way to the text index.
>>>> 
>>>> Is it the same data as is loaded (separately) into the text index? 
>>>> 
>>>> Similarly for the inference setup (which is in a different Lucene index file:Text) ...
>>>> 
>>>>      Andy 
>>>> 
>>>>>> 
>>>>>> # Text 
>>>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" . 
>>>>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset . 
>>>>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex . 
>>>>>> 
>>>>>> :text_dataset a text:TextDataset ; 
>>>>>>      text:dataset   <#dataset> ; 
>>>>>>      text:index     <#textIndexLucene> ; 
>>>>>>      . 
>>>>>> 
>>>>>> # Text index description 
>>>>>> <#textIndexLucene> a text:TextIndexLucene ; 
>>>>>>      text:directory <file:Lucene> ;
>>>>>>      ##text:directory "mem" ; 
>>>>>>      text:entityMap <#entMap> ; 
>>>>>>      . 
>>>>>> 
>>>>>> <#entMap> a text:EntityMap ; 
>>>>>>      text:entityField      "uri" ; 
>>>>>>      text:defaultField     "text" ; 
>>>>>>      text:map ( 
>>>>>>           [ text:field "text" ; text:predicate rdfs:label ] 
>>>>>>           ) . 
>>>>>> 
>>>> ... 
>>>> 
>>>>> 
>>>>> All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any 
>>>>> clue or any suggestion for this issue? Thank you very much and have a 
>>>>> nice day. 
>>>>> 
>>>>> Regards, 
>>>>> Yang 
>>>>> 
>>>> 
>>> 
>> 
>> 
>

Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Posted by Andy Seaborne <an...@apache.org>.

I've logged this as JENA-956 (with details).  The work-round is to use 
an in-memory TDB dataset.

      tdb:location "--mem--" ;
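
A minimal sketch of how that change might look in the in-memory configuration quoted below. It is a fragment, not a complete file: it assumes the service description, the text declarations, <#textIndexLucene> and <#entMap> stay exactly as posted, and that the TDB declarations are copied from the TDB configuration in this thread. Only the <#dataset> description is swapped.

```turtle
@prefix :     <#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:  <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:   <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .

# TDB assembler declarations, as in the TDB configuration from this thread
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

# Text dataset wrapper, unchanged from the in-memory configuration;
# <#textIndexLucene> is assumed to be described elsewhere in the same file
:text_dataset a text:TextDataset ;
    text:dataset <#dataset> ;
    text:index   <#textIndexLucene> ;
    .

# Replaces the ja:RDFDataset / ja:MemoryModel description of <#dataset>
<#dataset> a tdb:DatasetTDB ;
    tdb:location "--mem--" ;
    .
```

With this, the data is still held in memory and lost on restart, so a Lucene index kept on disk may need to be cleared or rebuilt between runs.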



 > [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
 > [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)

You loaded the data twice I guess.

	Andy


PS Your email address yang@mail.proxml.be does not always work.

Reporting-MTA: dns; mailrelay118.isp.belgacom.be

Final-Recipient: rfc822;yang@mail.proxml.be
Action: failed
Status: 5.0.0 (permanent failure)
Remote-MTA: dns; [91.183.52.144]
Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-'5.4.0 Error: too many hops' (delivery attempts: 0)



On 05/06/15 09:17, Yang wrote:
> Hi Andy,
>
> I am sorry for such a late response. We were busy with another project during this period. I will now try to explain, step by step, how I reproduce the error. I sent an email to the mailing list yesterday, but it never showed up, so I am trying again today. My apologies for possible duplicates.
>
> So the problem is that something is wrong with search indexing for in-memory datasets.
> Here is the configuration file I used. It should be basic enough: a server description, a service description and a text index attached to the dataset to index "rdfs:label".
>
>> @prefix : <#> .
>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>> @prefix text: <http://jena.apache.org/text#> .
>> @prefix spatial: <http://jena.apache.org/spatial#> .
>> [] a fuseki:Server ;
>> fuseki:services (
>> <#memory>
>> ) .
>> <#memory> a fuseki:Service ;
>> fuseki:name "memory" ;
>> fuseki:serviceQuery "sparql" ;
>> fuseki:serviceQuery "query" ;
>> fuseki:serviceUpdate "update" ; # SPARQL query service – /memory/update
>> fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
>> fuseki:serviceReadWriteGraphStore "data" ;
>> fuseki:serviceReadGraphStore "get" ; # Graph store protocol (read only) – /memory/get
>> fuseki:dataset :text_dataset ;
>> .
>> <#dataset> rdf:type ja:RDFDataset ;
>> ja:defaultGraph
>> [
>> a ja:MemoryModel ;
>> ] .
>> # Text
>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>> text:TextDataset rdfs:subClassOf ja:RDFDataset .
>> text:TextIndexLucene rdfs:subClassOf text:TextIndex .
>> :text_dataset a text:TextDataset ;
>> text:dataset <#dataset> ;
>> text:index <#textIndexLucene> ;
>> .
>> # Text index description
>> <#textIndexLucene> a text:TextIndexLucene ;
>> text:directory <file:Lucene> ;
>> ##text:directory "mem" ;
>> text:entityMap <#entMap> ;
>> .
>> <#entMap> a text:EntityMap ;
>> text:entityField "uri" ;
>> text:defaultField "text" ;
>> text:map (
>> [ text:field "text" ; text:predicate rdfs:label ]
>> ) .
>
> The server is started with
>> "./fuseki-server --config=config-memory-text.ttl"
>
> and console says it starts properly:
>> [2015-06-03 12:13:09] Server INFO Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000
>> [2015-06-03 12:13:09] Config INFO FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT
>> [2015-06-03 12:13:09] Config INFO FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run
>> [2015-06-03 12:13:09] Servlet INFO Initializing Shiro environment
>> [2015-06-03 12:13:09] Config INFO Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
>> [2015-06-03 12:13:09] Config INFO Configuration file: config-memory-text.ttl
>> [2015-06-03 12:13:10] Builder INFO Service: :memory
>> [2015-06-03 12:13:11] Config INFO Register: /memory
>> [2015-06-03 12:13:11] Server INFO Started 2015/06/03 12:13:11 CEST on port 3030
>
> I tested it in two versions: the official release 2.0.0 and the latest snapshot 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The observed behaviour is as follows:
>
> In 2.0.0:
> If I load triples that do not contain "rdfs:label", loading works, but in that case the index engine is not exercised. As soon as I add a single "rdfs:label" triple to the file I am loading into Fuseki, an error occurs:
>> [2015-06-03 12:10:47] Fuseki INFO [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0
>> [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
>> [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)
>
> I remember that a few months ago, when 2.0.0 was first released, I discovered this issue and reported it to you. At that time I didn't realize that the root cause was indexing. You fixed it in a later snapshot, but my test wasn't done properly, so I thought the problem was solved and gave you incorrect feedback. My sincere apologies.
>
> In 2.0.1 SNAPSHOT:
> The latest snapshot contains the fix I mentioned above, so the triples load successfully. However, they are not indexed at all: queries with keyword search do not return any results.
> Following your advice, I tested loading and querying from both the web UI and the s-post/s-query tools. Unfortunately (or fortunately?) the results are the same.
>
> TDB:
> I also ran the same experiment on Fuseki with TDB in 2.0.0 and 2.0.1 SNAPSHOT, and both work properly: loading succeeds and queries return search results. The only difference is that, in the configuration file, the in-memory dataset is replaced with TDB.
>> @prefix : <#> .
>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>> @prefix text: <http://jena.apache.org/text#> .
>> [] rdf:type fuseki:Server ;
>> fuseki:services (
>> <#service_text_tdb>
>> ) .
>> # TDB
>> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
>> tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
>> tdb:GraphTDB rdfs:subClassOf ja:Model .
>> # Text
>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>> text:TextDataset rdfs:subClassOf ja:RDFDataset .
>> text:TextIndexLucene rdfs:subClassOf text:TextIndex .
>> <#service_text_tdb> a fuseki:Service ;
>> rdfs:label "TDB/text service" ;
>> fuseki:name "tdb" ;
>> fuseki:serviceQuery "query" ;
>> fuseki:serviceQuery "sparql" ;
>> fuseki:serviceUpdate "update" ;
>> fuseki:serviceUpload "upload" ;
>> fuseki:serviceReadGraphStore "get" ;
>> fuseki:serviceReadWriteGraphStore "data" ;
>> fuseki:dataset <#text_dataset> ;
>> .
>> <#text_dataset> a text:TextDataset ;
>> text:dataset <#dataset> ;
>> text:index <#indexLucene> ;
>> .
>> <#dataset> a tdb:DatasetTDB ;
>> tdb:location "DB" ;
>> ##tdb:unionDefaultGraph true ;
>> .
>> <#indexLucene> a text:TextIndexLucene ;
>> text:directory <file:Lucene> ;
>> ##text:directory "mem" ;
>> text:entityMap <#entMap> ;
>> .
>> <#entMap> a text:EntityMap ;
>> text:entityField "uri" ;
>> text:defaultField "text" ;
>> text:map (
>> [ text:field "text" ; text:predicate rdfs:label ]
>> ) .
>
> Do you have any advice on this? Thank you very much in advance for your efforts.
>
> Regards,
> Yang
>
> PS: I discovered that there is a SNAPSHOT for 2.3.0. I planned to test it as well, but I wasn't able to run it.
>
> On 04/17/2015 05:29 PM, Yang Yuanzhe wrote:
>> Hi Andy,
>>
>> Thank you very much for your reply.
>>
>> In fact the problem is unrelated to the preloaded triples. It doesn't work whether we start with an empty dataset or a preloaded one. Moreover, it takes around 1 minute to load 38k triples, while TDB needs only 6 seconds. If we turn off text search for the in-memory dataset, loading takes only 1 second. That's why I thought the problem is on the Fuseki side.
>>
>> As for TDB with reasoning, I don't agree with your opinion that the dataset is not attached to a text index. We have defined the dataset:
>>> <#tdb_inf_ds> a ja:RDFDataset ;
>>>      ja:defaultGraph       <#tdb_inf> ;
>>>      .
>> We tell Lucene to index it:
>>> :text_dataset a text:TextDataset ;
>>>      text:dataset   <#tdb_inf_ds> ;
>>>      text:index     <#textIndexLucene> ;
>>>      .
>> And we assert that the dataset includes an RDFS inference model:
>>> <#tdb_inf> a ja:InfModel ;
>>>      rdfs:label "RDFS Inference Model" ;
>>>      ja:baseModel <#tdb_graph> ;
>>>      ja:reasoner
>>>           [ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ]
>>>      .
>>
>> Then both text search and RDFS reasoning should work. Such configuration works properly in Fuseki 1.1.1. However things changed in 1.1.2 and 2.0.x. I don't know what I should do to adjust to the new system.
>>
>> Thank you very much for your efforts again and have a nice day.
>>
>> Regards,
>> Yang
>>
>>
>> On 04/17/2015 02:53 PM, Andy Seaborne wrote:
>>> On 14/04/15 18:51, Yang Yuanzhe wrote:
>>>> Hi there,
>>>>
>>>> Sorry to trouble you again. Last month I wrote to you to figure out the
>>>> bug in text search for TDB. Given the following configuration, text
>>>> search works with TDB:
>>>>
>>> ...
>>>
>>> Comments inline:
>>>
>>>> Now we want to use text search for in-memory datasets, but we failed
>>>> after some trials, the configuration file we use is as follows:
>>>>
>>>>> @prefix :        <#> .
>>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>> @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
>>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>> @prefix text:    <http://jena.apache.org/text#> .
>>>>> @prefix spatial:    <http://jena.apache.org/spatial#> .
>>>>>
>>>>> [] a fuseki:Server ;
>>>>>     fuseki:services (
>>>>>       <#memory>
>>>>>     ) .
>>>>>
>>>>> <#memory> a fuseki:Service ;
>>>>>      fuseki:name                     "memory" ;
>>>>>      fuseki:serviceQuery             "sparql" ;
>>>>>      fuseki:serviceQuery             "query" ;
>>>>>      fuseki:serviceUpdate            "update" ;   # SPARQL query service -- /memory/update
>>>>>      fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
>>>>>      fuseki:serviceReadWriteGraphStore      "data" ;
>>>>>      fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
>>>>>      fuseki:dataset           :text_dataset ;
>>>>>      .
>>>>>
>>>>> <#dataset> rdf:type ja:RDFDataset ;
>>>>>      ja:defaultGraph
>>>>>            [
>>>>>              a ja:MemoryModel ;
>>>>>              ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
>>>>>            ] .
>>>
>>> That is going to load the data each time the server starts but does not attach it in any way to the text index.
>>>
>>> Is it the same data as is loaded (separately) into the text index?
>>>
>>> Similarly for the inference setup (which is in a different Lucene index file:Text) ...
>>>
>>>      Andy
>>>
>>>>>
>>>>> # Text
>>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>>>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>>>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>>>>
>>>>> :text_dataset a text:TextDataset ;
>>>>>      text:dataset   <#dataset> ;
>>>>>      text:index     <#textIndexLucene> ;
>>>>>      .
>>>>>
>>>>> # Text index description
>>>>> <#textIndexLucene> a text:TextIndexLucene ;
>>>>>      text:directory <file:Lucene> ;
>>>>>      ##text:directory "mem" ;
>>>>>      text:entityMap <#entMap> ;
>>>>>      .
>>>>>
>>>>> <#entMap> a text:EntityMap ;
>>>>>      text:entityField      "uri" ;
>>>>>      text:defaultField     "text" ;
>>>>>      text:map (
>>>>>           [ text:field "text" ; text:predicate rdfs:label ]
>>>>>           ) .
>>>>>
>>> ...
>>>
>>>>
>>>> All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any
>>>> clue or any suggestion for this issue? Thank you very much and have a
>>>> nice day.
>>>>
>>>> Regards,
>>>> Yang
>>>>
>>>
>>
>
>

Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Posted by Yang <ya...@mail.proxml.be>.

Hi Andy,

I am sorry for such a late response; we were busy with another project during this period. I will now try to explain, step by step, how I reproduce the error. I sent an email to the mailing list yesterday, but it never showed up, so I am trying again today. My apologies for any duplicates.

The problem is that something is wrong with text indexing for in-memory datasets.
Here is the configuration file I used. It should be basic enough: a server description, a service description, and a text index attached to the dataset to index "rdfs:label".

> @prefix : <#> .
> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text: <http://jena.apache.org/text#> .
> @prefix spatial: <http://jena.apache.org/spatial#> .
> [] a fuseki:Server ;
> fuseki:services (
> <#memory>
> ) .
> <#memory> a fuseki:Service ;
> fuseki:name "memory" ; 
> fuseki:serviceQuery "sparql" ;
> fuseki:serviceQuery "query" ;
> fuseki:serviceUpdate "update" ; # SPARQL query service – /memory/update
> fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
> fuseki:serviceReadWriteGraphStore "data" ; 
> fuseki:serviceReadGraphStore "get" ; # Graph store protocol (read only) – /memory/get
> fuseki:dataset :text_dataset ;
> .
> <#dataset> rdf:type ja:RDFDataset ;
> ja:defaultGraph
> [ 
> a ja:MemoryModel ;
> ] .
> # Text
> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
> text:TextDataset rdfs:subClassOf ja:RDFDataset .
> text:TextIndexLucene rdfs:subClassOf text:TextIndex .
> :text_dataset a text:TextDataset ;
> text:dataset <#dataset> ;
> text:index <#textIndexLucene> ;
> .
> # Text index description
> <#textIndexLucene> a text:TextIndexLucene ;
> text:directory <file:Lucene> ;
> ##text:directory "mem" ;
> text:entityMap <#entMap> ;
> .
> <#entMap> a text:EntityMap ;
> text:entityField "uri" ;
> text:defaultField "text" ;
> text:map (
> [ text:field "text" ; text:predicate rdfs:label ]
> ) .

The server is started with
> "./fuseki-server --config=config-memory-text.ttl"

and console says it starts properly:
> [2015-06-03 12:13:09] Server INFO Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000
> [2015-06-03 12:13:09] Config INFO FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT
> [2015-06-03 12:13:09] Config INFO FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run
> [2015-06-03 12:13:09] Servlet INFO Initializing Shiro environment
> [2015-06-03 12:13:09] Config INFO Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
> [2015-06-03 12:13:09] Config INFO Configuration file: config-memory-text.ttl
> [2015-06-03 12:13:10] Builder INFO Service: :memory
> [2015-06-03 12:13:11] Config INFO Register: /memory
> [2015-06-03 12:13:11] Server INFO Started 2015/06/03 12:13:11 CEST on port 3030

I tested it in two versions: the official release 2.0.0 and the latest snapshot, 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The observed behaviour is as follows:

In 2.0.0:
If I load triples that do not contain "rdfs:label", everything works properly, but in that case the index engine is not exercised. As soon as I add a single "rdfs:label" triple to the file I am loading into Fuseki, an error occurs:
> [2015-06-03 12:10:47] Fuseki INFO [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0
> [2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
> [2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)
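
As an illustration only (the contents of licenties.ttl are not shown in this thread), a file that exercises the index could be as small as the sketch below. The ex: namespace, the resource name and the label text are hypothetical.

```turtle
# Hypothetical minimal data file: one resource with a single rdfs:label,
# the predicate that the entity map in the configuration above maps to
# the "text" field of the index.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

ex:licence1 rdfs:label "Example licence" .
```

Loading such a file with and without the rdfs:label triple should be enough to tell whether the indexing step is what triggers the error.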

I remember that a few months ago, when 2.0.0 was first released, I discovered this issue and reported it to you. At that time I didn't realize that the root cause was indexing. You fixed it in a later snapshot, but my test wasn't done properly, so I thought the problem was solved and gave you incorrect feedback. My sincere apologies.

In 2.0.1 SNAPSHOT:
The latest snapshot contains the fix I mentioned above, so the triples load successfully. However, they are not indexed at all: queries with keyword search do not return any results.
Following your advice, I tested loading and querying from both the web UI and the s-post/s-query tools. Unfortunately (or fortunately?) the results are the same.

TDB:
I also ran the same experiment on Fuseki with TDB in 2.0.0 and 2.0.1 SNAPSHOT, and both work properly: loading succeeds and queries return search results. The only difference is that, in the configuration file, the in-memory dataset is replaced with TDB.
> @prefix : <#> .
> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text: <http://jena.apache.org/text#> .
> [] rdf:type fuseki:Server ;
> fuseki:services (
> <#service_text_tdb>
> ) .
> # TDB
> [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
> tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
> tdb:GraphTDB rdfs:subClassOf ja:Model .
> # Text
> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
> text:TextDataset rdfs:subClassOf ja:RDFDataset .
> text:TextIndexLucene rdfs:subClassOf text:TextIndex .
> <#service_text_tdb> a fuseki:Service ;
> rdfs:label "TDB/text service" ;
> fuseki:name "tdb" ;
> fuseki:serviceQuery "query" ;
> fuseki:serviceQuery "sparql" ;
> fuseki:serviceUpdate "update" ;
> fuseki:serviceUpload "upload" ;
> fuseki:serviceReadGraphStore "get" ;
> fuseki:serviceReadWriteGraphStore "data" ;
> fuseki:dataset <#text_dataset> ;
> .
> <#text_dataset> a text:TextDataset ;
> text:dataset <#dataset> ;
> text:index <#indexLucene> ;
> .
> <#dataset> a tdb:DatasetTDB ;
> tdb:location "DB" ;
> ##tdb:unionDefaultGraph true ;
> .
> <#indexLucene> a text:TextIndexLucene ;
> text:directory <file:Lucene> ;
> ##text:directory "mem" ;
> text:entityMap <#entMap> ;
> .
> <#entMap> a text:EntityMap ;
> text:entityField "uri" ;
> text:defaultField "text" ; 
> text:map ( 
> [ text:field "text" ; text:predicate rdfs:label ]
> ) .

Do you have any advice on this? Thank you very much in advance for your efforts.

Regards,
Yang

PS: I discovered that there is a SNAPSHOT for 2.3.0. I planned to test it as well, but I wasn't able to run it.

On 04/17/2015 05:29 PM, Yang Yuanzhe wrote:
> Hi Andy, 
> 
> Thank you very much for your reply. 
> 
> In fact the problem is unrelated to the preloaded triples. It doesn't work whether we start with an empty dataset or a preloaded one. Moreover, it takes around 1 minute to load 38k triples, while TDB needs only 6 seconds. If we turn off text search for the in-memory dataset, loading takes only 1 second. That's why I thought the problem is on the Fuseki side.
> 
> As for TDB with reasoning, I don't agree with your opinion that the dataset is not attached to a text index. We have defined the dataset: 
>> <#tdb_inf_ds> a ja:RDFDataset ; 
>>     ja:defaultGraph       <#tdb_inf> ; 
>>     . 
> We tell Lucene to index it: 
>> :text_dataset a text:TextDataset ; 
>>     text:dataset   <#tdb_inf_ds> ; 
>>     text:index     <#textIndexLucene> ; 
>>     .
> And we assert that the dataset includes an RDFS inference model: 
>> <#tdb_inf> a ja:InfModel ; 
>>     rdfs:label "RDFS Inference Model" ; 
>>     ja:baseModel <#tdb_graph> ; 
>>     ja:reasoner 
>>          [ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ]
>>     .
> 
> Then both text search and RDFS reasoning should work. Such configuration works properly in Fuseki 1.1.1. However things changed in 1.1.2 and 2.0.x. I don't know what I should do to adjust to the new system. 
> 
> Thank you very much for your efforts again and have a nice day. 
> 
> Regards, 
> Yang 
> 
> 
> On 04/17/2015 02:53 PM, Andy Seaborne wrote: 
>> On 14/04/15 18:51, Yang Yuanzhe wrote: 
>>> Hi there, 
>>> 
>>> Sorry to trouble you again. Last month I wrote to you to figure out the 
>>> bug in text search for TDB. Given the following configuration, text 
>>> search works with TDB: 
>>> 
>> ... 
>> 
>> Comments inline: 
>> 
>>> Now we want to use text search for in-memory datasets, but we failed 
>>> after some trials, the configuration file we use is as follows: 
>>> 
>>>> @prefix :        <#> . 
>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>> @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>> @prefix text:    <http://jena.apache.org/text#> .
>>>> @prefix spatial:    <http://jena.apache.org/spatial#> .
>>>> 
>>>> [] a fuseki:Server ; 
>>>>    fuseki:services ( 
>>>>      <#memory> 
>>>>    ) . 
>>>> 
>>>> <#memory> a fuseki:Service ; 
>>>>     fuseki:name                     "memory" ; 
>>>>     fuseki:serviceQuery             "sparql" ; 
>>>>     fuseki:serviceQuery             "query" ; 
>>>>     fuseki:serviceUpdate            "update" ;   # SPARQL query service -- /memory/update
>>>>     fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
>>>>     fuseki:serviceReadWriteGraphStore      "data" ;
>>>>     fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
>>>>     fuseki:dataset           :text_dataset ; 
>>>>     . 
>>>> 
>>>> <#dataset> rdf:type ja:RDFDataset ; 
>>>>     ja:defaultGraph 
>>>>           [ 
>>>>             a ja:MemoryModel ; 
>>>>             ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
>>>>           ] . 
>> 
>> That is going to load the data each time the server starts but does not attach it in any way to the text index.
>>
>> Is it the same data as is loaded (separately) into the text index?
>>
>> Similarly for the inference setup (which is in a different Lucene index file:Text) ...
>> 
>>     Andy 
>> 
>>>> 
>>>> # Text 
>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" . 
>>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset . 
>>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex . 
>>>> 
>>>> :text_dataset a text:TextDataset ; 
>>>>     text:dataset   <#dataset> ; 
>>>>     text:index     <#textIndexLucene> ; 
>>>>     . 
>>>> 
>>>> # Text index description 
>>>> <#textIndexLucene> a text:TextIndexLucene ; 
>>>>     text:directory <file:Lucene> ;
>>>>     ##text:directory "mem" ; 
>>>>     text:entityMap <#entMap> ; 
>>>>     . 
>>>> 
>>>> <#entMap> a text:EntityMap ; 
>>>>     text:entityField      "uri" ; 
>>>>     text:defaultField     "text" ; 
>>>>     text:map ( 
>>>>          [ text:field "text" ; text:predicate rdfs:label ] 
>>>>          ) . 
>>>> 
>> ... 
>> 
>>> 
>>> All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any 
>>> clue or any suggestion for this issue? Thank you very much and have a 
>>> nice day. 
>>> 
>>> Regards, 
>>> Yang 
>>> 
>> 
>

Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Posted by Andy Seaborne <an...@apache.org>.

On 17/04/15 16:29, Yang Yuanzhe wrote:
> Hi Andy,
>
> Thank you very much for your reply.
>
> In fact the problem is unrelated to the preloaded triples. It doesn't
> work whether we start with an empty dataset or a preloaded one. Moreover,
> it takes around 1 minute to load 38k triples, while TDB needs only 6
> seconds. If we turn off text search for the in-memory dataset, loading
> takes only 1 second. That's why I thought the problem is on the Fuseki side.
>
> As for TDB with reasoning, I don't agree with your opinion that the
> dataset is not attached to a text index.

In your configuration, I can see no loading of the text index, which is a 
file-based index.

[[
<#dataset> rdf:type ja:RDFDataset ;
     ja:defaultGraph
           [
             a ja:MemoryModel ;
             ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
           ] .
]]

does not put any information into the text index; it finds the default 
graph of the underlying dataset, not the text dataset, and loads the 
file.  At this point, the text index has not been touched.

The current description is useful but isn't enough for me to reproduce 
the situation.

Please could you provide a complete, minimal example for just the text 
indexing case?

i.e. Something I can use at my end without having to do anything not 
described.

If it is a change between 1.1.1 and 1.1.2, let's stick to those two 
versions. For such a system:

1/ A configuration, as short as possible to illustrate the situation.
Ideally everything is in-memory, including the text index; that is cleaner 
because the tests are then repeatable each time the example is run.

2/ How to start the server

3/ Actions needed to load data

Use the s-put, s-post etc. tools, or wget/curl, to load the data if it 
comes from the web side; include a small data file if it is preloaded 
when the server starts.

4/ The query being made.
    What happens?
    Is the failure an error status code or silence?

	Andy
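
For item 1/, an all-in-memory configuration might look like the sketch below. It is assembled from the configuration quoted in this thread, with the commented-out text:directory "mem" line switched on and the ja:content preload removed. The service name "memory" and the resource names are taken from that configuration; whether this setup reproduces the reported behaviour has not been verified here.

```turtle
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .

[] a fuseki:Server ;
   fuseki:services ( <#memory> ) .

<#memory> a fuseki:Service ;
    fuseki:name                       "memory" ;
    fuseki:serviceQuery               "query" ;
    fuseki:serviceUpdate              "update" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    # The service exposes the text dataset, so data loaded through the
    # service is meant to pass through the indexing layer.
    fuseki:dataset                    <#text_dataset> ;
    .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

<#text_dataset> a text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#textIndexLucene> ;
    .

# Empty in-memory dataset: no ja:content, nothing preloaded.
<#dataset> a ja:RDFDataset ;
    ja:defaultGraph [ a ja:MemoryModel ] .

# In-memory Lucene index, so every run starts from an empty index.
<#textIndexLucene> a text:TextIndexLucene ;
    text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField  "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ; text:predicate rdfs:label ]
    ) .
```

Keeping both the dataset and the index in memory means nothing persists between runs, so each start of the server begins from the same empty state.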

> We have defined the dataset:
>> <#tdb_inf_ds> a ja:RDFDataset ;
>>     ja:defaultGraph       <#tdb_inf> ;
>>     .
> We tell Lucene to index it:
>> :text_dataset a text:TextDataset ;
>>     text:dataset   <#tdb_inf_ds> ;
>>     text:index     <#textIndexLucene> ;
>>     .
> And we assert that the dataset includes an RDFS inference model:
>> <#tdb_inf> a ja:InfModel ;
>>     rdfs:label "RDFS Inference Model" ;
>>     ja:baseModel <#tdb_graph> ;
>>     ja:reasoner
>>          [ ja:reasonerURL
>> <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ]
>>     .
>
> Then both text search and RDFS reasoning should work. Such configuration
> works properly in Fuseki 1.1.1. However things changed in 1.1.2 and
> 2.0.x. I don't know what I should do to adjust to the new system.
>
> Thank you very much for your efforts again and have a nice day.
>
> Regards,
> Yang
>
>
> On 04/17/2015 02:53 PM, Andy Seaborne wrote:
>> On 14/04/15 18:51, Yang Yuanzhe wrote:
>>> Hi there,
>>>
>>> Sorry to trouble you again. Last month I wrote to you to figure out the
>>> bug in text search for TDB. Given the following configuration, text
>>> search works with TDB:
>>>
>> ...
>>
>> Comments inline:
>>
>>> Now we want to use text search for in-memory datasets, but we failed
>>> after some trials, the configuration file we use is as follows:
>>>
>>>> @prefix :        <#> .
>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>> @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>> @prefix text:    <http://jena.apache.org/text#> .
>>>> @prefix spatial:    <http://jena.apache.org/spatial#> .
>>>>
>>>> [] a fuseki:Server ;
>>>>    fuseki:services (
>>>>      <#memory>
>>>>    ) .
>>>>
>>>> <#memory> a fuseki:Service ;
>>>>     fuseki:name                     "memory" ;
>>>>     fuseki:serviceQuery             "sparql" ;
>>>>     fuseki:serviceQuery             "query" ;
>>>>     fuseki:serviceUpdate            "update" ;   # SPARQL query service -- /memory/update
>>>>     fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
>>>>     fuseki:serviceReadWriteGraphStore      "data" ;
>>>>     fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
>>>>     fuseki:dataset           :text_dataset ;
>>>>     .
>>>>
>>>> <#dataset> rdf:type ja:RDFDataset ;
>>>>     ja:defaultGraph
>>>>           [
>>>>             a ja:MemoryModel ;
>>>>             ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
>>>>           ] .
>>
>> That is going to load the data each time the server starts but does
>> not attach it in any way to the text index.
>>
>> Is it the same data as is loaded (separately) into the text index?
>>
>> Similarly for the inference setup (which is in a different Lucene
>> index file:Text) ...
>>
>>     Andy
>>
>>>>
>>>> # Text
>>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>>>
>>>> :text_dataset a text:TextDataset ;
>>>>     text:dataset   <#dataset> ;
>>>>     text:index     <#textIndexLucene> ;
>>>>     .
>>>>
>>>> # Text index description
>>>> <#textIndexLucene> a text:TextIndexLucene ;
>>>>     text:directory <file:Lucene> ;
>>>>     ##text:directory "mem" ;
>>>>     text:entityMap <#entMap> ;
>>>>     .
>>>>
>>>> <#entMap> a text:EntityMap ;
>>>>     text:entityField      "uri" ;
>>>>     text:defaultField     "text" ;
>>>>     text:map (
>>>>          [ text:field "text" ; text:predicate rdfs:label ]
>>>>          ) .
>>>>
>> ...
>>
>>>
>>> All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any
>>> clue or any suggestion for this issue? Thank you very much and have a
>>> nice day.
>>>
>>> Regards,
>>> Yang
>>>
>>
>

Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Posted by Yang Yuanzhe <ya...@proxml.be>.

Hi Andy,

Thank you very much for your reply.

In fact the problem is unrelated to the preloaded triples. It doesn't 
work whether we start with an empty dataset or a preloaded one. Moreover, 
it takes around 1 minute to load 38k triples, while TDB needs only 6 
seconds. If we turn off text search for the in-memory dataset, loading 
takes only 1 second. That's why I thought the problem is on the Fuseki side.

As for TDB with reasoning, I don't agree with your opinion that the 
dataset is not attached to a text index. We have defined the dataset:
> <#tdb_inf_ds> a ja:RDFDataset ;
>     ja:defaultGraph       <#tdb_inf> ;
>     .
We tell Lucene to index it:
> :text_dataset a text:TextDataset ;
>     text:dataset   <#tdb_inf_ds> ;
>     text:index     <#textIndexLucene> ;
>     . 
And we assert that the dataset includes an RDFS inference model:
> <#tdb_inf> a ja:InfModel ;
>     rdfs:label "RDFS Inference Model" ;
>     ja:baseModel <#tdb_graph> ;
>     ja:reasoner
>          [ ja:reasonerURL 
> <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ]
>     . 

Then both text search and RDFS reasoning should work. This configuration 
works properly in Fuseki 1.1.1. However, things changed in 1.1.2 and 
2.0.x, and I don't know what I should do to adjust to the new system.

Thank you very much for your efforts again and have a nice day.

Regards,
Yang


On 04/17/2015 02:53 PM, Andy Seaborne wrote:
> On 14/04/15 18:51, Yang Yuanzhe wrote:
>> Hi there,
>>
>> Sorry to trouble you again. Last month I wrote to you to figure out the
>> bug in text search for TDB. Given the following configuration, text
>> search works with TDB:
>>
> ...
>
> Comments inline:
>
>> Now we want to use text search for in-memory datasets, but we failed
>> after some trials, the configuration file we use is as follows:
>>
>>> @prefix :        <#> .
>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>> @prefix text:    <http://jena.apache.org/text#> .
>>> @prefix spatial:    <http://jena.apache.org/spatial#> .
>>>
>>> [] a fuseki:Server ;
>>>    fuseki:services (
>>>      <#memory>
>>>    ) .
>>>
>>> <#memory> a fuseki:Service ;
>>>     fuseki:name                     "memory" ;
>>>     fuseki:serviceQuery             "sparql" ;
>>>     fuseki:serviceQuery             "query" ;
>>>     fuseki:serviceUpdate            "update" ;   # SPARQL query service -- /memory/update
>>>     fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
>>>     fuseki:serviceReadWriteGraphStore      "data" ;
>>>     fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
>>>     fuseki:dataset           :text_dataset ;
>>>     .
>>>
>>> <#dataset> rdf:type ja:RDFDataset ;
>>>     ja:defaultGraph
>>>           [
>>>             a ja:MemoryModel ;
>>>             ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
>>>           ] .
>
> That is going to load the data each time the server starts but does 
> not attach it in any way to the text index.
>
> Is it the same data as is loaded (separately) into the text index?
>
> Similarly for the inference setup (which is in a different Lucene 
> index file:Text) ...
>
>     Andy
>
>>>
>>> # Text
>>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>>
>>> :text_dataset a text:TextDataset ;
>>>     text:dataset   <#dataset> ;
>>>     text:index     <#textIndexLucene> ;
>>>     .
>>>
>>> # Text index description
>>> <#textIndexLucene> a text:TextIndexLucene ;
>>>     text:directory <file:Lucene> ;
>>>     ##text:directory "mem" ;
>>>     text:entityMap <#entMap> ;
>>>     .
>>>
>>> <#entMap> a text:EntityMap ;
>>>     text:entityField      "uri" ;
>>>     text:defaultField     "text" ;
>>>     text:map (
>>>          [ text:field "text" ; text:predicate rdfs:label ]
>>>          ) .
>>>
> ...
>
>>
>> All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any
>> clue or any suggestion for this issue? Thank you very much and have a
>> nice day.
>>
>> Regards,
>> Yang
>>
>

Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Posted by Andy Seaborne <an...@apache.org>.

On 14/04/15 18:51, Yang Yuanzhe wrote:
> Hi there,
>
> Sorry to trouble you again. Last month I wrote to you to figure out the
> bug in text search for TDB. Given the following configuration, text
> search works with TDB:
>
...

Comments inline:

> Now we want to use text search for in-memory datasets, but we failed
> after some trials, the configuration file we use is as follows:
>
>> @prefix :        <#> .
>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>> @prefix text:    <http://jena.apache.org/text#> .
>> @prefix spatial:    <http://jena.apache.org/spatial#> .
>>
>> [] a fuseki:Server ;
>>    fuseki:services (
>>      <#memory>
>>    ) .
>>
>> <#memory> a fuseki:Service ;
>>     fuseki:name                     "memory" ;
>>     fuseki:serviceQuery             "sparql" ;
>>     fuseki:serviceQuery             "query" ;
>>     fuseki:serviceUpdate            "update" ;   # SPARQL update service -- /memory/update
>>     fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
>>     fuseki:serviceReadWriteGraphStore      "data" ;
>>     fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
>>     fuseki:dataset           :text_dataset ;
>>     .
>>
>> <#dataset> rdf:type ja:RDFDataset ;
>>     ja:defaultGraph
>>           [
>>             a ja:MemoryModel ;
>>             ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
>>           ] .

That is going to load the data each time the server starts, but it does 
not attach it in any way to the text index.

Is it the same data as is loaded (separately) into the text index?

Similarly for the inference setup (which uses a different Lucene index, 
file:Text) ...
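
One way to keep the data and the index in step, as a rough sketch rather 
than a tested setup: start with an empty memory model and an in-memory 
Lucene index, then add the data through the Fuseki service so that each 
triple passes through the text dataset and is indexed on the way in. The 
fragment reuses the names and prefixes from the configuration above. 
Dropping ja:content and switching to the commented-out "mem" directory 
are assumptions about what is wanted here.

```turtle
# Sketch only: base dataset starts empty (assumption: no ja:content preload),
# so nothing reaches the memory model without also passing the text dataset.
<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph [ a ja:MemoryModel ] .

# In-memory Lucene index (the option commented out in the original config),
# so the index and the data are both rebuilt together on each restart.
<#textIndexLucene> a text:TextIndexLucene ;
    text:directory "mem" ;
    text:entityMap <#entMap> ;
    .
```

With this, the contents of dcat-vl.ttl would be sent to the read-write 
graph store endpoint (/memory/data in this configuration) or added with 
a SPARQL Update request to /memory/update once the server is running.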

	Andy

>>
>> # Text
>> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>>
>> :text_dataset a text:TextDataset ;
>>     text:dataset   <#dataset> ;
>>     text:index     <#textIndexLucene> ;
>>     .
>>
>> # Text index description
>> <#textIndexLucene> a text:TextIndexLucene ;
>>     text:directory <file:Lucene> ;
>>     ##text:directory "mem" ;
>>     text:entityMap <#entMap> ;
>>     .
>>
>> <#entMap> a text:EntityMap ;
>>     text:entityField      "uri" ;
>>     text:defaultField     "text" ;
>>     text:map (
>>          [ text:field "text" ; text:predicate rdfs:label ]
>>          ) .
>>
...

>
> All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any
> clue or any suggestion for this issue? Thank you very much and have a
> nice day.
>
> Regards,
> Yang
>