Posted to dev@jena.apache.org by "Yang Yuanzhe (JIRA)" <ji...@apache.org> on 2015/06/04 10:35:39 UTC

[jira] [Created] (JENA-953) Text search does not work in Fuseki with In-memory datasets

Yang Yuanzhe created JENA-953:
---------------------------------

             Summary: Text search does not work in Fuseki with In-memory datasets
                 Key: JENA-953
                 URL: https://issues.apache.org/jira/browse/JENA-953
             Project: Apache Jena
          Issue Type: Bug
          Components: Fuseki, Text
    Affects Versions: Fuseki 2.0.0, Fuseki 1.1.2, Fuseki 2.0.1
         Environment: Ubuntu 14.04 in VM
            Reporter: Yang Yuanzhe


First of all, I apologize for any duplicate posts. I sent this to the mailing list, but it disappeared from the "draft box" and never showed up in the "sent box" either. So I am posting it here before I lose it from my clipboard. :D

Here is the copy of the mail:

Hi Andy,

I am sorry for such a late response. We were busy with another project during this period. I will now explain, step by step, how I reproduce the error.

The problem is that text search indexing does not work for in-memory datasets.

Here is the configuration file I used. It should be basic enough: a server description, a service description, and a text index attached to the dataset that indexes "rdfs:label".

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix spatial:    <http://jena.apache.org/spatial#> .

[] a fuseki:Server ;
   fuseki:services (
     <#memory>
   ) .

<#memory> a fuseki:Service ;
    fuseki:name                     "memory" ; 
    fuseki:serviceQuery             "sparql" ;
    fuseki:serviceQuery             "query" ;
    fuseki:serviceUpdate            "update" ;   # SPARQL update service -- /memory/update
    fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore      "data" ;     
    fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
    fuseki:dataset           :text_dataset ;
    .

<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph
          [ 
            a ja:MemoryModel ;
          ] .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

:text_dataset a text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#textIndexLucene> ;
    .

# Text index description
<#textIndexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate rdfs:label ]
         ) .


The server is started with
"./fuseki-server --config=config-memory-text.ttl"
and the console shows that it starts properly:
[2015-06-03 12:13:09] Server     INFO  Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000
[2015-06-03 12:13:09] Config     INFO  FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT
[2015-06-03 12:13:09] Config     INFO  FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run
[2015-06-03 12:13:09] Servlet    INFO  Initializing Shiro environment
[2015-06-03 12:13:09] Config     INFO  Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
[2015-06-03 12:13:09] Config     INFO  Configuration file: config-memory-text.ttl
[2015-06-03 12:13:10] Builder    INFO  Service: :memory
[2015-06-03 12:13:11] Config     INFO  Register: /memory
[2015-06-03 12:13:11] Server     INFO  Started 2015/06/03 12:13:11 CEST on port 3030

I tested it in two versions: the official release 2.0.0 and the latest snapshot, 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The observed behavior is as follows:

In 2.0.0:
If I load triples that do not contain "rdfs:label", everything works properly, although in that case the text index is never exercised. As soon as I add a single "rdfs:label" triple to the file I am loading into Fuseki, the following error occurs:
[2015-06-03 12:10:47] Fuseki     INFO  [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0
[2015-06-03 12:10:47] HttpAction WARN  Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
[2015-06-03 12:10:47] Fuseki     INFO  [7] 500 Server Error (523 ms) 
I remember that a few months ago, when 2.0.0 was first released, I discovered this issue and reported it to you. At that time I did not realize that the root cause was the indexing. You fixed it in a later snapshot, but my test was not adequate, so I thought the problem was solved and gave you incorrect feedback. My sincere apologies.
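
For reference, the triggering upload can be sketched in a few lines of Python. This is only an illustration, not the exact steps used above (the tests used the Web UI and s-post): the host name "localhost", the example.org resource, and the helper name build_upload_request are assumptions, while the port and the "memory"/"data" names come from the configuration and startup log. The request is built but not sent.

```python
from urllib.request import Request

# Port 3030 and the "memory" service with its "data" endpoint come from the
# configuration and log above; "localhost" is an assumption.
DATA_ENDPOINT = "http://localhost:3030/memory/data"

# Hypothetical payload: the rdfs:label triple is the one the text index maps.
TURTLE = """\
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

ex:licence1 a ex:Licence ;
    rdfs:label "Example licence" .
"""


def build_upload_request(turtle: str) -> Request:
    """Build (but do not send) a Graph Store Protocol POST to the default graph."""
    return Request(
        DATA_ENDPOINT + "?default",
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle; charset=utf-8"},
        method="POST",
    )


request = build_upload_request(TURTLE)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the upload against a running server. Removing the rdfs:label line from the payload gives the variant that loads without error in 2.0.0.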

In 2.0.1 SNAPSHOT:
The latest snapshot contains the patch I mentioned above, so files containing "rdfs:label" triples can be loaded successfully. However, the triples are not indexed at all: queries using text search do not return any results.

Following your advice, I tested loading and querying from both the Web UI and the s-post/s-query tools. Unfortunately (or fortunately?), the results are the same.
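
A keyword search of the kind described above can be sketched as follows. This is a minimal illustration and not the exact query used in the tests: the search term "licence", the host name "localhost", and the helper names are assumptions. The text:query property function and the "memory"/"query" endpoint names correspond to the jena-text setup in the configuration above. The code only builds the query and the request URL; it does not contact a server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# "memory" and "query" come from the service configuration above;
# "localhost" is an assumption.
QUERY_ENDPOINT = "http://localhost:3030/memory/query"


def text_search_query(term: str, limit: int = 10) -> str:
    """Return a SPARQL query that searches the rdfs:label text index for term."""
    if "'" in term:
        # Keep the sketch simple: no escaping of quotes in the search term.
        raise ValueError("search term must not contain a single quote")
    return f"""\
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?label WHERE {{
  ?s text:query (rdfs:label '{term}' {limit}) ;
     rdfs:label ?label .
}}"""


def build_query_url(query: str) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return QUERY_ENDPOINT + "?" + urlencode({"query": query})


query = text_search_query("licence")
url = build_query_url(query)
print(query)
print(url)
```

With the TDB configuration the same query shape would be sent to the "tdb" service instead. Per the report, it returns matches there but nothing for the in-memory dataset.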

TDB:
Meanwhile, I also ran a similar experiment on Fuseki with TDB in 2.0.0 and 2.0.1 SNAPSHOT, and both work properly: loading succeeds and queries return search results. The only difference is that, in the configuration file, the in-memory dataset is replaced with TDB:
@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .

[] rdf:type fuseki:Server ;
   fuseki:services (
     <#service_text_tdb>
   ) .

# TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

<#service_text_tdb> a fuseki:Service ;
    rdfs:label                      "TDB/text service" ;
    fuseki:name                     "tdb" ;
    fuseki:serviceQuery             "query" ;
    fuseki:serviceQuery             "sparql" ;
    fuseki:serviceUpdate            "update" ;
    fuseki:serviceUpload            "upload" ;
    fuseki:serviceReadGraphStore    "get" ;
    fuseki:serviceReadWriteGraphStore    "data" ;
    fuseki:dataset                  <#text_dataset> ;
    .

<#text_dataset> a text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#indexLucene> ;
    .

<#dataset> a tdb:DatasetTDB ;
    tdb:location "DB" ;
    ##tdb:unionDefaultGraph true ;
    .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;      
    text:map (          
         [ text:field "text" ; text:predicate rdfs:label ]
         ) .

Do you have any advice on this? Thank you very much in advance for your efforts.

Regards,
Yang

PS: I discovered that there is a SNAPSHOT for 2.3.0. I planned to test it as well, but I was not able to run it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)