
Posted to users@jena.apache.org by "Kumar,Abhishek" <ab...@ufl.edu> on 2016/11/28 23:31:03 UTC

Text Search Using Lucene

Hi,


I am trying to implement text search in Jena via Fuseki. I have followed the documentation and created an assembler file.


But after starting the Fuseki server with the --config parameter, there is no data in the dataset, so both simple queries and text queries return no results.


What I have tried so far:

1. Built the TDB dataset using java -cp $FUSEKI_HOME/fuseki-server.jar tdb.tdbloader --tdb=assembler_file data_file

2. Built the index using java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --desc=assembler_file

3. Started the Fuseki server using fuseki-server --config ../assembler_file.ttl


I tried the answer on Stack Overflow, http://stackoverflow.com/questions/30447536/fuseki-indexed-lucene-text-search-returns-no-results , but using --desc gives an error: no service name.


Another user had a similar issue a year ago in this thread, http://thread.gmane.org/gmane.comp.apache.jena.user/7892 , but there is no solution there either.


Can someone please help here?


Thanks & Regards

Abhishek Kumar

Re: Text Search Using Lucene

Posted by Osma Suominen <os...@helsinki.fi>.

Sorry I misinterpreted the StackOverflow link, so please ignore the part 
about the version. I'm assuming you are using a recent Fuseki version.

-Osma

29.11.2016, 10:55, Osma Suominen wrote:
> Hi Abhishek,
>
> What are the contents of the Lucene index directory (called "Lucene"
> according to your configuration) after the text indexing operation?
>
> I.e. is the directory
> - nonexistent or completely empty?
> - with a few empty or very small (up to a few kilobytes) files?
> - with real index files of several megabytes?
>
> You mention on StackOverflow that you are using Fuseki 2.0.0. That is a
> rather old version, could you upgrade to something newer? I'm not sure
> about which version of jena-text was included in 2.0.0 but it must be
> old and I'm unsure about the issues it may have.
>
> -Osma


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Text Search Using Lucene

Posted by Osma Suominen <os...@helsinki.fi>.

Hi Abhishek,

What are the contents of the Lucene index directory (called "Lucene" 
according to your configuration) after the text indexing operation?

I.e. is the directory
- nonexistent or completely empty?
- with a few empty or very small (up to a few kilobytes) files?
- with real index files of several megabytes?
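
The three cases above can be told apart with a small shell check. This is only a sketch: the function name index_status and the 1 MiB cut-off between "very small" and "real" index files are assumptions made for illustration, and "Lucene" is the directory name taken from the posted configuration.

```shell
#!/bin/sh
# Classify an index directory as missing/empty, tiny, or populated.
# index_status and the 1 MiB threshold are assumptions for this sketch.
index_status() {
    dir=$1
    # Case 1: the directory does not exist or contains no entries.
    if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir")" ]; then
        echo "missing-or-empty"
        return
    fi
    # Sum the byte sizes of all regular files below the directory.
    # wc prints a "total" line when given several files; skip it.
    bytes=$(find "$dir" -type f -exec wc -c {} + |
        awk '$2 != "total" { s += $1 } END { printf "%.0f\n", s }')
    if [ "$bytes" -lt 1048576 ]; then
        # Case 2: only a few empty or very small files.
        echo "tiny"
    else
        # Case 3: index files of a megabyte or more.
        echo "populated"
    fi
}

# Check the directory named in the configuration, relative to the
# directory this script is run from.
index_status Lucene
```

Run it from the same directory in which jena.textindexer was started, since "Lucene" is a relative name.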

You mention on StackOverflow that you are using Fuseki 2.0.0. That is a 
rather old version, could you upgrade to something newer? I'm not sure 
about which version of jena-text was included in 2.0.0 but it must be 
old and I'm unsure about the issues it may have.

-Osma



Re: Text Search Using Lucene

Posted by Rob Vesse <rv...@dotnetrdf.org>.

One possible problem is that both your database and text index locations are given as relative paths. So depending on where on your system you run commands from you can get completely different results. I would strongly recommend using absolute paths if possible.
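
To illustrate the point, here is a small shell sketch that prints where the relative names "DB" and "Lucene" from the configuration would end up for a given starting directory. It assumes the locations are resolved against the working directory of the command being run; resolve_from is a hypothetical helper written for this example, not part of Jena or Fuseki.

```shell
#!/bin/sh
# Show the absolute path a relative location would resolve to when a
# command is started from a given directory.
# resolve_from is a hypothetical helper written for this sketch.
resolve_from() {
    start_dir=$1
    rel_path=$2
    # Run cd in a subshell so the caller's working directory is untouched.
    (
        cd "$start_dir" || exit 1
        base=$(pwd -P)
        # Strip a trailing slash so that "/" does not produce "//DB".
        printf '%s/%s\n' "${base%/}" "$rel_path"
    )
}

# The same relative names point to different places depending on
# whether a command is run from here or from the parent directory.
resolve_from . DB
resolve_from . Lucene
resolve_from .. DB
resolve_from .. Lucene
```

If the loader, the indexer and the server are not all started from the same directory, the printed locations differ, which would match a server that sees an empty dataset. Writing the printed absolute paths into the config removes that dependency.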

 Rob

On 28/11/2016 23:39, "Kumar,Abhishek" <ab...@ufl.edu> wrote:

    This is my config file
    
    

Re: Text Search Using Lucene

Posted by "Kumar,Abhishek" <ab...@ufl.edu>.

This is my config file


@prefix :        <http://localhost/jena_example/#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .

## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
# Solr index
text:TextIndexSolr    rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type     text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#indexLucene> ;
    .

# A TDB dataset used for RDF storage
<#dataset> rdf:type      tdb:DatasetTDB ;
    tdb:location "DB" ;
    tdb:unionDefaultGraph true ; # Optional
    .

# Text index description
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate rdfs:label ]
         ) .

[] rdf:type fuseki:Server ;
   # Server-wide context parameters can be given here.
   # For example, to set query timeouts on a server-wide basis:
   # Format 1: "1000" -- 1 second timeout
   # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout for the rest of the query.
   # See the Javadoc for ARQ.queryTimeout
   # ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "10000" ] ;

   # Load custom code (rarely needed)
   # ja:loadClass "your.code.Class" ;

   # Services available.  Only explicitly listed services are configured.
   #  If there is a service description not linked from this list, it is ignored.
   fuseki:services (
    <#service_text_tdb>
   ) .

<#service_text_tdb>  rdf:type fuseki:Service ;
    fuseki:name              "Music" ;       # http://host:port/Music
    fuseki:serviceQuery               "query" ;    # SPARQL query service
    fuseki:serviceQuery               "sparql" ;   # SPARQL query service
    fuseki:serviceUpdate              "update" ;   # SPARQL update service
    fuseki:serviceUpload              "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph store protocol (read and write)
    fuseki:dataset                  :text_dataset;
    .
