Posted to users@jena.apache.org by "Tao (陶信东)" <ta...@myhexin.com> on 2012/05/03 04:23:10 UTC

RE: [ANN] Release of Apache Jena LARQ 1.0.0-incubating

Hi Paolo,

I just noticed a change in the LARQ score. Previously the score seemed to
be normalized to the range [0, 1]. Now the score can be higher than 1. Is
this a change in Lucene or in LARQ?

How can I get the good old [0, 1] LARQ score back?

Thanks
Tao
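
For illustration, a minimal sketch of rescaling scores into [0, 1] on the
client side. It assumes the scores are non-negative and have already been
collected into an array. The class and method names are hypothetical and are
not part of LARQ or Lucene, and dividing by the largest score is only an
assumption about how the earlier [0, 1] values were produced.

```java
// Hypothetical helper, not part of LARQ or Lucene.
class ScoreNormalizer {

    /**
     * Rescales non-negative scores so that the largest one becomes 1.0.
     * If no score exceeds 1.0, the values are returned unchanged.
     */
    static float[] normalize(float[] scores) {
        float max = 0.0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        // Only rescale when at least one score is above 1.0.
        float divisor = (max > 1.0f) ? max : 1.0f;
        float[] result = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            result[i] = scores[i] / divisor;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] raw = { 4.0f, 2.0f, 1.0f };
        for (float s : normalize(raw)) {
            System.out.println(s);
        }
    }
}
```

Note that scores rescaled this way are only comparable within one result
set, since the divisor depends on the best hit of each query.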

-----Original Message-----
From: Tao (陶信东) [mailto:taoxindong@myhexin.com] 
Sent: Saturday, April 28, 2012 10:43 AM
To: jena-users@incubator.apache.org
Subject: RE: [ANN] Release of Apache Jena LARQ 1.0.0-incubating

Thanks Paolo! This finally works!!

-----Original Message-----
From: Paolo Castagna [mailto:castagna.lists@googlemail.com]
Sent: Friday, April 27, 2012 11:06 PM
To: jena-users@incubator.apache.org
Subject: Re: [ANN] Release of Apache Jena LARQ 1.0.0-incubating

Hi Tao,
could you try this for me:

svn co http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
cd fuseki

edit pom.xml adding LARQ dependency:

     <dependency>
       <groupId>org.apache.jena</groupId>
       <artifactId>jena-larq</artifactId>
       <version>1.0.0-incubating</version>
     </dependency>

mvn clean package
java -jar target/jena-fuseki-0.2.2-incubating-SNAPSHOT-server.jar --config=config.ttl

---- config.ttl ----
@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;
   fuseki:services (
     <#service1>
   ) .

<#service1> rdf:type fuseki:Service ;
    fuseki:name                        "ds" ;
    fuseki:serviceQuery                "sparql" ;
    fuseki:serviceQuery                "query" ;
    fuseki:serviceUpdate               "update" ;
    fuseki:serviceUpload               "upload" ;
    fuseki:serviceReadWriteGraphStore  "data" ;
    fuseki:serviceReadGraphStore       "get" ;
    fuseki:serviceReadGraphStore       "" ;
    fuseki:dataset                     <#dataset1> ;
    .

<#dataset1> rdf:type      tdb:DatasetTDB ;
    tdb:location "/tmp/tdb" ;
    ja:textIndex "/tmp/lucene"
    .
----

I get results when I query with:
s-query --service=http://127.0.0.1:3030/ds/sparql "PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#> SELECT * { ?s pf:textMatch 'word'} LIMIT 10"

Now, I have done exactly the same with:
http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/tags/jena-fuseki-0.2.1-incubating-RC-1/
and
http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/tags/jena-fuseki-0.2.1-incubating/
and I experience your problem. (I checked both since I was not 100% sure
which tag corresponds to the actual release; it should be
jena-fuseki-0.2.1-incubating, right?)

The Lucene index is created correctly and it's there; I can query it using
the larq.larq command line:
java -cp target/jena-fuseki-0.2.1-incubating-server.jar larq.larq --larq=/tmp/lucene "word"

The only hypothesis I have is that something goes wrong in the assembler
which initializes LARQ. I am sorry I cannot be more helpful at the moment.
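
As a quick sanity check before restarting Fuseki, the state of the directory
named by ja:textIndex can be inspected, since (as discussed further down in
this thread) a missing directory and an existing but empty one are handled
differently. The sketch below uses only the Java standard library. The class
name IndexDirCheck and its three states are hypothetical, and it does not
open the index with Lucene, so a populated directory is not proof of a valid
index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of LARQ or Fuseki.
class IndexDirCheck {

    enum State { MISSING, EMPTY, POPULATED }

    /** Classifies a directory without opening it as a Lucene index. */
    static State inspect(Path dir) throws IOException {
        // A path that does not exist (or is not a directory) counts as MISSING.
        if (!Files.isDirectory(dir)) {
            return State.MISSING;
        }
        // EMPTY if the directory has no entries at all, POPULATED otherwise.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return entries.iterator().hasNext() ? State.POPULATED : State.EMPTY;
        }
    }

    public static void main(String[] args) throws IOException {
        // "/tmp/lucene" is the ja:textIndex value from the config.ttl above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/lucene");
        System.out.println(dir + " -> " + inspect(dir));
    }
}
```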

The only suggestion I have for you is to try using Fuseki from trunk:
http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/
and let me know if you experience problems with that. That works for me.

In the meantime, I'll see if I can find the cause of this problem.

Paolo

Tao (陶信东) wrote:
> Hi Paolo,
> The index files seem OK (see below). The index was obviously created by
> Fuseki (it didn't exist before I created it). It seems that LARQ is not
> called when performing the pf:textMatch search.
> 
> -rw-r--r-- 1   17459784 Apr 27 14:32 _2.fdt
> -rw-r--r-- 1    2325668 Apr 27 14:32 _2.fdx
> -rw-r--r-- 1         40 Apr 27 14:32 _2.fnm
> -rw-r--r-- 1    2269211 Apr 27 14:32 _2.frq
> -rw-r--r-- 1     581420 Apr 27 14:32 _2.nrm
> -rw-r--r-- 1    1414550 Apr 27 14:32 _2.prx
> -rw-r--r-- 1      40894 Apr 27 14:32 _2.tii
> -rw-r--r-- 1    4475539 Apr 27 14:32 _2.tis
> -rw-r--r-- 1        253 Apr 27 14:32 segments_1
> -rw-r--r-- 1         20 Apr 27 14:32 segments.gen
> -rw-r--r-- 1          0 Apr 27 14:32 write.lock
> 
> -----Original Message-----
> From: Paolo Castagna [mailto:castagna.lists@googlemail.com]
> Sent: Friday, April 27, 2012 3:26 PM
> To: jena-users@incubator.apache.org
> Subject: Re: [ANN] Release of Apache Jena LARQ 1.0.0-incubating
> 
> Hi Tao,
> can you share the bit of your Fuseki configuration where you point at 
> the path for your Lucene indexes?
> 
> <#dataset> rdf:type tdb:DatasetTDB ;
>   ...
>   ja:textIndex "/path/to/lucene/index/" ;
>   .
> 
> Can you do an ls -la of that directory to see the file sizes?
> 
> Can you try to point to a non existing directory and restart Fuseki?
> 
> If the directory exists, an empty index will be created (this is a side
> effect of opening a Lucene Directory when your Lucene index does not
> exist).
> If the directory does not exist, your dataset will be indexed.
> 
> Let's see if you can sort this out and let me know if you have ideas 
> on how to improve this.
> 
> Paolo
> 
> Tao (陶信东) wrote:
>> Thanks Paolo. Now I can find the lucene files. But pf:textMatch still 
>> doesn't seem to work. See details below.
>>
>> select * where {?s rdfs:label ?o} limit 10
>> -----------------------------------------------------------------------------
>> | s                                                      | o                |
>> =============================================================================
>> | <http://xmlns.com/foaf/0.1/LabelProperty>              | "Label Property" |
>> | <http://xmlns.com/foaf/0.1/Person>                     | "Person"         |
>> | <http://www.w3.org/2000/10/swap/pim/contact#Person>    | "Person"         |
>> | <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> | "Spatial Thing"  |
>> | <http://xmlns.com/foaf/0.1/Document>                   | "Document"       |
>> | <http://xmlns.com/foaf/0.1/Organization>               | "Organization"   |
>> | <http://xmlns.com/foaf/0.1/Group>                      | "Group"          |
>> | <http://xmlns.com/foaf/0.1/Agent>                      | "Agent"          |
>> | <http://xmlns.com/foaf/0.1/Project>                    | "Project"        |
>> | <http://xmlns.com/foaf/0.1/Image>                      | "Image"          |
>> -----------------------------------------------------------------------------
>>
>> prefix pf: <http://jena.hpl.hp.com/ARQ/property#>
>> select * where {?s pf:textMatch "+Label"} limit 10
>> -----
>> | s |
>> =====
>> -----
>>
>>
>> -----Original Message-----
>> From: Paolo Castagna [mailto:castagna.lists@googlemail.com]
>> Sent: Thursday, April 26, 2012 10:26 PM
>> To: jena-users@incubator.apache.org
>> Subject: Re: [ANN] Release of Apache Jena LARQ 1.0.0-incubating
>>
>> Tao (陶信东) wrote:
>>> Thanks Paolo, I'll try that. 
>>>
>>> Meanwhile, would you please clarify whether Fuseki will build the LARQ
>>> index automatically after the patch?
>> Yes, if the directory does not exist LARQ will create it and build 
>> the Lucene index automatically, if a directory exists a Lucene index 
>> will be opened pointing to that directory (however, if the directory 
>> exists and it's empty, you'll have an empty Lucene index as result):
>>
>>         if ( indexPath != null )
>>         {
>>             File path = new File(indexPath) ;
>>             if ( !path.exists() )
>>             {
>>                 log.debug("Directory {} does not exist, building Lucene index...") ;
>>                 path.mkdirs() ;
>>                 build ( dataset, path ) ;
>>             }
>>             directory = FSDirectory.open(path);
>>         } else {
>>             directory = new RAMDirectory();
>>         }
>>
>> http://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/tags/jena-larq-1.0.0-incubating/src/main/java/org/apache/jena/larq/assembler/AssemblerLARQ.java
>>
>>> Your reply and Venkat's seemed different regarding this.
>> This is because things have been improved since the latest release.
>>
>> Paolo
>>
>>> Thanks
>>> Tao
>>>
>>> -----Original Message-----
>>> From: Paolo Castagna [mailto:castagna.lists@googlemail.com]
>>> Sent: Wednesday, April 25, 2012 6:51 PM
>>> To: jena-users@incubator.apache.org
>>> Subject: Re: [ANN] Release of Apache Jena LARQ 1.0.0-incubating
>>>
>>> Hi Tao
>>>
>>> Tao(陶信东) wrote:
>>>> Hi Paolo,
>>>>
>>>> This is great! I've been waiting for it for several weeks.
>>> Good to know.
>>>
>>>> However, after I configured Fuseki for it by adding LARQ to my 
>>>> class path and the following lines to the assembly file, 
>>>> pf:textMatch doesn't work (the usual sparql query works).
>>>>
>>>> <#dataset> rdf:type tdb:DatasetTDB ;
>>>>   tdb:location "/path/to/my/tdb/indexes/" ;
>>>>   ja:textIndex "/path/to/lucene/index/" ;
>>>>   .
>>>>
>>>> Is Fuseki supported by this LARQ release? 
>>> Fuseki does not include LARQ.
>>>
>>> For more details, please, see also:
>>> https://issues.apache.org/jira/browse/JENA-63
>>> https://issues.apache.org/jira/browse/JENA-164
>>>
>>> The good news is that it is extremely easy to check out Fuseki
>>> sources, patch the pom.xml file and repackage it (i.e. mvn package).
>>> See:
>>> https://issues.apache.org/jira/secure/attachment/12504042/JENA-63_Fuseki_r1203107.patch
>>> (be warned, that patch might not be up-to-date, but you see it's
>>> just a new dependency on LARQ).
>>>
>>>> Or should I build the Lucene index
>>>> myself when Fuseki starts?
>>> No, you do not need that.
>>>
>>>  1. Checkout Fuseki source code.
>>>  2. Add LARQ dependency to the Fuseki pom.xml file
>>>  3. mvn package
>>>
>>> Let me know how it goes and if you have problems.
>>>
>>> Cheers,
>>> Paolo
>>>
>>>> Thanks
>>>> Tao
>>>>
>>>> -----Original Message-----
>>>> From: Paolo Castagna [mailto:castagna.lists@googlemail.com]
>>>> Sent: Tuesday, April 24, 2012 5:43 AM
>>>> To: jena-users@incubator.apache.org
>>>> Subject: [ANN] Release of Apache Jena LARQ 1.0.0-incubating
>>>>
>>>> LARQ 1.0.0-incubating has been released, this is the first release 
>>>> of LARQ as separate module and under the Apache License.
>>>>
>>>> LARQ is a combination of ARQ and Lucene aimed at providing
>>>> developers with the ability to perform free text searches within
>>>> their SPARQL queries.
>>>>
>>>> Documentation for LARQ is available here:
>>>>
>>>>  - http://incubator.apache.org/jena/documentation/larq/
>>>>  - http://incubator.apache.org/jena/documentation/javadoc/larq/
>>>>
>>>>
>>>> == Mailing lists
>>>>
>>>> The user mailing list for Jena is  jena-users@incubator.apache.org 
>>>> Send email to  jena-users-subscribe@incubator.apache.org  to subscribe.
>>>>
>>>> See also:
>>>> http://incubator.apache.org/jena/help_and_support/index.html
>>>>
>>>>
>>>> == About This Release
>>>>
>>>> The main new feature in this release is support for updating the
>>>> Lucene index as RDF statements are added to or removed from the
>>>> Jena Model via the Jena APIs.
>>>> Moreover, LARQ now depends on Apache Lucene 3.5.0.
>>>>
>>>>
>>>> == Download
>>>>
>>>> Maven artifacts are here:
>>>> http://repo1.maven.org/maven2/org/apache/jena/jena-larq/1.0.0-incubating/
>>>>
>>>> Source release is here:
>>>> http://www.apache.org/dyn/closer.cgi/incubator/jena/jena-larq-1.0.0-incubating
>>>>
>>>>
>>>> == Status
>>>>
>>>> Apache Jena is an effort undergoing incubation at the Apache 
>>>> Software Foundation (ASF), sponsored by the Apache Incubator PMC.
>>>>
>>>> Incubation is required of all newly accepted projects until a 
>>>> further review indicates that the infrastructure, communications, 
>>>> and decision making process have stabilized in a manner consistent 
>>>> with other successful ASF projects.
>>>>
>>>> While incubation status is not necessarily a reflection of the 
>>>> completeness or stability of the code, it does indicate that the 
>>>> project has yet to be fully endorsed by the ASF.
>>>>
>>>> For more information about the incubation status of the Jena 
>>>> project you can go to the following page:
>>>> http://incubator.apache.org/projects/jena.html
>>>>
>