Posted to users@jena.apache.org by Jérôme <je...@unicaen.fr> on 2011/09/12 11:29:59 UTC

Fuseki + Larq : Lucene indexing

Hi,

I'm trying to use LARQ with my Fuseki server.

I would like to index documents programmatically (with Lucene) when the
server starts.

Something like this:

Model model = ModelFactory.createDefaultModel();
// Register the index builder so statements are indexed as they are added
IndexBuilderString larqBuilder = new IndexBuilderString();
model.register(larqBuilder);
FileManager.get().readModel(model, "Data/books.ttl");
// Finish indexing, then detach the builder from the model
larqBuilder.closeWriter();
model.unregister(larqBuilder);
// Make the index the default one used by LARQ queries
IndexLARQ index = larqBuilder.getIndex();
LARQ.setDefaultIndex(index);

Is this possible? Which class would be the best place for it?

Thanks

Jerome

Re: Fuseki + Larq : Lucene indexing

Posted by Paolo Castagna <ca...@googlemail.com>.

Andy Seaborne wrote:
> 
> 
> On 12/09/11 11:24, Paolo Castagna wrote:
>> Hi Jérôme,
>> you are lucky, I have exactly the same need as you and I've done 
>> something about it recently.
>> Unfortunately, the new LARQ (as a separate module) has still not made 
>> it into Fuseki on trunk.
>>
>> We have an open JIRA for it which you can watch|vote|contribute to:
>> https://issues.apache.org/jira/browse/JENA-63
> 
> Should we change the title of JENA-63?  It's not about Fuseki, which 
> just supplies the SPARQL protocol and routes requests to the right 
> dataset.  It's the dataset that must do the LARQ coordination - initial 
> indexing and incrementally later, across restarts.

Hi Andy,
I am not sure the title of JENA-63 is going to make much difference.

Users (at Talis as well) want to set up SPARQL endpoints easily, and they also
want to run free-text searches easily on those SPARQL endpoints.
Fuseki currently provides a very good user experience for quickly setting up
a SPARQL endpoint; however, it does not include free-text search
capabilities.

The patch in JENA-63 does not change any Fuseki source code;
it only adds the LARQ jar (and, transitively, Lucene v3.1.0) to its dependencies.
All the other necessary code changes have already been made elsewhere (i.e. in
ARQ and LARQ).

What would be a more appropriate title?

The overall goal is to make it as easy as possible for users to perform free-text
searches over their RDF data if they want to. Note: this feature is not
"standard" and it is not enabled by default.

Once LARQ is properly released, do you see any problem with adding it (and Lucene
v3.1.0) to the Fuseki dependencies?

LARQ is ~46KB.
Lucene v3.1.0 is (unfortunately) much bigger: ~1.2MB.

Is the size of Lucene's jar a concern?

Paolo

> 
> It is possible to get Fuseki to run initialization code automatically - 
> the configuration file supports ja:loadClass (a bit misnamed - it loads 
> and runs a static) but I don't think that is anything other than a 
> stop-gap.
> 
>     Andy
> 
>>
>> In the meantime, if you want to use LARQ with Fuseki this is what you 
>> need to do:
>>
>> cd /tmp
>> svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
>> cd /tmp/fuseki
>> wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
>> patch -p0 < JENA-63_Fuseki_r1136050.patch
>> mvn package
>>
>> Now you can simply use the Fuseki config.ttl file as explained here:
>> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
>> and use the ja:textIndex property on a dataset to specify a 
>> non-existent directory.
>>
>> When you point LARQ at a non-existent directory, it will perform the 
>> indexing for you.
>> This is particularly useful when you have multiple datasets configured 
>> in Fuseki.
>> WARNING: it might take a while to index large datasets, so be patient.
>>
>> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>>
>> LARQ snapshots are here:
>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/ 
>>
>> and I can quickly fix/improve things if you have problems or good 
>> suggestions.
>>
>> I hope this helps, let me know how it goes.
>>
>> Paolo
>>

Re: Fuseki + Larq : Lucene indexing

Posted by Andy Seaborne <an...@epimorphics.com>.


On 12/09/11 11:24, Paolo Castagna wrote:
> Hi Jérôme,
> you are lucky, I have exactly the same need as you and I've done something about it recently.
> Unfortunately, the new LARQ (as a separate module) has still not made it into Fuseki on trunk.
>
> We have an open JIRA for it which you can watch|vote|contribute to:
> https://issues.apache.org/jira/browse/JENA-63

Should we change the title of JENA-63?  It's not about Fuseki, which 
just supplies the SPARQL protocol and routes requests to the right 
dataset.  It's the dataset that must do the LARQ coordination: the initial 
indexing, and incremental indexing later, across restarts.

It is possible to get Fuseki to run initialization code automatically - 
the configuration file supports ja:loadClass (a bit misnamed - it loads 
and runs a static) but I don't think that is anything other than a stop-gap.
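
To illustrate "loads and runs a static", here is a minimal sketch in plain
Java reflection. It is not Fuseki's actual code: it assumes the hook is a
public static no-argument method called init (that name is an assumption),
and StaticInitLoader and StartupIndexer are hypothetical classes. The
indexing code from the first message would go where the comment indicates.

```java
import java.lang.reflect.Method;

// Sketch of "load a class by name and run a static method on it".
class StaticInitLoader {
    static void loadAndInit(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Method init = cls.getMethod("init"); // assumed hook name
            init.invoke(null); // null receiver: the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot initialise " + className, e);
        }
    }

    public static void main(String[] args) {
        // A configuration file would supply this class name as a string.
        loadAndInit(StartupIndexer.class.getName());
        System.out.println("indexed = " + StartupIndexer.indexed);
    }
}

// Hypothetical startup hook; it only records that it ran.
class StartupIndexer {
    static boolean indexed = false;

    public static void init() {
        // The LARQ index-building code would go here.
        indexed = true;
    }
}
```

Because the hook runs once at load time, it covers the initial indexing only,
not incremental updates.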

	Andy

>
> In the meantime, if you want to use LARQ with Fuseki this is what you need to do:
>
> cd /tmp
> svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
> cd /tmp/fuseki
> wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
> patch -p0 < JENA-63_Fuseki_r1136050.patch
> mvn package
>
> Now, you can simply use the Fuseki config.ttl file as explained here:
> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
> and use the ja:textIndex property on a dataset to specify a non-existent directory.
>
> When you point LARQ at a non-existent directory, it will perform the indexing for you.
> This is particularly useful when you have multiple datasets configured in Fuseki.
> WARNING: it might take a while to index large datasets, so be patient.
>
> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>
> LARQ snapshots are here:
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/
> and I can quickly fix/improve things if you have problems or good suggestions.
>
> I hope this helps, let me know how it goes.
>
> Paolo
>

Re: Fuseki + Larq : Lucene indexing

Posted by Paolo Castagna <ca...@googlemail.com>.

Jérôme wrote:
> Le 13/09/11 16:59, Paolo Castagna a écrit :
>> Jérôme wrote:
>>> Le 12/09/11 17:52, Paolo Castagna a écrit :
>>>> Jérôme wrote:
>>>>> Le 12/09/11 16:13, Paolo Castagna a écrit :
>>>>>> Jérôme wrote:
>>>>>>> Le 12/09/11 15:18, Paolo Castagna a écrit :
>>>>>>>> Jérôme wrote:
>>>>>>>>> Le 12/09/11 12:24, Paolo Castagna a écrit :
>>>>>>>>>> Hi Jérôme,
>>>>>>>>>> you are lucky, I have exactly the same need as you and I've done 
>>>>>>>>>> something about it recently.
>>>>>>>>>> Unfortunately, the new LARQ (as a separate module) has still 
>>>>>>>>>> not made it into Fuseki on trunk.
>>>>>>>>>>
>>>>>>>>>> We have an open JIRA for it which you can 
>>>>>>>>>> watch|vote|contribute to:
>>>>>>>>>> https://issues.apache.org/jira/browse/JENA-63
>>>>>>>>>>
>>>>>>>>>> In the meantime, if you want to use LARQ with Fuseki this is 
>>>>>>>>>> what you need to do:
>>>>>>>>>>
>>>>>>>>>> cd /tmp
>>>>>>>>>> svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
>>>>>>>>>> cd /tmp/fuseki
>>>>>>>>>> wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
>>>>>>>>>> patch -p0 < JENA-63_Fuseki_r1136050.patch
>>>>>>>>>> mvn package
>>>>>>>>>>
>>>>>>>>>> Now you can simply use the Fuseki config.ttl file as 
>>>>>>>>>> explained here:
>>>>>>>>>> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
>>>>>>>>>> and use the ja:textIndex property on a dataset to specify a 
>>>>>>>>>> non-existent directory.
>>>>>>>>> Is it possible to have a fuseki configuration example with a 
>>>>>>>>> ja:textIndex property? I am trying to
>>>>>>>>> add it on the book service (books.ttl) with no results...
>>>>>>>>
>>>>>>>>
>>>>>>>> Use tdbloader to load some RDF data into /tmp/tdb, then change 
>>>>>>>> <#dataset>
>>>>>>>> on the example config.ttl file you have in Fuseki:
>>>>>>>> http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/config.ttl 
>>>>>>>>
>>>>>>>
>>>>>>> I've never used the TDB loader. How does it work? Is there any 
>>>>>>> online documentation?
>>>>>>
>>>>>> Fortunately, TDB is included in the Fuseki uber jar (which contains the
>>>>>> Fuseki binaries as well as all the jar dependencies, including TDB).
>>>>>> So, in this case, it is quite useful for end users.
>>>>>>
>>>>>> Here is what I do:
>>>>>>
>>>>>> cd /tmp/fuseki
>>>>>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader --loc=/tmp/tdb books.ttl
>>>>>
>>>>> Thank you! It's ok for that!
>>>>
>>>> Good.
>>>>
>>>> So, are you able to query your RDF data using the pf:textMatch property
>>>> function? For example:
>>>>
>>>> PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
>>>> PREFIX dc: <http://purl.org/dc/elements/1.1/>
>>>> DESCRIBE ?doc {
>>>>  ?title pf:textMatch 'potter' .
>>>>  ?doc dc:title ?title .
>>>> } LIMIT 10
>>>
>>> I thought so, but it doesn't work...
>>> There is no error, but my result set is empty.
>>> Plain SPARQL queries work, but LARQ ones do not.
>>>
>>> How can I be sure that my document has been indexed properly?
>>>
>>> I ran:
>>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader --loc=/tmp/tdb books.ttl
>>>
>>> My end config.ttl file contains:
>>> <#dataset> rdf:type      tdb:DatasetTDB ;
>>>     tdb:location "/tmp/tdb" ;
>>>     ja:textIndex "/tmp/lucene" ;
>>>     # Query timeout on this dataset (milliseconds)
>>>     ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000" ] ;
>>> ##      tdb:unionDefaultGraph true ;
>>>
>>>
>>
>> Can you try to delete the /tmp/lucene directory and restart Fuseki?
>>
> OK... fair play, you're right! The LARQ query works now!

This is because LARQ will create a Lucene index, indexing the content of your
(TDB) dataset, if and only if the directory you point it at does not exist.
If an empty index is already there, it will not overwrite it.
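
The rule above can be sketched in a few lines of Java. This is not LARQ's
code, only a model of the decision it makes; the class IndexDirCheck and the
method needsIndexing are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class IndexDirCheck {
    // True only when nothing exists at the configured index location.
    // An existing directory, even an empty one, is left alone.
    static boolean needsIndexing(Path indexDir) {
        return Files.notExists(indexDir);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("larq-demo");
        Path lucene = base.resolve("lucene");
        System.out.println(needsIndexing(lucene)); // prints true: directory absent
        Files.createDirectory(lucene);
        System.out.println(needsIndexing(lucene)); // prints false: directory exists
    }
}
```

So when a text query returns nothing, checking whether the ja:textIndex
directory already existed before the first start is a good first step.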

> Thank you very much.

I am glad it's working.

Paolo

> 
> Jérôme
> 
>> Let me know,
>> Paolo
>>
>>> I run Fuseki with this command line:
>>> ./fuseki-server --config=config.ttl
>>>>
>>>>> Now I would like to make modifications to the LARQ module.
>>>>
>>>> LARQ is open source and you are free and welcome to do so if you 
>>>> want/need:
>>>>
>>>> cd /tmp
>>>> svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/trunk/ larq
>>>> cd /tmp/larq
>>>> ... make your changes ...
>>>> mvn install
>>>>
>>>> mvn install will install the LARQ artifacts in your local 
>>>> Maven repository in your home directory.
>>>>
>>>> However, it would be good if you could share what your 
>>>> modifications are,
>>>> why you need them and your use case. Your changes might be useful to 
>>>> others.
>>>>
>>>> If your changes do not get contributed back, you will need to 
>>>> maintain them
>>>> and they will represent a cost for you. Every time we release a new 
>>>> version
>>>> of LARQ with features you might want, you will need to re-apply your 
>>>> changes.
>>> Yes i know.
>>>>
>>>> So, I encourage you to share them and maybe open a new JIRA issue 
>>>> (with a
>>>> patch attached to it). Not all changes are general and useful enough 
>>>> to get
>>>> committed, but let's see.
>>>
>>> Ok - it could be a good solution for everybody.
>>>>
>>>>> I've downloaded and built it. How can I recompile my Fuseki Maven 
>>>>> project using my own LARQ jar?
>>>>
>>>> Once you have published your modified version of LARQ in your local 
>>>> Maven
>>>> repository, it is available to other projects on your machine.
>>>>
>>>> This is how you recompile Fuseki using your modified LARQ jar:
>>>>
>>>> cd /tmp
>>>> svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
>>>> cd /tmp/fuseki
>>>> wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
>>>> patch -p0 < JENA-63_Fuseki_r1136050.patch
>>>> mvn package
>>>>
>>>> Make sure LARQ versions in your LARQ pom.xml and Fuseki pom.xml 
>>>> correspond.
>>>>
>>>> Once again, if you get your changes adopted and committed to trunk 
>>>> you would
>>>> not need to do all this.
>>>>
>>>> Is explain() the feature you desperately need?
>>> Not really - in the end, explain is not what I expected.
>>>>
>>>> Could you share more on why you need it and what your use case is?
>>> The aim is to query (from an HTML form) an RDF graph whose nodes are 
>>> documents (text documents).
>>> We would like to obtain matching documents (available from Lucene 
>>> Hits) + a list of matching strings + a list of offsets.
>>> Offsets/positions will be sent to another module that needs the 
>>> positions to perform its tasks.
>>>
>>> Example of query (case of a LARQ query, queries could be SPARQL only):
>>> Query: w*
>>>
>>> doc1: when I was a child I was a Jedi
>>>
>>> Expected result:
>>> Doc id: 1
>>> Matching strings: when, was, was
>>> Offsets: 0-3; 7-9;21-23
>>>
>>>
>>> Do you think it would be interesting to share it?
>>>
>>> Jérôme
>>>>
>>>> explain() can be expensive and it is Lucene-specific; it could cause
>>>> problems if in the future we want to support/move/change and use Solr
>>>> and/or ElasticSearch: https://issues.apache.org/jira/browse/JENA-17.
>>>>
>>>> Paolo
>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>>
>>>>>> This will load the data in books.ttl and build the TDB indexes in 
>>>>>> /tmp/tdb
>>>>>>
>>>>>> You can also use the -h option for help:
>>>>>>
>>>>>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader -h
>>>>>> tdbloader [--desc DATASET | -loc DIR] FILE ...
>>>>>>   Location
>>>>>>       --loc=DIR              Location (a directory)
>>>>>>       --tdb=                 Assembler description file
>>>>>>   Symbol definition
>>>>>>       --set                  Set a configuration symbol to a value
>>>>>>       --strict               Operate in strict SPARQL mode (no extensions of any kind)
>>>>>>       --graph=IRI            Act on a named graph
>>>>>>       --desc=                Assembler description file
>>>>>>   General
>>>>>>       -v   --verbose         Verbose
>>>>>>       -q   --quiet           Run with minimal output
>>>>>>       --debug                Output information for debugging
>>>>>>       --help
>>>>>>       --version              Version information
>>>>>>
>>>>>>
>>>>>> Paolo
>>>>>>
>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>> <#dataset> rdf:type      tdb:DatasetTDB ;
>>>>>>>>     tdb:location "/tmp/tdb" ;
>>>>>>>>     ja:textIndex "/tmp/lucene" ;
>>>>>>>>     .
>>>>>>>>
>>>>>>>> If the /tmp/lucene directory does not exist, LARQ will index 
>>>>>>>> what you have in
>>>>>>>> /tmp/tdb creating the appropriate Lucene indexes.
>>>>>>>>
>>>>>>>>
>>>>>>>> Paolo
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>> When you point LARQ at a non-existent directory, it will 
>>>>>>>>>> perform the indexing for you.
>>>>>>>>>> This is particularly useful when you have multiple datasets 
>>>>>>>>>> configured in Fuseki.
>>>>>>>>>> WARNING: it might take a while to index large datasets, so be 
>>>>>>>>>> patient.
>>>>>>>>>>
>>>>>>>>>> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>>>>>>>>>>
>>>>>>>>>> LARQ snapshots are here:
>>>>>>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/ 
>>>>>>>>>>
>>>>>>>>>> and I can quickly fix/improve things if you have problems or 
>>>>>>>>>> good suggestions.
>>>>>>>>>>
>>>>>>>>>> I hope this helps, let me know how it goes.
>>>>>>>>>>
>>>>>>>>>> Paolo
>>>>>>>>>>

Re: Fuseki + Larq : Lucene indexing

Posted by Jérôme <je...@unicaen.fr>.

Le 13/09/11 16:59, Paolo Castagna a écrit :
> Jérôme wrote:
>> Le 12/09/11 17:52, Paolo Castagna a écrit :
>>> Jérôme wrote:
>>>> Le 12/09/11 16:13, Paolo Castagna a écrit :
>>>>> Jérôme wrote:
>>>>>> Le 12/09/11 15:18, Paolo Castagna a écrit :
>>>>>>> Jérôme wrote:
>>>>>>>> Le 12/09/11 12:24, Paolo Castagna a écrit :
>>>>>>>>> Hi Jérôme,
>>>>>>>>> you are lucky, I have exactly the same need as you and I've done 
>>>>>>>>> something about it recently.
>>>>>>>>> Unfortunately, the new LARQ (as a separate module) has still 
>>>>>>>>> not made it into Fuseki on trunk.
>>>>>>>>>
>>>>>>>>> We have an open JIRA for it which you can 
>>>>>>>>> watch|vote|contribute to:
>>>>>>>>> https://issues.apache.org/jira/browse/JENA-63
>>>>>>>>>
>>>>>>>>> In the meantime, if you want to use LARQ with Fuseki this is 
>>>>>>>>> what you need to do:
>>>>>>>>>
>>>>>>>>> cd /tmp
>>>>>>>>> svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
>>>>>>>>> cd /tmp/fuseki
>>>>>>>>> wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
>>>>>>>>> patch -p0 < JENA-63_Fuseki_r1136050.patch
>>>>>>>>> mvn package
>>>>>>>>>
>>>>>>>>> Now, you can simply use the Fuseki config.ttl file as 
>>>>>>>>> explained here:
>>>>>>>>> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
>>>>>>>>> and use the ja:textIndex property on a dataset to specify an 
>>>>>>>>> non existing directory.
>>>>>>>> Is it possible to have a fuseki configuration example with a 
>>>>>>>> ja:textIndex property? I am trying to
>>>>>>>> add it on the book service (books.ttl) with no results...
>>>>>>>
>>>>>>>
>>>>>>> Use tdbloader to load some RDF data into /tmp/tdb, then change 
>>>>>>> <#dataset>
>>>>>>> on the example config.ttl file you have in Fuseki:
>>>>>>> http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/config.ttl 
>>>>>>>
>>>>>>
>>>>>> I've never used the TDB loader - How does it work? Is there an 
>>>>>> on-line documentation?
>>>>>
>>>>> Fortunately, TDB is included in Fuseki uber jar (since it includes 
>>>>> Fuseki
>>>>> binaries as well as all the jar dependencies, including TDB). So, 
>>>>> in this
>>>>> case, for an end-users it's quite useful.
>>>>>
>>>>> Here is what I do:
>>>>>
>>>>> cd /tmp/fuseki
>>>>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader 
>>>>> --loc=/tmp/tdb books.ttl
>>>>
>>>> Thank you! It's ok for that!
>>>
>>> Good.
>>>
>>> So, are you able to query your RDF data using the pf:textMatch property
>>> function? For example:
>>>
>>> PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
>>> PREFIX dc: <http://purl.org/dc/elements/1.1/>
>>> DESCRIBE ?doc {
>>>  ?title pf:textMatch 'potter' .
>>>  ?doc dc:title ?title .
>>> } LIMIT 10
>>
>> I thought so, but it doesn't work...
>> There is no error, but my result set is empty.
>> Plain SPARQL queries work, but LARQ ones do not.
>>
>> How can I be sure that my document has been indexed properly?
>>
>> I ran:
>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader --loc=/tmp/tdb books.ttl
>>
>> My end config.ttl file contains:
>> <#dataset> rdf:type      tdb:DatasetTDB ;
>>     tdb:location "/tmp/tdb" ;
>>     ja:textIndex "/tmp/lucene" ;
>>     # Query timeout on this dataset (milliseconds)
>>     ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000" ] ;
>> ##      tdb:unionDefaultGraph true ;
>>
>>
>
> Can you try to delete the /tmp/lucene directory and restart Fuseki?
>
OK... fair play, you're right! The LARQ query works now!
Thank you very much.

Jérôme

> Let me know,
> Paolo
>
>> I run Fuseki with this command line:
>> ./fuseki-server --config=config.ttl
>>>
>>>> Now i would like to add modifications in the larq module.
>>>
>>> LARQ is open source and you are free and welcome to do so if you 
>>> want/need:
>>>
>>> cd /tmp
>>> svn co 
>>> https://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/trunk/ larq
>>> cd /tmp/larq
>>> ... make your changes ...
>>> mvn install
>>>
>>> Using mvn install Maven will install LARQ artifacts in your local 
>>> Maven repository in your home directory.
>>>
>>> However, it would be good if you could share what are your 
>>> modifications,
>>> why you need them and your use case. Your changes might be useful to 
>>> others.
>>>
>>> If your changes do not get contributed back, you will need to 
>>> maintain them
>>> and they will represent a cost for you. Every time we release a new 
>>> version
>>> of LARQ with features you might want, you will need to re-apply your 
>>> changes.
>> Yes i know.
>>>
>>> So, I encourage you to share them and maybe open a new JIRA issue 
>>> (with a
>>> patch attached to it). Not all changes are general and useful enough 
>>> to get
>>> committed, but let's see.
>>
>> Ok - it could be a good solution for everybody.
>>>
>>>> I've downloaded and built it. How can i re compile my Fuseki maven 
>>>> project using my own larq jar?
>>>
>>> Once you have published your modified version of LARQ in your local 
>>> Maven
>>> repository, it is available to other projects on your machine.
>>>
>>> This is how you recompile Fuseki using your modified LARQ jar:
>>>
>>> cd /tmp
>>> svn co 
>>> https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ 
>>> fuseki
>>> cd /tmp/fuseki
>>> wget 
>>> https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch 
>>>
>>> patch -p0<  JENA-63_Fuseki_r1136050.patch
>>> mvn package
>>>
>>> Make sure LARQ versions in your LARQ pom.xml and Fuseki pom.xml 
>>> correspond.
>>>
>>> Once again, if you get your changes adopted and committed to trunk 
>>> you would
>>> not need to do all this.
>>>
>>> Is explain() the feature you desperately need?
>> Not really - in the end, explain is not what I expected.
>>>
>>> Could you share more on why you need it and what your use case is?
>> The aim is to query (from an HTML form) an RDF graph whose nodes are 
>> documents (text documents).
>> We would like to obtain matching documents (available from Lucene 
>> Hits) + a list of matching strings + a list of offsets.
>> Offsets/Positions will be sent to another module that needs the 
>> positions to perform its tasks.
>>
>> Example of query (case of a LARQ query, queries could be SPARQL only):
>> Query: w*
>>
>> doc1: when I was a child I was a Jedi
>>
>> Expected result:
>> Doc id: 1
>> Matching strings: when, was, was
>> Offsets: 0-3; 7-9;21-23
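
The expected result above can be reproduced on a plain string with a short
Java sketch. It does not use Lucene or LARQ: it assumes that "w*" means
"every word starting with w" and that offsets are inclusive character
positions, as in the example. PrefixOffsets and find are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixOffsets {
    // Returns "term:start-end" for every word starting with the prefix,
    // using inclusive character offsets.
    static List<String> find(String text, String prefix) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(prefix) + "\\w*",
                                    Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            // Matcher.end() is exclusive, so subtract 1 for an inclusive end.
            hits.add(m.group() + ":" + m.start() + "-" + (m.end() - 1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("when I was a child I was a Jedi", "w"));
        // prints [when:0-3, was:7-9, was:21-23]
    }
}
```

An index-backed version would have to work on analysed tokens rather than the
raw string, so the offsets it reports may follow a different convention.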
>>
>>
>> Do you think it would be interesting to share it?
>>
>> Jérôme
>>>
>>> explain() can be expensive and it is Lucene-specific; it could cause
>>> problems if in the future we want to support/move/change and use Solr
>>> and/or ElasticSearch: https://issues.apache.org/jira/browse/JENA-17.
>>>
>>> Paolo
>>>
>>>>
>>>> Thanks.
>>>>
>>>>>
>>>>> This will load the data in books.ttl and build the TDB indexes in 
>>>>> /tmp/tdb
>>>>>
>>>>> You can also use the -h option for help:
>>>>>
>>>>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader -h
>>>>> tdbloader [--desc DATASET | -loc DIR] FILE ...
>>>>>   Location
>>>>>       --loc=DIR              Location (a directory)
>>>>>       --tdb=                 Assembler description file
>>>>>   Symbol definition
>>>>>       --set                  Set a configuration symbol to a value
>>>>>       --strict               Operate in strict SPARQL mode (no 
>>>>> extensions of any kind)
>>>>>       --graph=IRI            Act on a named graph
>>>>>       --desc=                Assembler description file
>>>>>   General
>>>>>       -v   --verbose         Verbose
>>>>>       -q   --quiet           Run with minimal output
>>>>>       --debug                Output information for debugging
>>>>>       --help
>>>>>       --version              Version information
>>>>>
>>>>>
>>>>> Paolo
>>>>>
>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>> <#dataset> rdf:type      tdb:DatasetTDB ;
>>>>>>>     tdb:location "/tmp/tdb" ;
>>>>>>>     ja:textIndex "/tmp/lucene" ;
>>>>>>>     .
>>>>>>>
>>>>>>> If the /tmp/lucene directory does not exist, LARQ will index 
>>>>>>> what you have in
>>>>>>> /tmp/tdb creating the appropriate Lucene indexes.
>>>>>>>
>>>>>>>
>>>>>>> Paolo
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>> LARQ when you point it at a non existing directory will 
>>>>>>>>> perform the indexing for you.
>>>>>>>>> This is particularly useful when you have multiple datasets 
>>>>>>>>> configured in Fuseki.
>>>>>>>>> WARNING: it might take a while to index large datasets, so be 
>>>>>>>>> patient.
>>>>>>>>>
>>>>>>>>> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>>>>>>>>>
>>>>>>>>> LARQ snapshots are here:
>>>>>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/ 
>>>>>>>>>
>>>>>>>>> and I can quickly fix/improve things if you have problems or 
>>>>>>>>> good suggestions.
>>>>>>>>>
>>>>>>>>> I hope this helps, let me know how it goes.
>>>>>>>>>
>>>>>>>>> Paolo
>>>>>>>>>

Re: Fuseki + Larq : Lucene indexing

Posted by Paolo Castagna <ca...@googlemail.com>.

Jérôme wrote:
> Le 12/09/11 17:52, Paolo Castagna a écrit :
>> Jérôme wrote:
>>> Le 12/09/11 16:13, Paolo Castagna a écrit :
>>>> Jérôme wrote:
>>>>> Le 12/09/11 15:18, Paolo Castagna a écrit :
>>>>>> Jérôme wrote:
>>>>>>> Le 12/09/11 12:24, Paolo Castagna a écrit :
>>>>>>>> Hi Jérôme,
>>>>>>>> you are lucky, I have exactly the same need as you and I've done 
>>>>>>>> something about it recently.
>>>>>>>> Unfortunately, the new LARQ (as a separate module) has still not 
>>>>>>>> made it into Fuseki on trunk.
>>>>>>>>
>>>>>>>> We have an open JIRA for it which you can watch|vote|contribute to:
>>>>>>>> https://issues.apache.org/jira/browse/JENA-63
>>>>>>>>
>>>>>>>> In the meantime, if you want to use LARQ with Fuseki this is 
>>>>>>>> what you need to do:
>>>>>>>>
>>>>>>>> cd /tmp
>>>>>>>> svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
>>>>>>>> cd /tmp/fuseki
>>>>>>>> wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
>>>>>>>> patch -p0 < JENA-63_Fuseki_r1136050.patch
>>>>>>>> mvn package
>>>>>>>>
>>>>>>>> Now, you can simply use the Fuseki config.ttl file as explained 
>>>>>>>> here:
>>>>>>>> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
>>>>>>>> and use the ja:textIndex property on a dataset to specify an non 
>>>>>>>> existing directory.
>>>>>>> Is it possible to have a fuseki configuration example with a 
>>>>>>> ja:textIndex property? I am trying to
>>>>>>> add it on the book service (books.ttl) with no results...
>>>>>>
>>>>>>
>>>>>> Use tdbloader to load some RDF data into /tmp/tdb, then change 
>>>>>> <#dataset>
>>>>>> on the example config.ttl file you have in Fuseki:
>>>>>> http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/config.ttl 
>>>>>>
>>>>>
>>>>> I've never used the TDB loader - How does it work? Is there an 
>>>>> on-line documentation?
>>>>
>>>> Fortunately, TDB is included in Fuseki uber jar (since it includes 
>>>> Fuseki
>>>> binaries as well as all the jar dependencies, including TDB). So, in 
>>>> this
>>>> case, for an end-users it's quite useful.
>>>>
>>>> Here is what I do:
>>>>
>>>> cd /tmp/fuseki
>>>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader 
>>>> --loc=/tmp/tdb books.ttl
>>>
>>> Thank you! It's ok for that!
>>
>> Good.
>>
>> So, are you able to query your RDF data using the pf:textMatch property
>> function? For example:
>>
>> PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
>> PREFIX dc: <http://purl.org/dc/elements/1.1/>
>> DESCRIBE ?doc {
>>  ?title pf:textMatch 'potter' .
>>  ?doc dc:title ?title .
>> } LIMIT 10
> 
> I thought so, but it doesn't work...
> There is no error, but my result set is empty.
> Plain SPARQL queries work, but LARQ ones do not.
> 
> How can I be sure that my document has been indexed properly?
> 
> I ran:
> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader --loc=/tmp/tdb books.ttl
> 
> My end config.ttl file contains:
> <#dataset> rdf:type      tdb:DatasetTDB ;
>     tdb:location "/tmp/tdb" ;
>     ja:textIndex "/tmp/lucene" ;
>     # Query timeout on this dataset (milliseconds)
>     ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000" ] ;
> ##      tdb:unionDefaultGraph true ;
> 
> 

Can you try to delete the /tmp/lucene directory and restart Fuseki?

Let me know,
Paolo

> I run Fuseki with this command line:
> ./fuseki-server --config=config.ttl
>>
>>> Now i would like to add modifications in the larq module.
>>
>> LARQ is open source and you are free and welcome to do so if you 
>> want/need:
>>
>> cd /tmp
>> svn co 
>> https://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/trunk/ larq
>> cd /tmp/larq
>> ... make your changes ...
>> mvn install
>>
>> Using mvn install Maven will install LARQ artifacts in your local 
>> Maven repository in your home directory.
>>
>> However, it would be good if you could share what are your modifications,
>> why you need them and your use case. Your changes might be useful to 
>> others.
>>
>> If your changes do not get contributed back, you will need to maintain 
>> them
>> and they will represent a cost for you. Every time we release a new 
>> version
>> of LARQ with features you might want, you will need to re-apply your 
>> changes.
> Yes i know.
>>
>> So, I encourage you to share them and maybe open a new JIRA issue (with a
>> patch attached to it). Not all changes are general and useful enough 
>> to get
>> committed, but let's see.
> 
> Ok - it could be a good solution for everybody.
>>
>>> I've downloaded and built it. How can i re compile my Fuseki maven 
>>> project using my own larq jar?
>>
>> Once you have published your modified version of LARQ in your local Maven
>> repository, it is available to other projects on your machine.
>>
>> This is how you recompile Fuseki using your modified LARQ jar:
>>
>> cd /tmp
>> svn co 
>> https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ 
>> fuseki
>> cd /tmp/fuseki
>> wget 
>> https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch 
>>
>> patch -p0<  JENA-63_Fuseki_r1136050.patch
>> mvn package
>>
>> Make sure LARQ versions in your LARQ pom.xml and Fuseki pom.xml 
>> correspond.
>>
>> Once again, if you get your changes adopted and committed to trunk you 
>> would
>> not need to do all this.
>>
>> Is it explain() the feature you desperately need?
> Not really - finally, explain is not what i'm expecting.
>>
>> Could you share more on why you need it and what is your use case?
> The aim is to querying (from an html form) an RDF graph where not are 
> documents (text documents).
> We would like to obtain matching documents (availables from lucene Hits) 
> + a list of matching String + a list of offset.
> Offsets/Positions will be sent to another module that needs the 
> positions to perform its tasks.
> 
> Example of query (case of a LARQ query, queries could be SPARQL only):
> Query: w*
> 
> doc1: when I was a child I was a Jedi
> 
> Expected result:
> Doc id: 1
> Matching strings: when, was, was
> Offsets: 0-3; 7-9;21-23
> 
> 
> Do you think it would be interesting to share it?
> 
> Jérôme
>>
>> explain() can be expensive and it is Lucene specific, if could cause
>> problems if in the future we want to support/move/change and use Solr
>> and/or ElasticSearch: https://issues.apache.org/jira/browse/JENA-17.
>>
>> Paolo
>>
>>>
>>> Thanks.
>>>
>>>>
>>>> This will load the data in books.ttl and build the TDB indexes in 
>>>> /tmp/tdb
>>>>
>>>> You can also use the -h option for help:
>>>>
>>>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader -h
>>>> tdbloader [--desc DATASET | -loc DIR] FILE ...
>>>>   Location
>>>>       --loc=DIR              Location (a directory)
>>>>       --tdb=                 Assembler description file
>>>>   Symbol definition
>>>>       --set                  Set a configuration symbol to a value
>>>>       --strict               Operate in strict SPARQL mode (no 
>>>> extensions of any kind)
>>>>       --graph=IRI            Act on a named graph
>>>>       --desc=                Assembler description file
>>>>   General
>>>>       -v   --verbose         Verbose
>>>>       -q   --quiet           Run with minimal output
>>>>       --debug                Output information for debugging
>>>>       --help
>>>>       --version              Version information
>>>>
>>>>
>>>> Paolo
>>>>
>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> <#dataset> rdf:type      tdb:DatasetTDB ;
>>>>>>     tdb:location "/tmp/tdb" ;
>>>>>>     ja:textIndex "/tmp/lucene" ;
>>>>>>     .
>>>>>>
>>>>>> If the /tmp/lucene directory does not exist, LARQ will index what 
>>>>>> you have in
>>>>>> /tmp/tdb creating the appropriate Lucene indexes.
>>>>>>
>>>>>>
>>>>>> Paolo
>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>> LARQ when you point it at a non existing directory will perform 
>>>>>>>> the indexing for you.
>>>>>>>> This is particularly useful when you have multiple datasets 
>>>>>>>> configured in Fuseki.
>>>>>>>> WARNING: it might take a while to index large datasets, so be 
>>>>>>>> patient.
>>>>>>>>
>>>>>>>> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>>>>>>>>
>>>>>>>> LARQ snapshots are here:
>>>>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/ 
>>>>>>>>
>>>>>>>> and I can quickly fix/improve things if you have problems or 
>>>>>>>> good suggestions.
>>>>>>>>
>>>>>>>> I hope this helps, let me know how it goes.
>>>>>>>>
>>>>>>>> Paolo
>>>>>>>>
>>>>>>>> Jérôme wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> i'm trying to use LARQ with my Fuseki server.
>>>>>>>>>
>>>>>>>>> I would like to programmaticaly indexing(with lucene) documents 
>>>>>>>>> when the
>>>>>>>>> server starts.
>>>>>>>>>
>>>>>>>>> Something like that:
>>>>>>>>>
>>>>>>>>> Model model = ModelFactory.createDefaultModel();
>>>>>>>>> IndexBuilderString larqBuilder = new IndexBuilderString();
>>>>>>>>> model.register(larqBuilder);
>>>>>>>>> FileManager.get().readModel(model, "Data/books.ttl");
>>>>>>>>> larqBuilder.closeWriter();
>>>>>>>>> model.unregister(larqBuilder);
>>>>>>>>> index = larqBuilder.getIndex();
>>>>>>>>> LARQ.setDefaultIndex(index);
>>>>>>>>>
>>>>>>>>> Is it possible? In which class it would be the best?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Jerome
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Fuseki + Larq : Lucene indexing

Posted by Jérôme <je...@unicaen.fr>.

On 12/09/11 17:52, Paolo Castagna wrote:
> Jérôme wrote:
>> On 12/09/11 16:13, Paolo Castagna wrote:
>>> Jérôme wrote:
>>>> On 12/09/11 15:18, Paolo Castagna wrote:
>>>>> Jérôme wrote:
>>>>>> On 12/09/11 12:24, Paolo Castagna wrote:
>>>>>>> Hi Jérôme,
>>>>>>> you are lucky, I've just exactly the same need as you and I've 
>>>>>>> something about it recently.
>>>>>>> Unfortunately, the new LARQ (as a separate module) still did not 
>>>>>>> make it into Fuseki on trunk.
>>>>>>>
>>>>>>> We have an open JIRA for it which you can watch|vote|contribute to:
>>>>>>> https://issues.apache.org/jira/browse/JENA-63
>>>>>>>
>>>>>>> In the meantime, if you want to use LARQ with Fuseki this is 
>>>>>>> what you need to do:
>>>>>>>
>>>>>>> cd /tmp
>>>>>>> svn co 
>>>>>>> https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ 
>>>>>>> fuseki
>>>>>>> cd /tmp/fuseki
>>>>>>> wget 
>>>>>>> https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch 
>>>>>>>
>>>>>>> patch -p0<  JENA-63_Fuseki_r1136050.patch
>>>>>>> mvn package
>>>>>>>
>>>>>>> Now, you can simply use the Fuseki config.ttl file as explained 
>>>>>>> here:
>>>>>>> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
>>>>>>> and use the ja:textIndex property on a dataset to specify an non 
>>>>>>> existing directory.
>>>>>> Is it possible to have a fuseki configuration example with a 
>>>>>> ja:textIndex property? I am trying to
>>>>>> add it on the book service (books.ttl) with no results...
>>>>>
>>>>>
>>>>> Use tdbloader to load some RDF data into /tmp/tdb, then change 
>>>>> <#dataset>
>>>>> on the example config.ttl file you have in Fuseki:
>>>>> http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/config.ttl 
>>>>>
>>>>
>>>> I've never used the TDB loader - How does it work? Is there an 
>>>> on-line documentation?
>>>
>>> Fortunately, TDB is included in Fuseki uber jar (since it includes 
>>> Fuseki
>>> binaries as well as all the jar dependencies, including TDB). So, in 
>>> this
>>> case, for an end-users it's quite useful.
>>>
>>> Here is what I do:
>>>
>>> cd /tmp/fuseki
>>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader 
>>> --loc=/tmp/tdb books.ttl
>>
>> Thank you! It's ok for that!
>
> Good.
>
> So, are you able to query your RDF data using the pf:textMatch property
> function? For example:
>
> PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
> PREFIX dc: <http://purl.org/dc/elements/1.1/>
> DESCRIBE ?doc {
>  ?title pf:textMatch 'potter' .
>  ?doc dc:title ?title .
> } LIMIT 10

I thought so, but it doesn't work...
There is no error, but my result set is empty.
Plain SPARQL queries work, but LARQ ones do not.

How can I be sure that my document is properly indexed?

I ran:
java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader 
--loc=/tmp/tdb books.ttl

My end config.ttl file contains:
<#dataset> rdf:type      tdb:DatasetTDB ;
     tdb:location "/tmp/tdb" ;
     ja:textIndex "/tmp/lucene" ;
     # Query timeout on this dataset (milliseconds)
     ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000" ] ;
##      tdb:unionDefaultGraph true ;


I run Fuseki with this command line:
./fuseki-server --config=config.ttl
>
>> Now i would like to add modifications in the larq module.
>
> LARQ is open source and you are free and welcome to do so if you 
> want/need:
>
> cd /tmp
> svn co 
> https://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/trunk/ larq
> cd /tmp/larq
> ... make your changes ...
> mvn install
>
> Using mvn install Maven will install LARQ artifacts in your local 
> Maven repository in your home directory.
>
> However, it would be good if you could share what are your modifications,
> why you need them and your use case. Your changes might be useful to 
> others.
>
> If your changes do not get contributed back, you will need to maintain 
> them
> and they will represent a cost for you. Every time we release a new 
> version
> of LARQ with features you might want, you will need to re-apply your 
> changes.
Yes, I know.
>
> So, I encourage you to share them and maybe open a new JIRA issue (with a
> patch attached to it). Not all changes are general and useful enough 
> to get
> committed, but let's see.

Ok - it could be a good solution for everybody.
>
>> I've downloaded and built it. How can i re compile my Fuseki maven 
>> project using my own larq jar?
>
> Once you have published your modified version of LARQ in your local Maven
> repository, it is available to other projects on your machine.
>
> This is how you recompile Fuseki using your modified LARQ jar:
>
> cd /tmp
> svn co 
> https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ 
> fuseki
> cd /tmp/fuseki
> wget 
> https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch 
>
> patch -p0<  JENA-63_Fuseki_r1136050.patch
> mvn package
>
> Make sure LARQ versions in your LARQ pom.xml and Fuseki pom.xml 
> correspond.
>
> Once again, if you get your changes adopted and committed to trunk you 
> would
> not need to do all this.
>
> Is it explain() the feature you desperately need?
Not really. In the end, explain() is not what I was looking for.
>
> Could you share more on why you need it and what is your use case?
The aim is to query (from an HTML form) an RDF graph whose nodes are 
documents (text documents).
We would like to obtain the matching documents (available from Lucene Hits), 
plus a list of matching strings and a list of offsets.
The offsets/positions will be sent to another module that needs the 
positions to perform its tasks.

Example query (a LARQ query here; queries could also be SPARQL only):
Query: w*

doc1: when I was a child I was a Jedi

Expected result:
Doc id: 1
Matching strings: when, was, was
Offsets: 0-3; 7-9; 21-23
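
To make the expected output concrete, here is a minimal sketch in plain Java. It uses java.util.regex rather than Lucene, so it only illustrates the result we are after (matching strings plus inclusive character offsets), not how LARQ or Lucene would compute it. The names PrefixOffsets, Hit and findPrefixMatches are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: plain regex matching, not Lucene or LARQ.
public class PrefixOffsets {

    /** One match: the matched word and its inclusive start/end character offsets. */
    public static final class Hit {
        public final String text;
        public final int start;
        public final int end; // inclusive, as in the example above

        Hit(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + " [" + start + "-" + end + "]";
        }
    }

    /** Finds every word in doc that starts with prefix (the "w*" style of query). */
    public static List<Hit> findPrefixMatches(String doc, String prefix) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(prefix) + "\\w*",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(doc);
        List<Hit> hits = new ArrayList<>();
        while (m.find()) {
            // Matcher.end() is exclusive, so subtract 1 for an inclusive offset.
            hits.add(new Hit(m.group(), m.start(), m.end() - 1));
        }
        return hits;
    }

    public static void main(String[] args) {
        for (Hit h : findPrefixMatches("when I was a child I was a Jedi", "w")) {
            System.out.println(h);
        }
    }
}
```

For doc1 this yields the three hits listed above: when (0-3), was (7-9) and was (21-23).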


Do you think it would be interesting to share it?

Jérôme
>
> explain() can be expensive and it is Lucene specific, if could cause
> problems if in the future we want to support/move/change and use Solr
> and/or ElasticSearch: https://issues.apache.org/jira/browse/JENA-17.
>
> Paolo
>
>>
>> Thanks.
>>
>>>
>>> This will load the data in books.ttl and build the TDB indexes in 
>>> /tmp/tdb
>>>
>>> You can also use the -h option for help:
>>>
>>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader -h
>>> tdbloader [--desc DATASET | -loc DIR] FILE ...
>>>   Location
>>>       --loc=DIR              Location (a directory)
>>>       --tdb=                 Assembler description file
>>>   Symbol definition
>>>       --set                  Set a configuration symbol to a value
>>>       --strict               Operate in strict SPARQL mode (no 
>>> extensions of any kind)
>>>       --graph=IRI            Act on a named graph
>>>       --desc=                Assembler description file
>>>   General
>>>       -v   --verbose         Verbose
>>>       -q   --quiet           Run with minimal output
>>>       --debug                Output information for debugging
>>>       --help
>>>       --version              Version information
>>>
>>>
>>> Paolo
>>>
>>>> Thanks
>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>> <#dataset> rdf:type      tdb:DatasetTDB ;
>>>>>     tdb:location "/tmp/tdb" ;
>>>>>     ja:textIndex "/tmp/lucene" ;
>>>>>     .
>>>>>
>>>>> If the /tmp/lucene directory does not exist, LARQ will index what 
>>>>> you have in
>>>>> /tmp/tdb creating the appropriate Lucene indexes.
>>>>>
>>>>>
>>>>> Paolo
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>> LARQ when you point it at a non existing directory will perform 
>>>>>>> the indexing for you.
>>>>>>> This is particularly useful when you have multiple datasets 
>>>>>>> configured in Fuseki.
>>>>>>> WARNING: it might take a while to index large datasets, so be 
>>>>>>> patient.
>>>>>>>
>>>>>>> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>>>>>>>
>>>>>>> LARQ snapshots are here:
>>>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/ 
>>>>>>>
>>>>>>> and I can quickly fix/improve things if you have problems or 
>>>>>>> good suggestions.
>>>>>>>
>>>>>>> I hope this helps, let me know how it goes.
>>>>>>>
>>>>>>> Paolo
>>>>>>>
>>>>>>> Jérôme wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> i'm trying to use LARQ with my Fuseki server.
>>>>>>>>
>>>>>>>> I would like to programmaticaly indexing(with lucene) documents 
>>>>>>>> when the
>>>>>>>> server starts.
>>>>>>>>
>>>>>>>> Something like that:
>>>>>>>>
>>>>>>>> Model model = ModelFactory.createDefaultModel();
>>>>>>>> IndexBuilderString larqBuilder = new IndexBuilderString();
>>>>>>>> model.register(larqBuilder);
>>>>>>>> FileManager.get().readModel(model, "Data/books.ttl");
>>>>>>>> larqBuilder.closeWriter();
>>>>>>>> model.unregister(larqBuilder);
>>>>>>>> index = larqBuilder.getIndex();
>>>>>>>> LARQ.setDefaultIndex(index);
>>>>>>>>
>>>>>>>> Is it possible? In which class it would be the best?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Jerome
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Fuseki + Larq : Lucene indexing

Posted by Paolo Castagna <ca...@googlemail.com>.

Jérôme wrote:
> On 12/09/11 16:13, Paolo Castagna wrote:
>> Jérôme wrote:
>>> On 12/09/11 15:18, Paolo Castagna wrote:
>>>> Jérôme wrote:
>>>>> On 12/09/11 12:24, Paolo Castagna wrote:
>>>>>> Hi Jérôme,
>>>>>> you are lucky, I've just exactly the same need as you and I've 
>>>>>> something about it recently.
>>>>>> Unfortunately, the new LARQ (as a separate module) still did not 
>>>>>> make it into Fuseki on trunk.
>>>>>>
>>>>>> We have an open JIRA for it which you can watch|vote|contribute to:
>>>>>> https://issues.apache.org/jira/browse/JENA-63
>>>>>>
>>>>>> In the meantime, if you want to use LARQ with Fuseki this is what 
>>>>>> you need to do:
>>>>>>
>>>>>> cd /tmp
>>>>>> svn co 
>>>>>> https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ 
>>>>>> fuseki
>>>>>> cd /tmp/fuseki
>>>>>> wget 
>>>>>> https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch 
>>>>>>
>>>>>> patch -p0<  JENA-63_Fuseki_r1136050.patch
>>>>>> mvn package
>>>>>>
>>>>>> Now, you can simply use the Fuseki config.ttl file as explained here:
>>>>>> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
>>>>>> and use the ja:textIndex property on a dataset to specify an non 
>>>>>> existing directory.
>>>>> Is it possible to have a fuseki configuration example with a 
>>>>> ja:textIndex property? I am trying to
>>>>> add it on the book service (books.ttl) with no results...
>>>>
>>>>
>>>> Use tdbloader to load some RDF data into /tmp/tdb, then change 
>>>> <#dataset>
>>>> on the example config.ttl file you have in Fuseki:
>>>> http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/config.ttl 
>>>>
>>>
>>> I've never used the TDB loader - How does it work? Is there an 
>>> on-line documentation?
>>
>> Fortunately, TDB is included in Fuseki uber jar (since it includes Fuseki
>> binaries as well as all the jar dependencies, including TDB). So, in this
>> case, for an end-users it's quite useful.
>>
>> Here is what I do:
>>
>> cd /tmp/fuseki
>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader 
>> --loc=/tmp/tdb books.ttl
> 
> Thank you! It's ok for that!

Good.

So, are you able to query your RDF data using the pf:textMatch property
function? For example:

PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
DESCRIBE ?doc {
  ?title pf:textMatch 'potter' .
  ?doc dc:title ?title .
} LIMIT 10
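
If you want to send this query from a program instead of the Fuseki web form, the SPARQL protocol lets you pass it as a URL-encoded "query" parameter on a GET request. Here is a minimal sketch that only builds the URL, using the Java standard library (Java 10 or later for this URLEncoder overload). The endpoint http://localhost:3030/books/query is an assumption: adjust the host, port and service name to whatever your config.ttl declares. The class name TextMatchQueryUrl is made up for this example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustration only: builds a SPARQL protocol GET URL, does not send it.
public class TextMatchQueryUrl {

    // The pf:textMatch query from this thread.
    static final String QUERY = String.join("\n",
            "PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>",
            "PREFIX dc: <http://purl.org/dc/elements/1.1/>",
            "DESCRIBE ?doc {",
            "  ?title pf:textMatch 'potter' .",
            "  ?doc dc:title ?title .",
            "} LIMIT 10");

    /** Appends the SPARQL text as the URL-encoded "query" parameter. */
    public static String buildQueryUrl(String endpoint, String sparql) {
        return endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; replace it with your own service URL.
        System.out.println(buildQueryUrl("http://localhost:3030/books/query", QUERY));
    }
}
```

The printed URL can then be fetched with any HTTP client, which makes it easy to compare a plain SPARQL query with a LARQ one against the same endpoint.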

> Now i would like to add modifications in the larq module.

LARQ is open source and you are free and welcome to do so if you want/need:

cd /tmp
svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/trunk/ larq
cd /tmp/larq
... make your changes ...
mvn install

Running mvn install will install the LARQ artifacts in your local Maven 
repository, in your home directory.

However, it would be good if you could share what your modifications are,
why you need them, and your use case. Your changes might be useful to others.

If your changes do not get contributed back, you will need to maintain them
and they will represent a cost for you. Every time we release a new version
of LARQ with features you might want, you will need to re-apply your changes.

So, I encourage you to share them and maybe open a new JIRA issue (with a
patch attached to it). Not all changes are general and useful enough to get
committed, but let's see.

> I've downloaded and built it. How can i re compile my Fuseki maven 
> project using my own larq jar?

Once you have published your modified version of LARQ in your local Maven
repository, it is available to other projects on your machine.

This is how you recompile Fuseki using your modified LARQ jar:

cd /tmp
svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
cd /tmp/fuseki
wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
patch -p0 < JENA-63_Fuseki_r1136050.patch
mvn package

Make sure LARQ versions in your LARQ pom.xml and Fuseki pom.xml correspond.

Once again, if you get your changes adopted and committed to trunk you would
not need to do all this.

Is explain() the feature you desperately need?

Could you share more on why you need it and what your use case is?

explain() can be expensive and it is Lucene-specific; it could cause
problems if, in the future, we want to support or move to Solr
and/or ElasticSearch: https://issues.apache.org/jira/browse/JENA-17.

Paolo

> 
> Thanks.
> 
>>
>> This will load the data in books.ttl and build the TDB indexes in 
>> /tmp/tdb
>>
>> You can also use the -h option for help:
>>
>> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader -h
>> tdbloader [--desc DATASET | -loc DIR] FILE ...
>>   Location
>>       --loc=DIR              Location (a directory)
>>       --tdb=                 Assembler description file
>>   Symbol definition
>>       --set                  Set a configuration symbol to a value
>>       --strict               Operate in strict SPARQL mode (no 
>> extensions of any kind)
>>       --graph=IRI            Act on a named graph
>>       --desc=                Assembler description file
>>   General
>>       -v   --verbose         Verbose
>>       -q   --quiet           Run with minimal output
>>       --debug                Output information for debugging
>>       --help
>>       --version              Version information
>>
>>
>> Paolo
>>
>>> Thanks
>>>>
>>>>
>>>> [...]
>>>>
>>>> <#dataset> rdf:type      tdb:DatasetTDB ;
>>>>     tdb:location "/tmp/tdb" ;
>>>>     ja:textIndex "/tmp/lucene" ;
>>>>     .
>>>>
>>>> If the /tmp/lucene directory does not exist, LARQ will index what 
>>>> you have in
>>>> /tmp/tdb creating the appropriate Lucene indexes.
>>>>
>>>>
>>>> Paolo
>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>>> LARQ when you point it at a non existing directory will perform 
>>>>>> the indexing for you.
>>>>>> This is particularly useful when you have multiple datasets 
>>>>>> configured in Fuseki.
>>>>>> WARNING: it might take a while to index large datasets, so be 
>>>>>> patient.
>>>>>>
>>>>>> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>>>>>>
>>>>>> LARQ snapshots are here:
>>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/ 
>>>>>>
>>>>>> and I can quickly fix/improve things if you have problems or good 
>>>>>> suggestions.
>>>>>>
>>>>>> I hope this helps, let me know how it goes.
>>>>>>
>>>>>> Paolo
>>>>>>
>>>>>> Jérôme wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> i'm trying to use LARQ with my Fuseki server.
>>>>>>>
>>>>>>> I would like to programmaticaly indexing(with lucene) documents 
>>>>>>> when the
>>>>>>> server starts.
>>>>>>>
>>>>>>> Something like that:
>>>>>>>
>>>>>>> Model model = ModelFactory.createDefaultModel();
>>>>>>> IndexBuilderString larqBuilder = new IndexBuilderString();
>>>>>>> model.register(larqBuilder);
>>>>>>> FileManager.get().readModel(model, "Data/books.ttl");
>>>>>>> larqBuilder.closeWriter();
>>>>>>> model.unregister(larqBuilder);
>>>>>>> index = larqBuilder.getIndex();
>>>>>>> LARQ.setDefaultIndex(index);
>>>>>>>
>>>>>>> Is it possible? In which class it would be the best?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Jerome
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Fuseki + Larq : Lucene indexing

Posted by Jérôme <je...@unicaen.fr>.

On 12/09/11 16:13, Paolo Castagna wrote:
> Jérôme wrote:
>> On 12/09/11 15:18, Paolo Castagna wrote:
>>> Jérôme wrote:
>>>> On 12/09/11 12:24, Paolo Castagna wrote:
>>>>> Hi Jérôme,
>>>>> you are lucky, I've just exactly the same need as you and I've 
>>>>> something about it recently.
>>>>> Unfortunately, the new LARQ (as a separate module) still did not 
>>>>> make it into Fuseki on trunk.
>>>>>
>>>>> We have an open JIRA for it which you can watch|vote|contribute to:
>>>>> https://issues.apache.org/jira/browse/JENA-63
>>>>>
>>>>> In the meantime, if you want to use LARQ with Fuseki this is what 
>>>>> you need to do:
>>>>>
>>>>> cd /tmp
>>>>> svn co 
>>>>> https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki 
>>>>>
>>>>> cd /tmp/fuseki
>>>>> wget 
>>>>> https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch 
>>>>>
>>>>> patch -p0<  JENA-63_Fuseki_r1136050.patch
>>>>> mvn package
>>>>>
>>>>> Now, you can simply use the Fuseki config.ttl file as explained here:
>>>>> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
>>>>> and use the ja:textIndex property on a dataset to specify an non 
>>>>> existing directory.
>>>> Is it possible to have a fuseki configuration example with a 
>>>> ja:textIndex property? I am trying to
>>>> add it on the book service (books.ttl) with no results...
>>>
>>>
>>> Use tdbloader to load some RDF data into /tmp/tdb, then change 
>>> <#dataset>
>>> on the example config.ttl file you have in Fuseki:
>>> http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/config.ttl 
>>>
>>
>> I've never used the TDB loader - How does it work? Is there an 
>> on-line documentation?
>
> Fortunately, TDB is included in Fuseki uber jar (since it includes Fuseki
> binaries as well as all the jar dependencies, including TDB). So, in this
> case, for an end-users it's quite useful.
>
> Here is what I do:
>
> cd /tmp/fuseki
> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader 
> --loc=/tmp/tdb books.ttl

Thank you! That works.
Now I would like to make some modifications to the LARQ module.

I've downloaded and built it. How can I recompile my Fuseki Maven 
project using my own LARQ jar?

Thanks.

>
> This will load the data in books.ttl and build the TDB indexes in 
> /tmp/tdb
>
> You can also use the -h option for help:
>
> java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader -h
> tdbloader [--desc DATASET | -loc DIR] FILE ...
>   Location
>       --loc=DIR              Location (a directory)
>       --tdb=                 Assembler description file
>   Symbol definition
>       --set                  Set a configuration symbol to a value
>       --strict               Operate in strict SPARQL mode (no 
> extensions of any kind)
>       --graph=IRI            Act on a named graph
>       --desc=                Assembler description file
>   General
>       -v   --verbose         Verbose
>       -q   --quiet           Run with minimal output
>       --debug                Output information for debugging
>       --help
>       --version              Version information
>
>
> Paolo
>
>> Thanks
>>>
>>>
>>> [...]
>>>
>>> <#dataset> rdf:type      tdb:DatasetTDB ;
>>>     tdb:location "/tmp/tdb" ;
>>>     ja:textIndex "/tmp/lucene" ;
>>>     .
>>>
>>> If the /tmp/lucene directory does not exist, LARQ will index what 
>>> you have in
>>> /tmp/tdb creating the appropriate Lucene indexes.
>>>
>>>
>>> Paolo
>>>
>>>>
>>>> Thanks
>>>>
>>>>> LARQ when you point it at a non existing directory will perform 
>>>>> the indexing for you.
>>>>> This is particularly useful when you have multiple datasets 
>>>>> configured in Fuseki.
>>>>> WARNING: it might take a while to index large datasets, so be 
>>>>> patient.
>>>>>
>>>>> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>>>>>
>>>>> LARQ snapshots are here:
>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/ 
>>>>>
>>>>> and I can quickly fix/improve things if you have problems or good 
>>>>> suggestions.
>>>>>
>>>>> I hope this helps, let me know how it goes.
>>>>>
>>>>> Paolo
>>>>>
>>>>> Jérôme wrote:
>>>>>> Hi,
>>>>>>
>>>>>> i'm trying to use LARQ with my Fuseki server.
>>>>>>
>>>>>> I would like to programmaticaly indexing(with lucene) documents 
>>>>>> when the
>>>>>> server starts.
>>>>>>
>>>>>> Something like that:
>>>>>>
>>>>>> Model model = ModelFactory.createDefaultModel();
>>>>>> IndexBuilderString larqBuilder = new IndexBuilderString();
>>>>>> model.register(larqBuilder);
>>>>>> FileManager.get().readModel(model, "Data/books.ttl");
>>>>>> larqBuilder.closeWriter();
>>>>>> model.unregister(larqBuilder);
>>>>>> index = larqBuilder.getIndex();
>>>>>> LARQ.setDefaultIndex(index);
>>>>>>
>>>>>> Is it possible? In which class it would be the best?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Jerome
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>

Re: Fuseki + Larq : Lucene indexing

Posted by Paolo Castagna <ca...@googlemail.com>.

Jérôme wrote:
> On 12/09/11 15:18, Paolo Castagna wrote:
>> Jérôme wrote:
>>> On 12/09/11 12:24, Paolo Castagna wrote:
>>>> Hi Jérôme,
>>>> you are lucky, I've just exactly the same need as you and I've 
>>>> something about it recently.
>>>> Unfortunately, the new LARQ (as a separate module) still did not 
>>>> make it into Fuseki on trunk.
>>>>
>>>> We have an open JIRA for it which you can watch|vote|contribute to:
>>>> https://issues.apache.org/jira/browse/JENA-63
>>>>
>>>> In the meantime, if you want to use LARQ with Fuseki this is what 
>>>> you need to do:
>>>>
>>>> cd /tmp
>>>> svn co 
>>>> https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ 
>>>> fuseki
>>>> cd /tmp/fuseki
>>>> wget 
>>>> https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch 
>>>>
>>>> patch -p0<  JENA-63_Fuseki_r1136050.patch
>>>> mvn package
>>>>
>>>> Now, you can simply use the Fuseki config.ttl file as explained here:
>>>> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
>>>> and use the ja:textIndex property on a dataset to specify an non 
>>>> existing directory.
>>> Is it possible to have a fuseki configuration example with a 
>>> ja:textIndex property? I am trying to
>>> add it on the book service (books.ttl) with no results...
>>
>>
>> Use tdbloader to load some RDF data into /tmp/tdb, then change <#dataset>
>> on the example config.ttl file you have in Fuseki:
>> http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/config.ttl 
>>
> 
> I've never used the TDB loader - How does it work? Is there an on-line 
> documentation?

Fortunately, TDB is included in the Fuseki uber jar (which bundles the Fuseki
binaries as well as all the jar dependencies, including TDB). So, in this
case, it's quite useful for end users.

Here is what I do:

cd /tmp/fuseki
java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader --loc=/tmp/tdb books.ttl

This will load the data in books.ttl and build the TDB indexes in /tmp/tdb.

You can also use the -h option for help:

java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader -h
tdbloader [--desc DATASET | -loc DIR] FILE ...
   Location
       --loc=DIR              Location (a directory)
       --tdb=                 Assembler description file
   Symbol definition
       --set                  Set a configuration symbol to a value
       --strict               Operate in strict SPARQL mode (no extensions of any kind)
       --graph=IRI            Act on a named graph
       --desc=                Assembler description file
   General
       -v   --verbose         Verbose
       -q   --quiet           Run with minimal output
       --debug                Output information for debugging
       --help
       --version              Version information


Paolo

> Thanks
>>
>>
>> [...]
>>
>> <#dataset> rdf:type      tdb:DatasetTDB ;
>>     tdb:location "/tmp/tdb" ;
>>     ja:textIndex "/tmp/lucene" ;
>>     .
>>
>> If the /tmp/lucene directory does not exist, LARQ will index what you 
>> have in
>> /tmp/tdb creating the appropriate Lucene indexes.
>>
>>
>> Paolo
>>
>>>
>>> Thanks
>>>
>>>> LARQ when you point it at a non existing directory will perform the 
>>>> indexing for you.
>>>> This is particularly useful when you have multiple datasets 
>>>> configured in Fuseki.
>>>> WARNING: it might take a while to index large datasets, so be patient.
>>>>
>>>> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>>>>
>>>> LARQ snapshots are here:
>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/ 
>>>>
>>>> and I can quickly fix/improve things if you have problems or good 
>>>> suggestions.
>>>>
>>>> I hope this helps, let me know how it goes.
>>>>
>>>> Paolo
>>>>
>>>> Jérôme wrote:
>>>>> Hi,
>>>>>
>>>>> i'm trying to use LARQ with my Fuseki server.
>>>>>
>>>>> I would like to programmaticaly indexing(with lucene) documents 
>>>>> when the
>>>>> server starts.
>>>>>
>>>>> Something like that:
>>>>>
>>>>> Model model = ModelFactory.createDefaultModel();
>>>>> IndexBuilderString larqBuilder = new IndexBuilderString();
>>>>> model.register(larqBuilder);
>>>>> FileManager.get().readModel(model, "Data/books.ttl");
>>>>> larqBuilder.closeWriter();
>>>>> model.unregister(larqBuilder);
>>>>> index = larqBuilder.getIndex();
>>>>> LARQ.setDefaultIndex(index);
>>>>>
>>>>> Is it possible? In which class it would be the best?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Jerome
>>>>>
>>>>>
>>>>>
>>>
>>
>

Re: Fuseki + Larq : Lucene indexing

Posted by Jérôme <je...@unicaen.fr>.

On 12/09/11 15:18, Paolo Castagna wrote:
> Jérôme wrote:
>> On 12/09/11 12:24, Paolo Castagna wrote:
>>> Hi Jérôme,
>>> you are lucky, I've just exactly the same need as you and I've 
>>> something about it recently.
>>> Unfortunately, the new LARQ (as a separate module) still did not 
>>> make it into Fuseki on trunk.
>>>
>>> We have an open JIRA for it which you can watch|vote|contribute to:
>>> https://issues.apache.org/jira/browse/JENA-63
>>>
>>> In the meantime, if you want to use LARQ with Fuseki this is what 
>>> you need to do:
>>>
>>> cd /tmp
>>> svn co 
>>> https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ 
>>> fuseki
>>> cd /tmp/fuseki
>>> wget 
>>> https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch 
>>>
>>> patch -p0<  JENA-63_Fuseki_r1136050.patch
>>> mvn package
>>>
>>> Now, you can simply use the Fuseki config.ttl file as explained here:
>>> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
>>> and use the ja:textIndex property on a dataset to specify an non 
>>> existing directory.
>> Is it possible to have a fuseki configuration example with a 
>> ja:textIndex property? I am trying to
>> add it on the book service (books.ttl) with no results...
>
>
> Use tdbloader to load some RDF data into /tmp/tdb, then change <#dataset>
> on the example config.ttl file you have in Fuseki:
> http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/config.ttl 
>

I've never used the TDB loader. How does it work? Is there any online
documentation?
Thanks
>
>
> [...]
>
> <#dataset> rdf:type      tdb:DatasetTDB ;
>     tdb:location "/tmp/tdb" ;
>     ja:textIndex "/tmp/lucene" ;
>     .
>
> If the /tmp/lucene directory does not exist, LARQ will index what you 
> have in
> /tmp/tdb creating the appropriate Lucene indexes.
>
>
> Paolo
>
>>
>> Thanks
>>
>>> LARQ when you point it at a non existing directory will perform the 
>>> indexing for you.
>>> This is particularly useful when you have multiple datasets 
>>> configured in Fuseki.
>>> WARNING: it might take a while to index large datasets, so be patient.
>>>
>>> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>>>
>>> LARQ snapshots are here:
>>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/ 
>>>
>>> and I can quickly fix/improve things if you have problems or good 
>>> suggestions.
>>>
>>> I hope this helps, let me know how it goes.
>>>
>>> Paolo
>>>
>>> Jérôme wrote:
>>>> Hi,
>>>>
>>>> i'm trying to use LARQ with my Fuseki server.
>>>>
>>>> I would like to programmaticaly indexing(with lucene) documents 
>>>> when the
>>>> server starts.
>>>>
>>>> Something like that:
>>>>
>>>> Model model = ModelFactory.createDefaultModel();
>>>> IndexBuilderString larqBuilder = new IndexBuilderString();
>>>> model.register(larqBuilder);
>>>> FileManager.get().readModel(model, "Data/books.ttl");
>>>> larqBuilder.closeWriter();
>>>> model.unregister(larqBuilder);
>>>> index = larqBuilder.getIndex();
>>>> LARQ.setDefaultIndex(index);
>>>>
>>>> Is it possible? In which class it would be the best?
>>>>
>>>> Thanks
>>>>
>>>> Jerome
>>>>
>>>>
>>>>
>>
>

Re: Fuseki + Larq : Lucene indexing

Posted by Paolo Castagna <ca...@googlemail.com>.

Jérôme wrote:
> On 12/09/11 12:24, Paolo Castagna wrote:
>> Hi Jérôme,
>> you are lucky, I've just exactly the same need as you and I've 
>> something about it recently.
>> Unfortunately, the new LARQ (as a separate module) still did not make 
>> it into Fuseki on trunk.
>>
>> We have an open JIRA for it which you can watch|vote|contribute to:
>> https://issues.apache.org/jira/browse/JENA-63
>>
>> In the meantime, if you want to use LARQ with Fuseki this is what you 
>> need to do:
>>
>> cd /tmp
>> svn co 
>> https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ 
>> fuseki
>> cd /tmp/fuseki
>> wget 
>> https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch 
>>
>> patch -p0<  JENA-63_Fuseki_r1136050.patch
>> mvn package
>>
>> Now, you can simply use the Fuseki config.ttl file as explained here:
>> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
>> and use the ja:textIndex property on a dataset to specify an non 
>> existing directory.
> Is it possible to have a fuseki configuration example with a 
> ja:textIndex property? I am trying to
> add it on the book service (books.ttl) with no results...


Use tdbloader to load some RDF data into /tmp/tdb, then change <#dataset>
in the example config.ttl file that ships with Fuseki:
http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/config.ttl


[...]

<#dataset> rdf:type      tdb:DatasetTDB ;
     tdb:location "/tmp/tdb" ;
     ja:textIndex "/tmp/lucene" ;
     .

If the /tmp/lucene directory does not exist, LARQ will index the data in
/tmp/tdb, creating the appropriate Lucene indexes in /tmp/lucene.
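
For the tdbloader step, a minimal sketch follows. It assumes the TDB
command-line scripts are on your PATH and uses Data/books.ttl as a stand-in
for your own data; the variable names are only illustrative.

```shell
# Hypothetical locations: adjust both to your own setup.
TDB_DIR=/tmp/tdb            # must match tdb:location in config.ttl
DATA_FILE=Data/books.ttl    # assumed input file; any RDF file works

if command -v tdbloader >/dev/null 2>&1; then
    # Bulk-load the RDF file into the TDB database directory.
    tdbloader --loc="$TDB_DIR" "$DATA_FILE"
else
    # Assumption: the TDB scripts are not installed or not on PATH.
    echo "tdbloader not found on PATH; check your TDB installation" >&2
fi
```

After loading, start Fuseki with the modified config.ttl; on first start LARQ
should then build the index in /tmp/lucene, as described above.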


Paolo

> 
> Thanks
> 
>> LARQ when you point it at a non existing directory will perform the 
>> indexing for you.
>> This is particularly useful when you have multiple datasets configured 
>> in Fuseki.
>> WARNING: it might take a while to index large datasets, so be patient.
>>
>> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>>
>> LARQ snapshots are here:
>> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/ 
>>
>> and I can quickly fix/improve things if you have problems or good 
>> suggestions.
>>
>> I hope this helps, let me know how it goes.
>>
>> Paolo
>>
>> Jérôme wrote:
>>> Hi,
>>>
>>> i'm trying to use LARQ with my Fuseki server.
>>>
>>> I would like to programmaticaly indexing(with lucene) documents when the
>>> server starts.
>>>
>>> Something like that:
>>>
>>> Model model = ModelFactory.createDefaultModel();
>>> IndexBuilderString larqBuilder = new IndexBuilderString();
>>> model.register(larqBuilder);
>>> FileManager.get().readModel(model, "Data/books.ttl");
>>> larqBuilder.closeWriter();
>>> model.unregister(larqBuilder);
>>> index = larqBuilder.getIndex();
>>> LARQ.setDefaultIndex(index);
>>>
>>> Is it possible? In which class it would be the best?
>>>
>>> Thanks
>>>
>>> Jerome
>>>
>>>
>>>
>

Re: Fuseki + Larq : Lucene indexing

Posted by Jérôme <je...@unicaen.fr>.

On 12/09/11 12:24, Paolo Castagna wrote:
> Hi Jérôme,
> you are lucky, I've just exactly the same need as you and I've something about it recently.
> Unfortunately, the new LARQ (as a separate module) still did not make it into Fuseki on trunk.
>
> We have an open JIRA for it which you can watch|vote|contribute to:
> https://issues.apache.org/jira/browse/JENA-63
>
> In the meantime, if you want to use LARQ with Fuseki this is what you need to do:
>
> cd /tmp
> svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
> cd /tmp/fuseki
> wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
> patch -p0<  JENA-63_Fuseki_r1136050.patch
> mvn package
>
> Now, you can simply use the Fuseki config.ttl file as explained here:
> http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
> and use the ja:textIndex property on a dataset to specify an non existing directory.
Could you share an example Fuseki configuration that uses the
ja:textIndex property? I am trying to add it to the books service
(books.ttl), without success.

Thanks

> LARQ when you point it at a non existing directory will perform the indexing for you.
> This is particularly useful when you have multiple datasets configured in Fuseki.
> WARNING: it might take a while to index large datasets, so be patient.
>
> See also: http://markmail.org/thread/tmptip55ru5wxrrj
>
> LARQ snapshots are here:
> https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/
> and I can quickly fix/improve things if you have problems or good suggestions.
>
> I hope this helps, let me know how it goes.
>
> Paolo
>
> Jérôme wrote:
>> Hi,
>>
>> i'm trying to use LARQ with my Fuseki server.
>>
>> I would like to programmaticaly indexing(with lucene) documents when the
>> server starts.
>>
>> Something like that:
>>
>> Model model = ModelFactory.createDefaultModel();
>> IndexBuilderString larqBuilder = new IndexBuilderString();
>> model.register(larqBuilder);
>> FileManager.get().readModel(model, "Data/books.ttl");
>> larqBuilder.closeWriter();
>> model.unregister(larqBuilder);
>> index = larqBuilder.getIndex();
>> LARQ.setDefaultIndex(index);
>>
>> Is it possible? In which class it would be the best?
>>
>> Thanks
>>
>> Jerome
>>
>>
>>

Re: Fuseki + Larq : Lucene indexing

Posted by Paolo Castagna <ca...@googlemail.com>.

Hi Jérôme,
you are lucky: I have exactly the same need as you, and I've done some work on it recently.
Unfortunately, the new LARQ (as a separate module) has still not made it into Fuseki on trunk.

We have an open JIRA for it which you can watch|vote|contribute to:
https://issues.apache.org/jira/browse/JENA-63

In the meantime, if you want to use LARQ with Fuseki this is what you need to do:

cd /tmp
svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
cd /tmp/fuseki
wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
patch -p0 < JENA-63_Fuseki_r1136050.patch
mvn package

Now you can simply use the Fuseki config.ttl file as explained here:
http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
and use the ja:textIndex property on a dataset to specify a non-existing directory.

When you point LARQ at a non-existing directory, it will perform the indexing for you.
This is particularly useful when you have multiple datasets configured in Fuseki.
WARNING: indexing large datasets might take a while, so be patient.

See also: http://markmail.org/thread/tmptip55ru5wxrrj

LARQ snapshots are here:
https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/
I can quickly fix or improve things if you run into problems or have good suggestions.

I hope this helps. Let me know how it goes.

Paolo

Jérôme wrote:
> Hi,
> 
> i'm trying to use LARQ with my Fuseki server.
> 
> I would like to programmaticaly indexing(with lucene) documents when the
> server starts.
> 
> Something like that:
> 
> Model model = ModelFactory.createDefaultModel();
> IndexBuilderString larqBuilder = new IndexBuilderString();
> model.register(larqBuilder);
> FileManager.get().readModel(model, "Data/books.ttl");
> larqBuilder.closeWriter();
> model.unregister(larqBuilder);
> index = larqBuilder.getIndex();
> LARQ.setDefaultIndex(index);
> 
> Is it possible? In which class it would be the best?
> 
> Thanks
> 
> Jerome
> 
> 
>