Posted to users@jena.apache.org by Nikolaos Abatzis <Ni...@newriversystems.com> on 2011/03/16 19:20:43 UTC

ARQ + Lucene

All:

I just checked the Jena 2.6.4 bundle and I see that even the latest ARQ (2.8.7) still seems to use lucene-core-2.3.1.

Are there any plans to port ARQ to a more recent version of Lucene? 2.3.1 is kind of ancient: the current release in the 2.x series is 2.9.4,
and my understanding is that it runs on Java 1.4. There is also a 3.x series that requires Java 1.5. Per the Lucene docs, the latest versions offer
significant enhancements and have fixed memory leaks, etc.

Please do not hesitate to contact me if you need additional information. Thank you.

Regards,

Nikolaos Abatzis
New River Systems Corporation
1890 Preston White Drive, Suite 240
Reston, VA 20191

RE: ARQ + Lucene

Posted by "McGibbney, Lewis John" <Le...@gcu.ac.uk>.
Sorry, unintentional misuse of words.

It is very much in development.

________________________________________
From: Paolo Castagna [castagna.lists@googlemail.com]
Sent: 17 March 2011 12:27
To: jena-users@incubator.apache.org
Subject: Re: ARQ + Lucene

McGibbney, Lewis John wrote:
> OK Thank you for pointing out this information.
>
> My work with Solr is not formally associated with the project, it is merely part of my ongoing research and is very much in production.

Very much in development or is it used in production systems?

Paolo

Glasgow Caledonian University is a registered Scottish charity, number SC021474

Winner: Times Higher Education’s Widening Participation Initiative of the Year 2009 and Herald Society’s Education Initiative of the Year 2009.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html

Winner: Times Higher Education’s Outstanding Support for Early Career Researchers of the Year 2010, GCU as a lead with Universities Scotland partners.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html

Re: ARQ + Lucene

Posted by Paolo Castagna <ca...@googlemail.com>.
McGibbney, Lewis John wrote:
> OK Thank you for pointing out this information.
> 
> My work with Solr is not formally associated with the project, it is merely part of my ongoing research and is very much in production. 

Very much in development or is it used in production systems?

Paolo

 > To give you an indication I provide the link below which shows a previous 
effort, this was used to provide query refinement based on data stored in a 
Lucene index (which is now removed from the Nutch project as indexing and search 
has been delegated to Solr)
> 
> http://wiki.apache.org/nutch/OntologyPlugin
> 
> When I have something more stable I will gladly get in touch and make the code available for everyone.
> 
> Lewis
> ________________________________________
> From: Paolo Castagna [castagna.lists@googlemail.com]
> Sent: 17 March 2011 08:14
> To: jena-users@incubator.apache.org
> Subject: Re: ARQ + Lucene
> 
> McGibbney, Lewis John wrote:
>> Hi Paolo,
>>
>> First and foremost I apologise upfront for talking off-subject here in regards to Jena in specific.
>>
>> I am currently working on a plug-in for Solr which refines user queries based on ontology classes and I am using Jena to wrap around my ontology models to provide this functionality.
>> Immediately I am interested in experimenting with LARQ and SARQ.
>>
>> Out of curiosity, can you please provide some use case or personal usage for these frameworks as it would enable me to get at least an abstract sense of how I may be able to use LARQ,
>> prior to SARQ for my own needs.
> 
> The use cases for SARQ or EARQ are exactly the same as the ones for LARQ (since they
> implement exactly same functionalities but with a different indexing solution behind).
> 
> One use case could be: you want to quickly find, for example, all the "things" which
> have the word "nuclear energy" in any of their literals.
> 
> Other RDF stores implement similar functionalities, see:
> http://www.w3.org/wiki/SPARQL/Extensions/Computed_Properties
> 
> This is just pure and plain vanilla free text searches: given some keywords, you retrieve
> a ranked list of the first X literals containing those keywords/words. It's not what some
> call "semantic search", no ontologies are involved, no NLP, etc.
> 
> Can you point me at the Solr plug-in you are working on?
> 
> Thanks,
> Paolo
> 


RE: ARQ + Lucene

Posted by "McGibbney, Lewis John" <Le...@gcu.ac.uk>.
OK Thank you for pointing out this information.

My work with Solr is not formally associated with the project; it is merely part of my ongoing research and is very much in production. To give you an indication, the link below shows a previous effort, which was used to provide query refinement based on data stored in a Lucene index (which is now removed from the Nutch project, as indexing and search have been delegated to Solr).

http://wiki.apache.org/nutch/OntologyPlugin

When I have something more stable I will gladly get in touch and make the code available for everyone.

Lewis
________________________________________
From: Paolo Castagna [castagna.lists@googlemail.com]
Sent: 17 March 2011 08:14
To: jena-users@incubator.apache.org
Subject: Re: ARQ + Lucene

McGibbney, Lewis John wrote:
> Hi Paolo,
>
> First and foremost I apologise upfront for talking off-subject here in regards to Jena in specific.
>
> I am currently working on a plug-in for Solr which refines user queries based on ontology classes and I am using Jena to wrap around my ontology models to provide this functionality.
> Immediately I am interested in experimenting with LARQ and SARQ.
>
> Out of curiosity, can you please provide some use case or personal usage for these frameworks as it would enable me to get at least an abstract sense of how I may be able to use LARQ,
> prior to SARQ for my own needs.

The use cases for SARQ or EARQ are exactly the same as the ones for LARQ (since they
implement exactly same functionalities but with a different indexing solution behind).

One use case could be: you want to quickly find, for example, all the "things" which
have the word "nuclear energy" in any of their literals.

Other RDF stores implement similar functionalities, see:
http://www.w3.org/wiki/SPARQL/Extensions/Computed_Properties

This is just pure and plain vanilla free text searches: given some keywords, you retrieve
a ranked list of the first X literals containing those keywords/words. It's not what some
call "semantic search", no ontologies are involved, no NLP, etc.

Can you point me at the Solr plug-in you are working on?

Thanks,
Paolo


Re: ARQ + Lucene

Posted by Paolo Castagna <ca...@googlemail.com>.

McGibbney, Lewis John wrote:
> Hi Paolo,
> 
> First and foremost I apologise upfront for talking off-subject here in regards to Jena in specific.
> 
> I am currently working on a plug-in for Solr which refines user queries based on ontology classes and I am using Jena to wrap around my ontology models to provide this functionality.
> Immediately I am interested in experimenting with LARQ and SARQ.
> 
> Out of curiosity, can you please provide some use case or personal usage for these frameworks as it would enable me to get at least an abstract sense of how I may be able to use LARQ,
> prior to SARQ for my own needs.

The use cases for SARQ or EARQ are exactly the same as the ones for LARQ (since they
implement exactly the same functionality, but with a different indexing solution behind it).

One use case could be: you want to quickly find, for example, all the "things" which
have the phrase "nuclear energy" in any of their literals.

Other RDF stores implement similar functionalities, see:
http://www.w3.org/wiki/SPARQL/Extensions/Computed_Properties

This is just plain vanilla free-text search: given some keywords, you retrieve
a ranked list of the first X literals containing those keywords. It's not what some
call "semantic search": no ontologies are involved, no NLP, etc.
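
To make the "ranked list of the first X literals" idea concrete, here is a toy sketch in plain Java. It is not how LARQ or Lucene index or score anything: the class name ToyLiteralSearch and the simple word-count ranking are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy illustration only: LARQ delegates indexing and scoring to Lucene,
// which is far more sophisticated than this word-count ranking.
class ToyLiteralSearch {

    // Counts how many words of the literal are equal to one of the keywords
    // (case-insensitive).
    static int score(String literal, List<String> keywords) {
        int hits = 0;
        for (String word : literal.toLowerCase(Locale.ROOT).split("\\W+")) {
            for (String keyword : keywords) {
                if (word.equals(keyword.toLowerCase(Locale.ROOT))) {
                    hits++;
                }
            }
        }
        return hits;
    }

    // Returns at most `limit` literals that match at least one keyword,
    // highest score first.
    static List<String> search(List<String> literals, List<String> keywords, int limit) {
        List<String> matches = new ArrayList<>();
        for (String literal : literals) {
            if (score(literal, keywords) > 0) {
                matches.add(literal);
            }
        }
        matches.sort(Comparator.comparingInt((String l) -> score(l, keywords)).reversed());
        return new ArrayList<>(matches.subList(0, Math.min(limit, matches.size())));
    }

    public static void main(String[] args) {
        List<String> literals = Arrays.asList(
            "Nuclear energy policy in Europe",
            "Wind and solar energy",
            "A history of the bicycle");
        System.out.println(search(literals, Arrays.asList("nuclear", "energy"), 2));
    }
}
```

With LARQ, the literals would come from an RDF model and the matching would be done by a Lucene index; the sketch only shows the shape of the result (keywords in, a bounded ranked list of literals out).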

Can you point me at the Solr plug-in you are working on?

Thanks,
Paolo

> 
> Any comments would be great
> Thank you
> Lewis
> ________________________________________
> From: Paolo Castagna [castagna.lists@googlemail.com]
> Sent: 16 March 2011 19:00
> To: jena-users@incubator.apache.org
> Subject: Re: ARQ + Lucene
> 
> Nikolaos Abatzis wrote:
>> All:
>>
>> I just did a check on the 2.6.4 Jena bundle and I see that even the latest ARQ (2.8.7) seems to use the lucene-core-2.3.1.
>>
>> Are there any plans to port ARQ so that it uses a more recent version of Lucene. 2.3.1 is kind of ancient, current in the 2.x series is 2.9.4
>> and my understanding is that it uses Java 1.4. There is also a 3.x series that uses Java 1.5. per the lucene docs, the latest versions offer
>> significant enhancements and have fixed memory leaks, etc...
> 
> "Lucene 2.3.1 is kind of ancient", I agree. :-)
> 
> Yes, the plan is to extract LARQ as a separate module (which will depend on ARQ).
> 
> You can look here: https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/
> As you can see, it is using Lucene 3.0.3 (and it is backward compatible with 2.9.x
> ... as just a drop-in replacement if somebody needs an older version of Lucene).
> 
> We are currently waiting the "all clear" to move source code from SourceForge to the
> Apache SVN repository. Once we do that, little changes are necessary in the ARQ
> source code to dynamically look for LARQ and wiring it in (if present in the class
> path).
> 
> The LARQ separate module has already a few improvements (i.e. avoiding duplicates
> with subsequents updates and removals).
> 
> Still pending:
> 
>   - a way to support RDF Datasets in addition to Jena Models and proper assemblers.
>   - a tool to make it easier to build Lucene indexes pointing to a TDB location.
>   - (at one point, I'd like to see LARQ in Fuseki ;-))
> 
> Later:
> 
>   - Solr? https://github.com/castagna/SARQ
>   - ElasticSearch? https://github.com/castagna/EARQ
> 
> To checkout LARQ and compile it type:
> svn co https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/ LARQ
> cd LARQ
> mvn package
> 
> Let me know if you have problems,
> Paolo
> 
>> Please do not hesitate to contact me if you need additional information. Thank you.
>>
>> Regards,
>>
>> Nikolaos Abatzis
>> New River Systems Corporation
>> 1890 Preston White Drive, Suite 240
>> Reston, VA 20191
> 
> Email has been scanned for viruses by Altman Technologies' email management service - www.altman.co.uk/emailsystems
> 

RE: ARQ + Lucene

Posted by "McGibbney, Lewis John" <Le...@gcu.ac.uk>.
Hi Paolo,

First and foremost, I apologise upfront for going off-topic here with regard to Jena specifically.

I am currently working on a plug-in for Solr which refines user queries based on ontology classes, and I am using Jena to wrap my ontology models to provide this functionality.
I am immediately interested in experimenting with LARQ and SARQ.

Out of curiosity, can you please provide some use cases or personal usage for these frameworks? It would help me get at least an abstract sense of how I may be able to use LARQ,
prior to SARQ, for my own needs.

Any comments would be great
Thank you
Lewis
________________________________________
From: Paolo Castagna [castagna.lists@googlemail.com]
Sent: 16 March 2011 19:00
To: jena-users@incubator.apache.org
Subject: Re: ARQ + Lucene

Nikolaos Abatzis wrote:
> All:
>
> I just did a check on the 2.6.4 Jena bundle and I see that even the latest ARQ (2.8.7) seems to use the lucene-core-2.3.1.
>
> Are there any plans to port ARQ so that it uses a more recent version of Lucene. 2.3.1 is kind of ancient, current in the 2.x series is 2.9.4
> and my understanding is that it uses Java 1.4. There is also a 3.x series that uses Java 1.5. per the lucene docs, the latest versions offer
> significant enhancements and have fixed memory leaks, etc...

"Lucene 2.3.1 is kind of ancient", I agree. :-)

Yes, the plan is to extract LARQ as a separate module (which will depend on ARQ).

You can look here: https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/
As you can see, it is using Lucene 3.0.3 (and it is backward compatible with 2.9.x
... as just a drop-in replacement if somebody needs an older version of Lucene).

We are currently waiting the "all clear" to move source code from SourceForge to the
Apache SVN repository. Once we do that, little changes are necessary in the ARQ
source code to dynamically look for LARQ and wiring it in (if present in the class
path).

The LARQ separate module has already a few improvements (i.e. avoiding duplicates
with subsequents updates and removals).

Still pending:

  - a way to support RDF Datasets in addition to Jena Models and proper assemblers.
  - a tool to make it easier to build Lucene indexes pointing to a TDB location.
  - (at one point, I'd like to see LARQ in Fuseki ;-))

Later:

  - Solr? https://github.com/castagna/SARQ
  - ElasticSearch? https://github.com/castagna/EARQ

To checkout LARQ and compile it type:
svn co https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/ LARQ
cd LARQ
mvn package

Let me know if you have problems,
Paolo

>
> Please do not hesitate to contact me if you need additional information. Thank you.
>
> Regards,
>
> Nikolaos Abatzis
> New River Systems Corporation
> 1890 Preston White Drive, Suite 240
> Reston, VA 20191


Re: ARQ + Lucene

Posted by Paolo Castagna <ca...@googlemail.com>.
Hi Nikolaos,
you can use the new LARQ (the one from the separate module) but you need to wire it into ARQ yourself.

The package has changed, but not the class names. See org.apache.jena.larq.LARQ:
https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/src/main/java/org/apache/jena/larq/LARQ.java

LARQ.init() should work.

Then, in your code, be careful to use classes from the org.apache.jena.larq.* package.

I also updated the ARQ dependency pointing to the latest stable release of ARQ.

You can find Maven artifacts here:
http://oss.talisplatform.com/content/repositories/talis-releases/org/apache/jena/larq/
http://oss.talisplatform.com/content/repositories/talis-snapshots/org/apache/jena/larq/

I hope to have this "fixed" with the next ARQ release or so.

In the meantime, please, give it another try and let me know how it goes.
Testing and feedback from users is really useful.

Paolo


Nikolaos Abatzis wrote:
> Paolo,
> 
> Thanks for all your help. 
> 
> We were preparing for a customer demo, hence my delayed response. I did get LARQ and build it successfuly. Unless I am doing something wrong the problem is that there is no "stripped down" arq and it seems that the classloader picks up the larq classes from the arq jar versus the larq jar, I created and bundled with my project.
> 
> Am I missing something or should I just wait for the LARQ-less arq jar? 
> 
> 
> Please do not hesitate to contact me if you need additional information. Thank you.
> 
> Regards,
> 
> Nikolaos Abatzis
> New River Systems Corporation
> 1890 Preston White Drive, Suite 240
> Reston, VA 20191
> ________________________________________
> From: Paolo Castagna [castagna.lists@googlemail.com]
> Sent: Wednesday, March 16, 2011 3:00 PM
> To: jena-users@incubator.apache.org
> Subject: Re: ARQ + Lucene
> 
> Nikolaos Abatzis wrote:
>> All:
>>
>> I just did a check on the 2.6.4 Jena bundle and I see that even the latest ARQ (2.8.7) seems to use the lucene-core-2.3.1.
>>
>> Are there any plans to port ARQ so that it uses a more recent version of Lucene. 2.3.1 is kind of ancient, current in the 2.x series is 2.9.4
>> and my understanding is that it uses Java 1.4. There is also a 3.x series that uses Java 1.5. per the lucene docs, the latest versions offer
>> significant enhancements and have fixed memory leaks, etc...
> 
> "Lucene 2.3.1 is kind of ancient", I agree. :-)
> 
> Yes, the plan is to extract LARQ as a separate module (which will depend on ARQ).
> 
> You can look here: https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/
> As you can see, it is using Lucene 3.0.3 (and it is backward compatible with 2.9.x
> ... as just a drop-in replacement if somebody needs an older version of Lucene).
> 
> We are currently waiting the "all clear" to move source code from SourceForge to the
> Apache SVN repository. Once we do that, little changes are necessary in the ARQ
> source code to dynamically look for LARQ and wiring it in (if present in the class
> path).
> 
> The LARQ separate module has already a few improvements (i.e. avoiding duplicates
> with subsequents updates and removals).
> 
> Still pending:
> 
>   - a way to support RDF Datasets in addition to Jena Models and proper assemblers.
>   - a tool to make it easier to build Lucene indexes pointing to a TDB location.
>   - (at one point, I'd like to see LARQ in Fuseki ;-))
> 
> Later:
> 
>   - Solr? https://github.com/castagna/SARQ
>   - ElasticSearch? https://github.com/castagna/EARQ
> 
> To checkout LARQ and compile it type:
> svn co https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/ LARQ
> cd LARQ
> mvn package
> 
> Let me know if you have problems,
> Paolo
> 
>> Please do not hesitate to contact me if you need additional information. Thank you.
>>
>> Regards,
>>
>> Nikolaos Abatzis
>> New River Systems Corporation
>> 1890 Preston White Drive, Suite 240
>> Reston, VA 20191
> 

RE: ARQ + Lucene

Posted by Nikolaos Abatzis <Ni...@newriversystems.com>.
Paolo,

Thanks for all your help. 

We were preparing for a customer demo, hence my delayed response. I did get LARQ and built it successfully. Unless I am doing something wrong, the problem is that there is no "stripped down" ARQ, and it seems that the classloader picks up the LARQ classes from the ARQ jar rather than from the LARQ jar that I created and bundled with my project.

Am I missing something or should I just wait for the LARQ-less arq jar? 
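
One way to check which jar a class is actually loaded from is to ask the JVM for the class's code source. The sketch below uses only standard JDK calls; the class name ClassOrigin is made up for illustration, and the class to inspect is passed on the command line.

```java
import java.security.CodeSource;

class ClassOrigin {
    // Describes where a class was loaded from: a jar or directory URL, or a
    // fixed marker when the JVM reports no code source (e.g. core JDK classes).
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source reported)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to a JDK class.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + describe(Class.forName(name)));
    }
}
```

Running it with the same class path as the application and a LARQ class name (for example org.apache.jena.larq.LARQ, the class named elsewhere in this thread) should print the jar that supplies that class.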


Please do not hesitate to contact me if you need additional information. Thank you.

Regards,

Nikolaos Abatzis
New River Systems Corporation
1890 Preston White Drive, Suite 240
Reston, VA 20191
________________________________________
From: Paolo Castagna [castagna.lists@googlemail.com]
Sent: Wednesday, March 16, 2011 3:00 PM
To: jena-users@incubator.apache.org
Subject: Re: ARQ + Lucene

Nikolaos Abatzis wrote:
> All:
>
> I just did a check on the 2.6.4 Jena bundle and I see that even the latest ARQ (2.8.7) seems to use the lucene-core-2.3.1.
>
> Are there any plans to port ARQ so that it uses a more recent version of Lucene. 2.3.1 is kind of ancient, current in the 2.x series is 2.9.4
> and my understanding is that it uses Java 1.4. There is also a 3.x series that uses Java 1.5. per the lucene docs, the latest versions offer
> significant enhancements and have fixed memory leaks, etc...

"Lucene 2.3.1 is kind of ancient", I agree. :-)

Yes, the plan is to extract LARQ as a separate module (which will depend on ARQ).

You can look here: https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/
As you can see, it is using Lucene 3.0.3 (and it is backward compatible with 2.9.x
... as just a drop-in replacement if somebody needs an older version of Lucene).

We are currently waiting the "all clear" to move source code from SourceForge to the
Apache SVN repository. Once we do that, little changes are necessary in the ARQ
source code to dynamically look for LARQ and wiring it in (if present in the class
path).

The LARQ separate module has already a few improvements (i.e. avoiding duplicates
with subsequents updates and removals).

Still pending:

  - a way to support RDF Datasets in addition to Jena Models and proper assemblers.
  - a tool to make it easier to build Lucene indexes pointing to a TDB location.
  - (at one point, I'd like to see LARQ in Fuseki ;-))

Later:

  - Solr? https://github.com/castagna/SARQ
  - ElasticSearch? https://github.com/castagna/EARQ

To checkout LARQ and compile it type:
svn co https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/ LARQ
cd LARQ
mvn package

Let me know if you have problems,
Paolo

>
> Please do not hesitate to contact me if you need additional information. Thank you.
>
> Regards,
>
> Nikolaos Abatzis
> New River Systems Corporation
> 1890 Preston White Drive, Suite 240
> Reston, VA 20191


Re: ARQ + Lucene

Posted by Paolo Castagna <ca...@googlemail.com>.

Nikolaos Abatzis wrote:
> All:
> 
> I just did a check on the 2.6.4 Jena bundle and I see that even the latest ARQ (2.8.7) seems to use the lucene-core-2.3.1.
> 
> Are there any plans to port ARQ so that it uses a more recent version of Lucene. 2.3.1 is kind of ancient, current in the 2.x series is 2.9.4
> and my understanding is that it uses Java 1.4. There is also a 3.x series that uses Java 1.5. per the lucene docs, the latest versions offer 
> significant enhancements and have fixed memory leaks, etc...

"Lucene 2.3.1 is kind of ancient", I agree. :-)

Yes, the plan is to extract LARQ as a separate module (which will depend on ARQ).

You can look here: https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/
As you can see, it is using Lucene 3.0.3 (and it is backward compatible with 2.9.x
... as just a drop-in replacement if somebody needs an older version of Lucene).

We are currently waiting for the "all clear" to move the source code from SourceForge to the
Apache SVN repository. Once we do that, only small changes are necessary in the ARQ
source code to dynamically look for LARQ and wire it in (if present on the class
path).
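
As a rough sketch of what "dynamically look for LARQ and wire it in" could look like, the snippet below uses plain Java reflection to initialise a module only if it is on the class path. This is not the actual ARQ code: the OptionalModule class and its initIfPresent helper are hypothetical, and it assumes the target class exposes a public static no-argument method (such as the LARQ.init() mentioned elsewhere in this thread).

```java
import java.lang.reflect.Method;

class OptionalModule {
    // Tries to load `className` and call its public static no-arg method
    // `methodName`. Returns false if the class is not on the class path.
    static boolean initIfPresent(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            Method init = type.getMethod(methodName);
            init.invoke(null); // null receiver: the method is assumed to be static
            return true;
        } catch (ClassNotFoundException e) {
            return false; // module absent: carry on without it
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Found " + className + " but could not call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        boolean wired = initIfPresent("org.apache.jena.larq.LARQ", "init");
        System.out.println(wired ? "LARQ found and initialised" : "LARQ not on the class path");
    }
}
```

The point of this design is that the core library has no compile-time dependency on the optional module; dropping the module's jar onto the class path is enough to enable it.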

The separate LARQ module already has a few improvements (i.e. avoiding duplicates
with subsequent updates and removals).

Still pending:

  - a way to support RDF Datasets in addition to Jena Models and proper assemblers.
  - a tool to make it easier to build Lucene indexes pointing to a TDB location.
  - (at one point, I'd like to see LARQ in Fuseki ;-))

Later:

  - Solr? https://github.com/castagna/SARQ
  - ElasticSearch? https://github.com/castagna/EARQ

To check out LARQ and compile it, type:
svn co https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/ LARQ
cd LARQ
mvn package

Let me know if you have problems,
Paolo

> 
> Please do not hesitate to contact me if you need additional information. Thank you.
> 
> Regards,
> 
> Nikolaos Abatzis
> New River Systems Corporation
> 1890 Preston White Drive, Suite 240
> Reston, VA 20191