Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2016/06/14 12:02:34 UTC

Boosting Lucene fields with ARQ and comparing ARQ to SIREn

Hi,

We are building a document search system that consists of a document 
database for storing the text and Jena for storing all document metadata 
(DCMI terms). We need to find documents while boosting certain metadata 
fields over the content, and also to find similar documents with custom 
field boosting. Searches target both content and metadata, and the 
results must return all related metadata stored in Jena.

I have already built a separate Lucene index for the content and some 
metadata fields, and only just noticed that the ARQ extension can do that 
(yes, I should have read the Jena documentation first). But is it possible 
to boost Lucene fields, for both search and similar-document queries, when 
using ARQ?

Also found this SIREn: 
http://semtech2011.semanticweb.com/uploads/handouts/THUR_1110_Hugo_3867.pdf

Does anyone have experience with SIREn? How does it compare to ARQ?

Thanks,
Mikael

-- 
www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Linnankatu 10 A
FI-20100 Turku
FINLAND


Re: Boosting Lucene fields with ARQ and comparing ARQ to SIREn

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi Osma!

Yes, sorry, I was talking about jena-text. It would be really great to be 
able to use one tool for all RDF and text-index queries. I'll check how 
JIRA works and what has already been submitted there.

Br,
Mikael


On 16.6.2016 13:59, Osma Suominen wrote:
> Hi Mikael!
>
> I assume you are talking about using (or not) jena-text here?
>
> If needed features such as result highlighting are missing from 
> jena-text, please consider creating one or more JIRA issues on 
> issues.apache.org so that they can be discussed and possibly addressed 
> in future versions. Also pull requests for jena-text are very welcome!
>
> The idea with jena-text is to have text index functionality built into
> the RDF store, so that there is no need for an application to
> maintain an external Lucene (or similar) index. It obviously exposes
> only a subset of Lucene (or Solr, Elasticsearch and the like)
> capabilities, but the subset has expanded over time according to
> users' requirements.
>
> -Osma



Re: Boosting Lucene fields with ARQ and comparing ARQ to SIREn

Posted by Osma Suominen <os...@helsinki.fi>.
Hi Mikael!

I assume you are talking about using (or not) jena-text here?

If needed features such as result highlighting are missing from 
jena-text, please consider creating one or more JIRA issues on 
issues.apache.org so that they can be discussed and possibly addressed 
in future versions. Also pull requests for jena-text are very welcome!

The idea with jena-text is to have text index functionality built into 
the RDF store, so that there is no need for an application to maintain 
an external Lucene (or similar) index. It obviously exposes only a 
subset of Lucene (or Solr, Elasticsearch and the like) capabilities, but 
the subset has expanded over time according to users' requirements.

-Osma

On 16/06/16 11:48, Mikael Pesonen wrote:
>
> OK, thanks! It looks like we need result highlighting too, so it seems
> best to stick with a separate Lucene index for now.
>
> So basically I'm duplicating all RDF data in the Lucene index, which is
> not the most elegant solution...
>
> Br,
> Mikael


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Boosting Lucene fields with ARQ and comparing ARQ to SIREn

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
OK, thanks! It looks like we need result highlighting too, so it seems 
best to stick with a separate Lucene index for now.

So basically I'm duplicating all RDF data in the Lucene index, which is 
not the most elegant solution...

Br,
Mikael


On 15.6.2016 23:56, Andy Seaborne wrote:
> At query time:
>
> The query string can use any Lucene query syntax, so the "^" (boost)
> operator should work.
>
> At index build time, through ARQ:
>   Sorry, I don't know for sure; it doesn't look like it.
>
>     Andy



Re: Boosting Lucene fields with ARQ and comparing ARQ to SIREn

Posted by Andy Seaborne <an...@apache.org>.
On 14/06/16 13:02, Mikael Pesonen wrote:
>
> Hi,
>
> We are building a document search system that consists of a document
> database for storing the text and Jena for storing all document metadata
> (DCMI terms). We need to find documents while boosting certain metadata
> fields over the content, and also to find similar documents with custom
> field boosting. Searches target both content and metadata, and the
> results must return all related metadata stored in Jena.
>
> I have already built a separate Lucene index for the content and some
> metadata fields, and only just noticed that the ARQ extension can do that
> (yes, I should have read the Jena documentation first). But is it possible
> to boost Lucene fields, for both search and similar-document queries, when
> using ARQ?

At query time:

The query string can use any Lucene query syntax, so the "^" (boost) operator should work.
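A minimal sketch of what a query-time boost could look like, assuming jena-text backed by a Lucene index: the boost is written inside the Lucene query string that is passed to the text:query property function, e.g. 'climate^4 report', so the clause for "climate" is weighted more heavily than the one for "report". The Java below only assembles the SPARQL text. The helper names, the example terms and the use of dcterms:title as the indexed property are hypothetical; it assumes that property is mapped in the text index's entity map and that the terms contain no Lucene special characters.

```java
// Sketch only: builds a jena-text SPARQL query whose Lucene query string uses the "^" boost operator.
// Assumptions (hypothetical): helper names, example terms, and dcterms:title as the indexed property;
// the property must be mapped in the text index's entity map; terms contain no Lucene special characters.
public class BoostedTextQuery {

    /** Appends Lucene's "^" boost operator to one term, e.g. boost("climate", 4) gives "climate^4". */
    static String boost(String term, double factor) {
        if (!(factor > 0) || Double.isInfinite(factor)) {
            throw new IllegalArgumentException("Lucene boosts must be positive and finite: " + factor);
        }
        // Print whole-number boosts without a trailing ".0" to keep the query string readable.
        String f = (factor == Math.rint(factor)) ? Long.toString((long) factor) : Double.toString(factor);
        return term + "^" + f;
    }

    /** Wraps a string as a single-quoted SPARQL literal, escaping backslash, quote and line breaks. */
    static String sparqlLiteral(String s) {
        String escaped = s.replace("\\", "\\\\")
                          .replace("'", "\\'")
                          .replace("\n", "\\n")
                          .replace("\r", "\\r");
        return "'" + escaped + "'";
    }

    /** Builds a query using the (property 'query string') argument form of text:query. */
    static String textQuery(String propertyIri, String luceneQuery) {
        return "PREFIX text: <http://jena.apache.org/text#>\n"
             + "SELECT ?doc WHERE {\n"
             + "  ?doc text:query (<" + propertyIri + "> " + sparqlLiteral(luceneQuery) + ") .\n"
             + "}\n";
    }

    public static void main(String[] args) {
        // Assuming the query parser's default OR operator, documents matching either term
        // are returned, with the "climate" clause boosted by a factor of 4.
        String lucene = boost("climate", 4) + " report";
        System.out.print(textQuery("http://purl.org/dc/terms/title", lucene));
    }
}
```

Running main prints a query whose pattern reads ?doc text:query (<http://purl.org/dc/terms/title> 'climate^4 report'). Because the boost lives entirely in the query string, this touches only the query-time side of the question, not boosting at index build time.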

At index build time, through ARQ:
   Sorry, I don't know for sure; it doesn't look like it.

	Andy

>
> Also found this SIREn:
> http://semtech2011.semanticweb.com/uploads/handouts/THUR_1110_Hugo_3867.pdf
>
> Does anyone have experience with SIREn? How does it compare to ARQ?
>
> Thanks,
> Mikael
>