Posted to java-user@lucene.apache.org by Stephane Fellah <st...@imagemattersllc.com> on 2014/06/04 19:34:54 UTC

How to store custom token attribute in Lucene Index ?

Hi,

I want to create a Lucene analyzer for RDF nodes. RDF nodes can have
multiple types (URI, bnode, plain literal, plain literal with language,
typed literal with datatype). While analyzing a term, I want to populate an
RDFNodeTypeAttribute, a LanguageAttribute and a DatatypeAttribute that store,
respectively, the type of the RDF node, the language of the literal and its
datatype. My question is how these attributes can be stored in the
Lucene index. Do I have to write a custom Codec? Do I have to use the
PayloadAttribute? How can I use these attributes in my searches once they
are stored in the index?

Thank you for your help

-- 
Stephane Fellah
Chief  Knowledge Scientist
Image Matters LLC
+(571) 502 8478

RE: How to store custom token attribute in Lucene Index ?

Posted by Uwe Schindler <uw...@thetaphi.de>.
At the moment, the only option is the PayloadAttribute. The usual approach is to add another TokenFilter at the end of your indexing chain that serializes all those attributes into a single payload. On the search side, there are several ways to access the payloads (all position-related queries, such as span queries, can use them), but in most cases you have to write a custom query.
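
To make the serialization step concrete, here is a minimal sketch of packing the three attributes into one byte array. It uses only the JDK; the class name RdfPayloadCodec, the node-type codes and the byte layout are assumptions made for illustration, not anything Lucene defines. In a real TokenFilter, the resulting bytes would presumably be wrapped in a BytesRef and handed to PayloadAttribute.setPayload(...) inside incrementToken().

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: packs node type, language and datatype into one payload.
// Assumed layout: [1 byte type][2 byte length + UTF-8 language][2 byte length + UTF-8 datatype]
final class RdfPayloadCodec {
    // Illustrative node-type codes (assumption, pick whatever suits your model).
    static final byte URI = 0;
    static final byte BNODE = 1;
    static final byte PLAIN_LITERAL = 2;
    static final byte LANG_LITERAL = 3;
    static final byte TYPED_LITERAL = 4;

    // Simple holder for the decoded values; absent strings come back as "".
    static final class Fields {
        final byte nodeType;
        final String language;
        final String datatype;

        Fields(byte nodeType, String language, String datatype) {
            this.nodeType = nodeType;
            this.language = language;
            this.datatype = datatype;
        }
    }

    static byte[] encode(byte nodeType, String language, String datatype) {
        byte[] lang = utf8(language);
        byte[] dt = utf8(datatype);
        ByteBuffer buf = ByteBuffer.allocate(1 + 2 + lang.length + 2 + dt.length);
        buf.put(nodeType);
        buf.putShort((short) lang.length).put(lang);
        buf.putShort((short) dt.length).put(dt);
        return buf.array();
    }

    static Fields decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        byte nodeType = buf.get();
        String language = readString(buf);
        String datatype = readString(buf);
        return new Fields(nodeType, language, datatype);
    }

    // null is treated as "attribute not present" and stored as an empty string.
    private static byte[] utf8(String s) {
        byte[] bytes = (s == null) ? new byte[0] : s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("value too long for a 2-byte length prefix");
        }
        return bytes;
    }

    private static String readString(ByteBuffer buf) {
        int len = buf.getShort() & 0xFFFF;
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

A custom query or scorer on the search side would call decode(...) on the payload bytes it reads for each position. Keeping the layout small is worthwhile, since a payload is written for every position.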

Please note: payloads are saved per position, which means a payload is stored for every occurrence of a term (if the same term occurs 5 times in a document, 5 payloads are saved in the index, one for each position).

It is currently not possible to attach a payload to the term itself (for the case where a term always has the same payload). If you want that in your index, you can instead add a TokenFilter at the end of the chain that appends your attribute to the term (like "term#customAttribute"). When querying, the query analyzer does the same and produces the same term/attribute combination. In that case, the default queries work (they just have to use the analyzer to produce the correct term to query for).
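
The term-suffix idea can be sketched in a few lines of plain Java. The class name RdfTermKey and its compose method are hypothetical; the point is only that the index-time filter and the query-time analyzer must build the term with the same function, so that both sides agree on the final string. In an actual TokenFilter, the suffix would presumably be appended through the CharTermAttribute.

```java
// Hypothetical helper shared by the index-time and query-time analysis chains.
final class RdfTermKey {
    // '#' follows the "term#customAttribute" example; it is an assumption that
    // the separator never occurs inside the term or the attribute themselves.
    static final char SEPARATOR = '#';

    // Returns the term unchanged when there is no attribute to append.
    static String compose(String term, String attribute) {
        if (attribute == null || attribute.isEmpty()) {
            return term;
        }
        return term + SEPARATOR + attribute;
    }
}
```

For example, compose("chat", "fr") and compose("chat", "en") yield two distinct index terms, so an ordinary term query built from the same function only matches the intended language. Since RDF URIs frequently contain '#', a separator that cannot appear in the data (a control character, for instance) may be a safer choice in this particular use case.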

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



