Posted to solr-user@lucene.apache.org by FatMan Corp <fa...@gmail.com> on 2011/05/03 18:36:55 UTC

Getting field information inside a Tokenizer

Hi, I would like to get another field's information for the same document
within a Tokenizer class.
How can this be achieved?

Thanks

RE: Getting field information inside a Tokenizer

Posted by Steven A Rowe <sa...@syr.edu>.
Hi FMC,

On 5/3/2011 at 12:37 PM, FatMan Corp wrote:
> Hi, I would like to get another field's information for the same document
> within a Tokenizer class.
> How can this be achieved?

Use <copyField>s in your schema <http://wiki.apache.org/solr/SchemaXml#Copy_Fields>, and associate different analysis pipelines with each field.  Each field's analysis pipeline will be fed the original raw text.
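
A schema.xml fragment along the following lines illustrates the idea. It is only a sketch: the field names (title, title_exact) and type names (text_ws_lower, text_exact) are invented for illustration, and the tokenizer and filter factories are stock Solr ones chosen arbitrarily.

```xml
<!-- Sketch of a schema.xml fragment; all field and type names are hypothetical. -->
<types>
  <!-- Pipeline 1: split on whitespace, then lowercase each token -->
  <fieldType name="text_ws_lower" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Pipeline 2: keep the whole field value as a single token -->
  <fieldType name="text_exact" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <field name="title" type="text_ws_lower" indexed="true" stored="true"/>
  <field name="title_exact" type="text_exact" indexed="true" stored="false"/>
</fields>

<!-- The raw (pre-analysis) text sent to "title" is also fed to "title_exact" -->
<copyField source="title" dest="title_exact"/>
```

Each destination field analyzes its own copy of the source text; the tokenizer for one field still cannot see the contents of a different field.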

Presently Lucene's analysis pipeline is single-field only: you have to create separate analysis pipelines for each field, with an extra pass over the original text for each field. I personally think Lucene should provide multi-field analysis capabilities, but this would not be a simple change.  Even if Lucene does eventually gain this capability, modifying Solr to expose it would be an added layer of complexity, and given that <copyField> already exists as a workaround, there may be little motivation to do so.

Some of the use cases full multi-field analysis could serve are already handled in Lucene (but not yet in Solr) by TeeSinkTokenFilter <http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/analysis/TeeSinkTokenFilter.html>.  An enterprising Lucene user could write a single-pass tokenizer that emits tokens with one type per target field, then employ one TeeSinkTokenFilter per field to approximate full multi-field analysis.  Adding TeeSinkTokenFilter support to Solr, though, would require substantial changes to Solr's code and schema format (schema schema?).

Steve
