Posted to java-user@lucene.apache.org by Claude Libois <cl...@student.fsa.ucl.ac.be> on 2003/07/28 10:32:21 UTC

Different Analyzer for each Field

My question is in the title: how can I use a different Analyzer for 
each field of a Document object? My problem is that if I use 
LetterTokenizer for a field that contains the String representation of a 
number, I can't delete it afterwards, probably because this analyzer threw 
away my number. So I need to use WhitespaceTokenizer for this field, but 
I would like to use LetterTokenizer for the others. Can someone help me?
Thank you
Claude Libois
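
The suspicion above can be checked with a minimal stand-alone sketch in 
plain Java. This is not Lucene code: it assumes a letter-based tokenizer 
keeps only runs of characters for which Character.isLetter is true, and 
that a whitespace-based one splits only on whitespace. The class and 
method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone model of letter-based vs. whitespace-based tokenizing (not Lucene code). */
class TokenizerSketch {

    /** Splits text into maximal runs of letters; every other character is a separator. */
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Splits text on runs of whitespace, keeping digits and punctuation inside tokens. */
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }
}
```

Under these assumptions, letterTokens("12345") produces no tokens at all, 
while whitespaceTokens("12345") produces the single token 12345. A field 
indexed the first way contains no term for the number, so a delete by 
that number would find nothing to match.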


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: Different Analyzer for each Field

Posted by Erik Hatcher <li...@ehatchersolutions.com>.

On Monday, July 28, 2003, at 03:12  AM, Kelvin Tan wrote:
> AFAIK, there is a one-one mapping between an index and an analyzer.

Not true.  The Analyzer base class has a method tokenStream that 
accepts the field name.  None of the built-in analyzers use the field 
name to do anything different based on the field name, but a custom 
analyzer easily could.

This change (I think) was made relatively recently, so maybe it's not 
part of a release build of Lucene?

	Erik



Re: Different Analyzer for each Field

Posted by Kelvin Tan <li...@relevanz.com>.

Perhaps one way to do it is to have 2 separate indices for the 2 analyzers. 
Then, depending on which field you wish to search, you can choose from either 
index. 

AFAIK, there is a one-one mapping between an index and an analyzer.

Kelvin

On Mon, 28 Jul 2003 10:32:21 +0200, Claude Libois said:
>My question is in the title: how can I use a different Analyzer for
>each field of a Document object? My problem is that if I use
>LetterTokenizer for a field that contains the String representation
>of a number, I can't delete it afterwards, probably because this
>analyzer threw away my number. So I need to use WhitespaceTokenizer
>for this field, but I would like to use LetterTokenizer for the
>others. Can someone help me?
>Thank you
>Claude Libois

Re: Different Analyzer for each Field

Posted by Claude Libois <cl...@student.fsa.ucl.ac.be>.

Thanks for your answers, but I found another way to solve my problem. I 
no longer tokenize my field, so it doesn't pass through the analyzer, 
and it works. Nevertheless, I will certainly use what you told me in 
the future.
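
The effect of skipping the analyzer can be sketched with a toy inverted 
index in plain Java. This is a model under stated assumptions, not Lucene 
code: ToyIndex and its methods are hypothetical, and deletion is assumed 
to work by exact term match.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy inverted index (not Lucene code): maps "field:term" to the ids of matching documents. */
class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes the whole value as one term, the way an untokenized field bypasses analysis. */
    void addUntokenized(int docId, String field, String value) {
        postings.computeIfAbsent(field + ":" + value, k -> new HashSet<>()).add(docId);
    }

    /** Indexes only runs of letters, mimicking a letter-based analyzer. */
    void addLetterTokenized(int docId, String field, String value) {
        for (String token : value.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(field + ":" + token, k -> new HashSet<>()).add(docId);
            }
        }
    }

    /** Removes the postings for an exact term and reports how many documents it matched. */
    int deleteByTerm(String field, String term) {
        Set<Integer> hits = postings.remove(field + ":" + term);
        return hits == null ? 0 : hits.size();
    }
}
```

In this model, a document whose "id" value 12345 was added with 
addLetterTokenized is not matched by deleteByTerm("id", "12345"), 
whereas the same value added with addUntokenized is matched exactly once.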
On Monday, July 28, 2003, at 02:56 PM, Erik Hatcher wrote:

> On Monday, July 28, 2003, at 01:32  AM, Claude Libois wrote:
>> My question is in the title: how can I use a different Analyzer for 
>> each field of a Document object? My problem is that if I use 
>> LetterTokenizer for a field that contains the String representation of 
>> a number, I can't delete it afterwards, probably because this analyzer 
>> threw away my number. So I need to use WhitespaceTokenizer for this 
>> field, but I would like to use LetterTokenizer for the others. Can 
>> someone help me?
>> Thank you
>
> My recommendation is to write a custom Analyzer subclass that uses the 
> field name passed to the tokenStream method to affect the internals of 
> the analysis process.  Just rip out the internals of the analyzers you 
> want to piece together into your own analyzer that has the logic you 
> want.
>
> 	Erik
>
> p.s. You may need to use a CVS version of Lucene for this feature?

Re: Different Analyzer for each Field

Posted by Erik Hatcher <li...@ehatchersolutions.com>.

On Monday, July 28, 2003, at 01:32  AM, Claude Libois wrote:
> My question is in the title: how can I use a different Analyzer for 
> each field of a Document object? My problem is that if I use 
> LetterTokenizer for a field that contains the String representation of 
> a number, I can't delete it afterwards, probably because this analyzer 
> threw away my number. So I need to use WhitespaceTokenizer for this 
> field, but I would like to use LetterTokenizer for the others. Can 
> someone help me?
> Thank you

My recommendation is to write a custom Analyzer subclass that uses the 
field name passed to the tokenStream method to affect the internals of 
the analysis process.  Just rip out the internals of the analyzers you 
want to piece together into your own analyzer that has the logic you want.

	Erik

p.s. You may need to use a CVS version of Lucene for this feature?
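
The field-name dispatch recommended above can be sketched roughly as 
follows. To stay runnable without a Lucene jar, this is a stand-alone 
model in plain Java: FieldAwareAnalyzer, register and tokenize are 
hypothetical names, and tokenize only plays the role that 
tokenStream(fieldName, ...) would play in a real Analyzer subclass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Stand-alone model of a field-aware analyzer (hypothetical names, not the Lucene API). */
class FieldAwareAnalyzer {
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    /** The fallback strategy is used for every field without its own registration. */
    FieldAwareAnalyzer(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    /** Registers a tokenizing strategy for one field name. */
    FieldAwareAnalyzer register(String fieldName, Function<String, List<String>> tokenizer) {
        perField.put(fieldName, tokenizer);
        return this;
    }

    /** Picks the tokenizing strategy by field name, then applies it to the text. */
    List<String> tokenize(String fieldName, String text) {
        return perField.getOrDefault(fieldName, fallback).apply(text);
    }

    /** Keeps runs of letters only. */
    static List<String> letters(String text) {
        return split(text, "[^\\p{L}]+");
    }

    /** Splits on whitespace, so digits survive. */
    static List<String> whitespace(String text) {
        return split(text, "\\s+");
    }

    private static List<String> split(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split(separatorRegex)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }
}
```

For the case in this thread, one would construct it with 
FieldAwareAnalyzer::letters as the fallback and register 
FieldAwareAnalyzer::whitespace for the numeric field (called "id" here 
purely for illustration). Later Lucene releases include a 
PerFieldAnalyzerWrapper class built around the same idea, which may make 
a hand-written subclass unnecessary.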