Posted to java-user@lucene.apache.org by Claude Libois <cl...@student.fsa.ucl.ac.be> on 2003/07/28 10:32:21 UTC
Different Analyzer for each Field
My question is in the title: how can I use a different Analyzer for
each field of a Document object? My problem is that if I use
LetterTokenizer for a field which contains a String representation of a
number, I can't delete the document afterwards, probably because this
analyzer throws away my number. So I need to use WhitespaceTokenizer for
this field, but I would like to use LetterTokenizer for the others. Can
someone help me?
Thank you
Claude Libois
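The symptom described above can be reproduced without Lucene. The plain-Java sketch below (hypothetical class `TokenizerSketch`, not Lucene code) mimics the splitting rule of LetterTokenizer, which keeps only maximal runs of characters for which `Character.isLetter` is true, next to a simple whitespace split. Under the first rule a purely numeric value produces no tokens at all, so no term is left to match when deleting.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only (not Lucene code): compares a letter-only
 * splitting rule with a whitespace splitting rule on the same input.
 */
public class TokenizerSketch {

    // Keep maximal runs of letters; digits and punctuation act as separators and are dropped.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Keep maximal runs of non-whitespace characters, so digits survive.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String numericField = "12345";
        System.out.println("letter:     " + letterTokens(numericField));     // []
        System.out.println("whitespace: " + whitespaceTokens(numericField)); // [12345]
    }
}
```

A field holding identifiers such as `12345` therefore needs a rule that keeps digits, or no tokenization at all.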
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: Different Analyzer for each Field
Posted by Erik Hatcher <li...@ehatchersolutions.com>.
On Monday, July 28, 2003, at 03:12 AM, Kelvin Tan wrote:
> AFAIK, there is a one-to-one mapping between an index and an analyzer.
Not true. The Analyzer base class has a method tokenStream that
accepts the field name. None of the built-in analyzers use the field
name to do anything different based on the field name, but a custom
analyzer easily could.
This change was, I think, made relatively recently, so maybe it's not
part of a release build of Lucene yet?
Erik
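As a rough illustration of the point above, the sketch below mimics in plain Java what a field-aware `tokenStream(fieldName, reader)` override could do: look up a per-field rule by field name and fall back to a default for every other field. `PerFieldSketch`, `tokenize`, and the field name `id` are hypothetical; in Lucene itself the equivalent would be an Analyzer subclass that returns a different TokenStream depending on the field name it receives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Plain-Java sketch (not Lucene code) of field-aware analysis: the field
 * name selects the tokenization rule, with a default for unlisted fields.
 */
public class PerFieldSketch {

    // Letter-style rule: digits and punctuation act as separators and are dropped.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) tokens.add(part);
        }
        return tokens;
    }

    // Whitespace-style rule: any run of non-whitespace characters is a token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) tokens.add(part);
        }
        return tokens;
    }

    private final Map<String, Function<String, List<String>>> perField;
    private final Function<String, List<String>> defaultRule;

    PerFieldSketch(Map<String, Function<String, List<String>>> perField,
                   Function<String, List<String>> defaultRule) {
        this.perField = new HashMap<>(perField);
        this.defaultRule = defaultRule;
    }

    // Analogue of a field-aware tokenStream(fieldName, reader) override.
    List<String> tokenize(String fieldName, String text) {
        return perField.getOrDefault(fieldName, defaultRule).apply(text);
    }

    public static void main(String[] args) {
        Map<String, Function<String, List<String>>> rules = new HashMap<>();
        rules.put("id", PerFieldSketch::whitespaceTokens); // hypothetical numeric field
        PerFieldSketch analyzer = new PerFieldSketch(rules, PerFieldSketch::letterTokens);

        System.out.println(analyzer.tokenize("id", "12345"));             // [12345]
        System.out.println(analyzer.tokenize("body", "Lucene 1.3 rocks")); // [Lucene, rocks]
    }
}
```

Keeping the rules in a map rather than an if/else chain makes it easy to add fields later; the default covers every field that is not listed.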
Re: Different Analyzer for each Field
Posted by Kelvin Tan <li...@relevanz.com>.
Perhaps one way to do it is to have two separate indices, one for each
analyzer. Then, depending on which field you wish to search, you can
choose the corresponding index.
AFAIK, there is a one-to-one mapping between an index and an analyzer.
Kelvin
On Mon, 28 Jul 2003 10:32:21 +0200, Claude Libois said:
>My question is in the title: how can I use a different Analyzer for
>each field of a Document object? My problem is that if I use
>LetterTokenizer for a field which contains a String representation of a
>number, I can't delete the document afterwards, probably because this
>analyzer throws away my number. So I need to use WhitespaceTokenizer for
>this field, but I would like to use LetterTokenizer for the others. Can
>someone help me?
>Thank you
>Claude Libois
Re: Different Analyzer for each Field
Posted by Claude Libois <cl...@student.fsa.ucl.ac.be>.
Thanks for your answers, but I found another way to solve my problem: I
no longer tokenize the field, so it doesn't pass through the analyzer,
and it works. Nevertheless, I will certainly use what you suggested in
the future.
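Why leaving the field untokenized works can be shown with a toy inverted index (plain Java, not Lucene; `UntokenizedSketch`, `addField`, and `deleteByTerm` are hypothetical names). When the value `12345` goes through a letter-only tokenizer, no term is recorded, so a delete by that exact term matches nothing; when the field is left untokenized, the whole value becomes a single term and the delete finds the document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy inverted index (not Lucene) contrasting a tokenized field with an
 * untokenized one when deleting documents by an exact term.
 */
public class UntokenizedSketch {

    // term key ("field:text") -> ids of documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> liveDocs = new HashSet<>();

    // Letter-only splitting, standing in for LetterTokenizer's behaviour.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) tokens.add(part);
        }
        return tokens;
    }

    // Index one field value, either tokenized or as a single untouched term.
    void addField(int docId, String field, String value, boolean tokenize) {
        liveDocs.add(docId);
        List<String> terms = tokenize ? letterTokens(value) : Collections.singletonList(value);
        for (String term : terms) {
            postings.computeIfAbsent(field + ":" + term, k -> new HashSet<>()).add(docId);
        }
    }

    // Delete every live document containing the exact term; returns how many were removed.
    int deleteByTerm(String field, String term) {
        Set<Integer> hits = postings.getOrDefault(field + ":" + term, Collections.emptySet());
        int removed = 0;
        for (int docId : hits) {
            if (liveDocs.remove(docId)) removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        UntokenizedSketch tokenized = new UntokenizedSketch();
        tokenized.addField(1, "id", "12345", true);           // digits dropped, no term stored
        System.out.println(tokenized.deleteByTerm("id", "12345"));   // 0

        UntokenizedSketch untokenized = new UntokenizedSketch();
        untokenized.addField(1, "id", "12345", false);        // whole value kept as one term
        System.out.println(untokenized.deleteByTerm("id", "12345")); // 1
    }
}
```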
On Monday, July 28, 2003, at 02:56 PM, Erik Hatcher wrote:
> On Monday, July 28, 2003, at 01:32 AM, Claude Libois wrote:
>> My question is in the title: how can I use a different Analyzer for
>> each field of a Document object? My problem is that if I use
>> LetterTokenizer for a field which contains a String representation of
>> a number, I can't delete the document afterwards, probably because
>> this analyzer throws away my number. So I need to use
>> WhitespaceTokenizer for this field, but I would like to use
>> LetterTokenizer for the others. Can someone help me?
>> Thank you
>
> My recommendation is to write a custom Analyzer subclass that uses the
> field name passed to the tokenStream method to affect the internals of
> the analysis process. Just take the internals of the analyzers you
> want and piece them together into your own analyzer with the logic you
> want.
>
> Erik
>
> P.S. You may need to use a CVS version of Lucene for this feature.
Re: Different Analyzer for each Field
Posted by Erik Hatcher <li...@ehatchersolutions.com>.
On Monday, July 28, 2003, at 01:32 AM, Claude Libois wrote:
> My question is in the title: how can I use a different Analyzer for
> each field of a Document object? My problem is that if I use
> LetterTokenizer for a field which contains a String representation of
> a number, I can't delete the document afterwards, probably because
> this analyzer throws away my number. So I need to use
> WhitespaceTokenizer for this field, but I would like to use
> LetterTokenizer for the others. Can someone help me?
> Thank you
My recommendation is to write a custom Analyzer subclass that uses the
field name passed to the tokenStream method to affect the internals of
the analysis process. Just take the internals of the analyzers you want
and piece them together into your own analyzer with the logic you want.
Erik
P.S. You may need to use a CVS version of Lucene for this feature.