Posted to java-user@lucene.apache.org by Benoit Mercier <be...@member.fsf.org> on 2011/01/13 04:38:11 UTC
"or" as a search term
Hi,
I have been happily using Lucene for several years to offer French lexical
analysis tools to university researchers. Today, one of them decided
to analyze the use of the French word "or" (meaning "gold") in
one of my Lucene-powered corpora... And, as you have probably already
guessed, no results...
I tried bypassing the default QueryParser implementation and
programmatically building a simple BooleanQuery with the "or" term (with or
without surrounding double quotes): no results. I also spent a lot of time
in Luke to make sure that my code is not responsible for this behavior. By
the way, my corpus contains many occurrences of "or", and everything else
has been working perfectly well for many years.
I first thought that modifying the QueryParser JavaCC lexical grammar
would help (deactivating the OR operator and keeping only ||), but the
problem seems wider, since even without the QueryParser I am unable
to find the word "or" in my indexes...
Do you have any clue?
Thank you very much in advance!
Best regards,
Benoit (mercibe)
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: "or" as a search term
Posted by Benoit Mercier <be...@member.fsf.org>.
Thank you for your reply.
I am using my own FrenchAnalyzer for lexical analysis. It extends
org.apache.lucene.analysis.Analyzer and my stopwords set is empty.
Benoit
On 2011-01-12 23:05, Robert Muir wrote:
> On Wed, Jan 12, 2011 at 10:38 PM, Benoit Mercier
> <be...@member.fsf.org> wrote:
>> Hi,
>>
>> I have been happily using Lucene for several years to offer French lexical
>> analysis tools to university researchers. Today, one of them decided to
>> analyze the use of the French word "or" (meaning "gold") in one of my
>> Lucene-powered corpora... And, as you have probably already guessed, no results...
>>
> What analyzer are you using?
>
> By default, StandardAnalyzer and StopAnalyzer use a set of English
> stopwords. For French, this list is probably not appropriate.
> If you look at the javadocs, you can pass in your own set of
> stopwords... for lexical analysis maybe this should be an empty set.
Re: "or" as a search term
Posted by Robert Muir <rc...@gmail.com>.
On Wed, Jan 12, 2011 at 10:38 PM, Benoit Mercier
<be...@member.fsf.org> wrote:
> Hi,
>
> I have been happily using Lucene for several years to offer French lexical
> analysis tools to university researchers. Today, one of them decided to
> analyze the use of the French word "or" (meaning "gold") in one of my
> Lucene-powered corpora... And, as you have probably already guessed, no results...
>
What analyzer are you using?
By default, StandardAnalyzer and StopAnalyzer use a set of English
stopwords. For French, this list is probably not appropriate.
If you look at the javadocs, you can pass in your own set of
stopwords... for lexical analysis maybe this should be an empty set.
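To illustrate the stopword hypothesis raised above, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class StopwordSketch, its methods, the shortened stopword list, and the two sample sentences are all made up for illustration. It only shows why a term removed at analysis time cannot be found later, even by a direct term lookup that bypasses any query syntax, and why an empty stopword set avoids this. It does not establish that this is the cause in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT Lucene code and uses no Lucene classes.
class StopwordSketch {

    // A few English stopwords for illustration; real default lists are longer.
    static final Set<String> ENGLISH_STOPWORDS =
            new HashSet<>(Arrays.asList("a", "and", "or", "the", "of"));

    // Lowercase, split on non-letters, and drop any token in the stopword set.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !stopwords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Build a term -> document ids map from the analyzed tokens.
    static Map<String, List<Integer>> index(List<String> docs, Set<String> stopwords) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : analyze(docs.get(docId), stopwords)) {
                List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
                // Record each document at most once per term.
                if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                    ids.add(docId);
                }
            }
        }
        return postings;
    }

    // Direct term lookup, with no query parsing involved.
    static List<Integer> lookup(Map<String, List<Integer>> postings, String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Le prix de l'or monte", "Une bague en or");

        // Indexed with English stopwords: "or" never reaches the index.
        System.out.println(lookup(index(docs, ENGLISH_STOPWORDS), "or")); // prints []

        // Indexed with an empty stopword set: "or" is kept and found.
        System.out.println(lookup(index(docs, new HashSet<>()), "or"));   // prints [0, 1]
    }
}
```

In this model the term is lost when the index is built, so changing the query side alone cannot bring it back; the documents would have to be re-indexed with the corrected stopword set.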