Posted to java-user@lucene.apache.org by Benoit Mercier <be...@member.fsf.org> on 2011/01/13 04:38:11 UTC

"or" as a search term

Hi,

I have been happily using Lucene for several years to offer French lexical 
analysis tools to university researchers.  Today, one of them decided 
to analyze the use of the French word "or" (meaning "gold") in 
one of my Lucene-powered corpora...  And, as you have probably already 
guessed, no results...

I tried bypassing the default QueryParser implementation and 
programmatically building a simple BooleanQuery with the "or" term (with 
or without surrounding double quotes): no results.  I also experimented 
a lot with Luke to make sure that my code is not responsible for this 
behavior.  By the way, my corpus contains many occurrences of "or", and 
everything else has been working perfectly well for many years.

I first thought that modifying the QueryParser JavaCC lexical grammar 
would help (deactivating the OR operator and keeping only || ), but the 
problem seems wider, since even without using the QueryParser I am unable 
to find the word "or" in my indexes...

Do you have any clue?

Thank you very much in advance!

Best regards,

Benoit (mercibe)

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: "or" as a search term

Posted by Benoit Mercier <be...@member.fsf.org>.
Thank you for your reply.

I am using my own FrenchAnalyzer for lexical analysis.  It extends 
org.apache.lucene.analysis.Analyzer and my stopwords set is empty.

Benoit

On 2011-01-12 23:05, Robert Muir wrote:
> On Wed, Jan 12, 2011 at 10:38 PM, Benoit Mercier
> <be...@member.fsf.org>  wrote:
>> Hi,
>>
>> I have been happily using Lucene for several years to offer French lexical
>> analysis tools to university researchers.  Today, one of them decided to
>> analyze the use of the French word "or" (meaning "gold") in one of my
>> Lucene-powered corpora...  And, as you have probably already guessed, no results...
>>
> What analyzer are you using?
>
> By default, StandardAnalyzer and StopAnalyzer use a set of English
> stopwords. For French, this list is probably not appropriate.
> If you look at the javadocs, you can pass in your own set of
> stopwords... for lexical analysis maybe this should be an empty set.



Re: "or" as a search term

Posted by Robert Muir <rc...@gmail.com>.
On Wed, Jan 12, 2011 at 10:38 PM, Benoit Mercier
<be...@member.fsf.org> wrote:
> Hi,
>
> I have been happily using Lucene for several years to offer French lexical
> analysis tools to university researchers.  Today, one of them decided to
> analyze the use of the French word "or" (meaning "gold") in one of my
> Lucene-powered corpora...  And, as you have probably already guessed, no results...
>

What analyzer are you using?

By default, StandardAnalyzer and StopAnalyzer use a set of English
stopwords. For French, this list is probably not appropriate.
If you look at the javadocs, you can pass in your own set of
stopwords... for lexical analysis, maybe this should be an empty set.
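
To illustrate why a stop word can be unfindable even when the QueryParser
is bypassed, here is a minimal sketch in plain Java. It is a toy model, not
the Lucene API: the class StopwordDemo, its methods, and the abbreviated
stop list are all hypothetical names made up for this example. The only
assumption carried over from Lucene is that its default English stop list
contains "or". If a term is dropped at index time, it never reaches the
index, so no query of any kind can match it. Whether this is what happened
in the corpus discussed above is not established by the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of analysis plus an inverted index. NOT the Lucene API.
class StopwordDemo {

    // Hypothetical, abbreviated English stop list (illustration only).
    static final Set<String> ENGLISH_STOPWORDS =
            new HashSet<>(Arrays.asList("a", "and", "or", "the", "to"));

    // Lowercase the text, split on non-letters, and drop stop words.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !stopwords.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Build a map from each surviving term to the ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs, Set<String> stopwords) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id), stopwords)) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Raw term lookup with no query parsing at all.
    static Set<Integer> lookup(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "L'or et l'argent",
                "Le prix de l'or monte",
                "Un texte sans le mot");

        Map<String, Set<Integer>> withStops = buildIndex(docs, ENGLISH_STOPWORDS);
        Map<String, Set<Integer>> noStops =
                buildIndex(docs, Collections.<String>emptySet());

        System.out.println(lookup(withStops, "or")); // prints []
        System.out.println(lookup(noStops, "or"));   // prints [0, 1]
    }
}
```

With the English stop list, "or" is never indexed, so even a direct term
lookup returns nothing. With an empty stop set, both documents containing
the word are found. In a real setup, the analyzer used at index time is the
one that matters here, so an index built earlier with a stop list would
need to be rebuilt after the change.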
