You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by sol myr <so...@yahoo.com> on 2011/02/09 09:43:41 UTC

[Lucene] custom Query, and Stop Words

Hi,

I'm building my own BooleanQuery (rather than using Query Parser). That's because I need different defaults from my users: 
If a user types:  java program
I need to run the query: +java* +program* (namely AND search, with Prefix so as to hit "programS", "programMER").

So naively I split the user's input into words, and build my query term-by-term:
   String[] words= userInput.split(" ");
   BooleanQuery query=new BooleanQuery();
   for(String word: words)
       query.add(new PrefixQuery(new Term("topic", word)), Occur.MUST);

But since my index is built with StandardAnalyzer, I'm having trouble with Stop Words.
If the user types: program of java
Then of course my query (+program* +of* +java+) returns zero results.

Is there an easy solution?
I'd like to keep using StandardAnalyzer (I don't want to index by stupid keywords such as "of"). I would just like stop-words to be removed from my query (as QueryParser does).
Is there some utility method for this? Direct access to the list of stop-words?

Thanks 



 
____________________________________________________________________________________
Bored stiff? Loosen up... 
Download and play hundreds of games for free on Yahoo! Games.
http://games.yahoo.com/games/front

Re: [Lucene] custom Query, and Stop Words

Posted by sol myr <so...@yahoo.com>.
Thanks so much - I used STOP_WORDS_SET and it works fine (luckily, punctuation and case are not a problem in our case).
Thanks !

--- On Wed, 2/9/11, Ian Lea <ia...@gmail.com> wrote:

From: Ian Lea <ia...@gmail.com>
Subject: Re: [Lucene] custom Query, and Stop Words
To: java-user@lucene.apache.org
Date: Wednesday, February 9, 2011, 1:58 AM

Have you considered using stemming instead?  Sounds like that might
make most of your problems go away and achieve the same result.

I'm not aware of a utility method to remove stop words from a string
but there are ways of passing data through analyzers/tokenizers and
grabbing the output. StandardAnalyzer stores the stop words in
STOP_WORDS_SET and you might be able to use that directly, or pass in
your own set of stop works which you would obviously have access to.

If you stick with your split/word by word approach, watch out for
punctuation.and Mixed Case and other complications.


--
Ian.

On Wed, Feb 9, 2011 at 8:43 AM, sol myr <so...@yahoo.com> wrote:
> Hi,
>
> I'm building my own BooleanQuery (rather than using Query Parser). That's because I need different defaults from my users:
> If a user types:  java program
> I need to run the query: +java* +program* (namely AND search, with Prefix so as to hit "programS", "programMER").
>
> So naively I split the user's input into words, and build my query term-by-term:
>    String[] words= userInput.split(" ");
>    BooleanQuery query=new BooleanQuery();
>    for(String word: words)
>        query.add(new PrefixQuery(new Term("topic", word)), Occur.MUST);
>
> But since my index is built with StandardAnalyzer, I'm having trouble with Stop Words.
> If the user types: program of java
> Then of course my query (+program* +of* +java+) returns zero results.
>
> Is there an easy solution?
> I'd like to keep using StandardAnalyzer (I don't want to index by stupid keywords such as "of"). I would just like stop-words to be removed from my query (as QueryParser does).
> Is there some utility method for this? Direct access to the list of stop-words?
>
> Thanks
>
>
>
>
> ____________________________________________________________________________________
> Bored stiff? Loosen up...
> Download and play hundreds of games for free on Yahoo! Games.
> http://games.yahoo.com/games/front

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




      

Re: [Lucene] custom Query, and Stop Words

Posted by Ian Lea <ia...@gmail.com>.
Have you considered using stemming instead?  Sounds like that might
make most of your problems go away and achieve the same result.

I'm not aware of a utility method to remove stop words from a string
but there are ways of passing data through analyzers/tokenizers and
grabbing the output. StandardAnalyzer stores the stop words in
STOP_WORDS_SET and you might be able to use that directly, or pass in
your own set of stop works which you would obviously have access to.

If you stick with your split/word by word approach, watch out for
punctuation.and Mixed Case and other complications.


--
Ian.

On Wed, Feb 9, 2011 at 8:43 AM, sol myr <so...@yahoo.com> wrote:
> Hi,
>
> I'm building my own BooleanQuery (rather than using Query Parser). That's because I need different defaults from my users:
> If a user types:  java program
> I need to run the query: +java* +program* (namely AND search, with Prefix so as to hit "programS", "programMER").
>
> So naively I split the user's input into words, and build my query term-by-term:
>    String[] words= userInput.split(" ");
>    BooleanQuery query=new BooleanQuery();
>    for(String word: words)
>        query.add(new PrefixQuery(new Term("topic", word)), Occur.MUST);
>
> But since my index is built with StandardAnalyzer, I'm having trouble with Stop Words.
> If the user types: program of java
> Then of course my query (+program* +of* +java+) returns zero results.
>
> Is there an easy solution?
> I'd like to keep using StandardAnalyzer (I don't want to index by stupid keywords such as "of"). I would just like stop-words to be removed from my query (as QueryParser does).
> Is there some utility method for this? Direct access to the list of stop-words?
>
> Thanks
>
>
>
>
> ____________________________________________________________________________________
> Bored stiff? Loosen up...
> Download and play hundreds of games for free on Yahoo! Games.
> http://games.yahoo.com/games/front

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org