You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by palexv <pa...@gmail.com> on 2008/04/14 16:36:55 UTC

Search for phrases

Hi all.
I have an index with a set of phrases(one or several words).
I need to make search for these phrases.

I am confused as I can not find a good way to search for phrases. 
For example I need to search for "java de*" and recieve "java developers",
"java development", "developed by java" etc.

Can I do this using lucene? What Query/Analyzer should I use? Should I
tokenize field in index?

-- 
View this message in context: http://www.nabble.com/Search-for-phrases-tp16678104p16678104.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search for phrases

Posted by Daniel Naber <lu...@danielnaber.de>.
On Dienstag, 15. April 2008, palexv wrote:

> I have not tokenized phrases in index.
> What query should I use?
> Simple TermQuery does not work.

Probably PhraseQuery with an argument like "java dev" (no asterisk).

> If I try to use QueryParser , what analyzer should I use?

Probably KeywordAnalyzer. Please see the instructions in the FAQ that 
describe how to debug search problems if it doesn't work.

Regards
 Daniel

-- 
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search for phrases

Posted by Erick Erickson <er...@gmail.com>.
It would help a lot if you provided a couple of examples of inputs into your
index and expected outputs for queries.

For instance, you say:
<<<For example I need to search for "java de*" and recieve "java
developers",
"java development", "developed by java" et>>>

But then in your follow-up you say
<<<I have not tokenized phrases in index.>>>

Well, if you haven't tokenized your input streams at index time and
query time, you can't get what your first statement asks for.

StandardAnalyzer, for instance tokenizes the input stream. If you use
that at index time, you do, indeed, tokenize the phrases. That is, the
analyzer does it for you.

So, in order for us to help you, please provide:
1> the analyzers you use at index time (code snippets)
2> the analyzers you use at query time (code snippets)
3> example queries with  expected responses
4> example input that corresponds to <3>.

Or, a small, self-contained program (or unit test) that illustrates
what you are trying to accomplish.

Best
Erick

On Tue, Apr 15, 2008 at 2:25 AM, palexv <pa...@gmail.com> wrote:

>
> I have not tokenized phrases in index.
> What query should I use?
> Simple TermQuery does not work.
> If I try to use QueryParser , what analyzer should I use?
>
>
>
> Daniel Naber-10 wrote:
> >
> > On Montag, 14. April 2008, palexv wrote:
> >
> >> For example I need to search for "java de*" and recieve "java
> >> developers", "java development", "developed by java" etc.
> >
> > If your text is tokenized, this is not supported by QueryParser but you
> > can
> > create such queries using MultiPhraseQuery. If you don't tokenize a text
> > like "java development" at whitespace characters, it's just like a
> single
> > term for Lucene. A query like "java de*" should work then, just make
> sure
> > you don't tokenize the search query either.
> >
> > Regards
> >  Daniel
> >
> > --
> > http://www.danielnaber.de
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Search-for-phrases-tp16678104p16695864.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Search for phrases

Posted by palexv <pa...@gmail.com>.
I have not tokenized phrases in index.
What query should I use?
Simple TermQuery does not work.
If I try to use QueryParser , what analyzer should I use?



Daniel Naber-10 wrote:
> 
> On Montag, 14. April 2008, palexv wrote:
> 
>> For example I need to search for "java de*" and recieve "java
>> developers", "java development", "developed by java" etc.
> 
> If your text is tokenized, this is not supported by QueryParser but you
> can 
> create such queries using MultiPhraseQuery. If you don't tokenize a text 
> like "java development" at whitespace characters, it's just like a single 
> term for Lucene. A query like "java de*" should work then, just make sure 
> you don't tokenize the search query either.
> 
> Regards
>  Daniel
> 
> -- 
> http://www.danielnaber.de
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Search-for-phrases-tp16678104p16695864.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search for phrases

Posted by Daniel Naber <lu...@danielnaber.de>.
On Montag, 14. April 2008, palexv wrote:

> For example I need to search for "java de*" and recieve "java
> developers", "java development", "developed by java" etc.

If your text is tokenized, this is not supported by QueryParser but you can 
create such queries using MultiPhraseQuery. If you don't tokenize a text 
like "java development" at whitespace characters, it's just like a single 
term for Lucene. A query like "java de*" should work then, just make sure 
you don't tokenize the search query either.

Regards
 Daniel

-- 
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org