Posted to java-user@lucene.apache.org by Joachim De Beule <jo...@arti.vub.ac.be> on 2010/03/17 21:25:31 UTC

exact query match?

Hi All,

I have a corpus of documents that I want to search for phrases. I only want
to retrieve documents that contain the phrase exactly. For example, given:
doc1 = "x 11 windowing system"
doc2 = "x windowing system"
doc3 = "the x 11 windowing system"

then I want the query "x 11 windowing system" to return only doc1 and doc3 and 
the query "the x 11" to return only doc3.

I have tried SimpleAnalyzer with the query as a single phrase, but the first
example query still returns doc2, because this analyzer discards the number
11. I could not find an alternative analyzer that keeps numbers, and I don't
know how to write one myself.

Is there a standard way of doing this?

Thanks!

Joachim.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: exact query match?

Posted by Erick Erickson <er...@gmail.com>.
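You might get some joy from WhitespaceAnalyzer, but beware of case and
punctuation: it splits on whitespace only, so it neither lowercases tokens
nor strips punctuation. You could pre-process your input at both index and
query time to lowercase it and remove non-alphanumerics.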
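
To make the difference concrete, here is a minimal sketch in plain Java. It
does not call Lucene at all; it only models the two tokenization behaviours
and treats a phrase match as a contiguous run of tokens. The class and method
names (PhraseMatchSketch, letterTokens, whitespaceTokens, containsPhrase) are
made up for illustration, and replacing punctuation with a space (rather than
deleting it) is an assumption about how you would want to pre-process.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: models tokenization, does not use Lucene.
class PhraseMatchSketch {

    // Letter-only tokenization with lowercasing: any run of non-letters
    // (digits included) is a separator, so "11" disappears.
    static List<String> letterTokens(String text) {
        return split(text.toLowerCase(Locale.ROOT), "[^\\p{L}]+");
    }

    // Whitespace tokenization after pre-processing: lowercase, then turn
    // everything that is not a letter, digit or whitespace into a space.
    static List<String> whitespaceTokens(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}\\s]", " ");
        return split(cleaned, "\\s+");
    }

    // A phrase matches when its tokens occur contiguously, in order.
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(doc, phrase) >= 0;
    }

    private static List<String> split(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split(separatorRegex)) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] docs = {
            "x 11 windowing system",
            "x windowing system",
            "the x 11 windowing system"
        };
        String query = "x 11 windowing system";
        for (String doc : docs) {
            boolean lettersOnly = containsPhrase(letterTokens(doc), letterTokens(query));
            boolean whitespace = containsPhrase(whitespaceTokens(doc), whitespaceTokens(query));
            System.out.println(doc + " | letters only: " + lettersOnly
                    + " | whitespace: " + whitespace);
        }
    }
}
```

With letter-only tokens the query and doc2 both reduce to the same three
tokens, which is why doc2 comes back. Keeping digits as tokens is what
separates them.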

Or you could create your own analyzer; see SynonymAnalyzer in Lucene in
Action, and there's another example here: http://mext.at/?p=26.

The idea is to string together some number of Filters, starting with a
Tokenizer that "does the right thing", and create your own Analyzer.
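
The shape of that chain can be sketched in plain Java, without any Lucene
classes: one tokenizer step produces the tokens, and each filter rewrites the
token list in turn. Everything here (AnalyzerChainSketch, whitespaceTokenizer,
lowerCase, stripPunctuation, exactPhraseAnalyzer) is a hypothetical name for
illustration and not Lucene API; a real Analyzer would wire up Lucene's own
Tokenizer and TokenFilter classes in the same order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical model of "tokenizer followed by filters"; not Lucene API.
class AnalyzerChainSketch {
    private final Function<String, List<String>> tokenizer;
    private final List<UnaryOperator<List<String>>> filters;

    AnalyzerChainSketch(Function<String, List<String>> tokenizer,
                        List<UnaryOperator<List<String>>> filters) {
        this.tokenizer = tokenizer;
        this.filters = filters;
    }

    // Run the tokenizer once, then apply each filter in order.
    List<String> analyze(String text) {
        List<String> tokens = tokenizer.apply(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    // Tokenizer: split on runs of whitespace, keeping digits and punctuation.
    static List<String> whitespaceTokenizer(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Filter: lowercase every token.
    static List<String> lowerCase(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Filter: delete non-alphanumerics inside tokens, drop tokens left empty.
    static List<String> stripPunctuation(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.replaceAll("[^\\p{L}\\p{Nd}]", ""))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // One possible chain for exact phrase matching that keeps numbers.
    static AnalyzerChainSketch exactPhraseAnalyzer() {
        List<UnaryOperator<List<String>>> chain = new ArrayList<>();
        chain.add(AnalyzerChainSketch::lowerCase);
        chain.add(AnalyzerChainSketch::stripPunctuation);
        return new AnalyzerChainSketch(AnalyzerChainSketch::whitespaceTokenizer, chain);
    }

    public static void main(String[] args) {
        System.out.println(exactPhraseAnalyzer().analyze("The X 11 Windowing System."));
    }
}
```

Whatever chain you settle on, use the same one at index time and at query
time, otherwise the tokens in the phrase will not line up with the tokens in
the index.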

But as far as I know, there's nothing out of the box that does what you
want.

Best
Erick
