Posted to solr-user@lucene.apache.org by "mike.schultz" <mi...@gmail.com> on 2009/07/02 15:36:22 UTC

Making Analyzer Phrase aware?

I was looking at the SOLR-908 port of nutch CommonGramsFilter as an approach
for having phrase searches be sensitive to stop words within a query.  So a
search on "car on street" wouldn't match the text "car in street".

From what I can tell the query version of the filter will *always* create
stop-word-grams, not just in a phrase context.  I want non-phrase searches
to ignore stop words as usual.  Can someone tell me how to make an analyzer
(or token filter) "phrase aware" so I only create grams when I know I'm
inside of a phrase?

Thanks.
Mike

Re: Making Analyzer Phrase aware?

Posted by Chris Hostetter <ho...@fucit.org>.

: I was looking at the SOLR-908 port of nutch CommonGramsFilter as an approach
: for having phrase searches be sensitive to stop words within a query.  So a
: search on "car on street" wouldn't match the text "car in street".
: 
: From what I can tell the query version of the filter will *always* create
: stop-word-grams, not just in a phrase context.  I want non-phrase searches
: to ignore stop words as usual.  Can someone tell me how to make an analyzer
: (or token filter) "phrase aware" so I only create grams when I know I'm
: inside of a phrase?

It depends on what you mean by "phrase"

When QueryParser sees quote characters, it passes everything in the quotes 
to the analyzer as a single stream -- it doesn't matter if it's "word" or 
"a phrase"

When an "Analyzer" gets a stream of text, it has no way of knowing whether 
that text was originally wrapped in quotes .. but if you're assuming your 
analyzer was called by the QueryParser, then you can make assumptions like 
"I see white space in this stream so it must be a phrase"
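
A minimal sketch of that white space assumption, in plain Java with no 
Lucene dependency. The class and method names (PhraseGuess, 
looksLikePhrase) are hypothetical and not part of Lucene or Solr. The 
guess is only meaningful if the text really came from the QueryParser; 
at index time a whole field value arrives with white space in it too.

```java
// Hypothetical helper for illustration only; not a Lucene or Solr class.
final class PhraseGuess {
    private PhraseGuess() {}

    /**
     * Guesses whether the text handed to an analyzer was a quoted phrase.
     * Assumption: the caller is the QueryParser, which hands over unquoted
     * terms one at a time, so interior white space implies quotes.
     */
    static boolean looksLikePhrase(String text) {
        if (text == null) {
            return false;
        }
        // Leading and trailing white space says nothing about word count.
        String trimmed = text.trim();
        for (int i = 0; i < trimmed.length(); i++) {
            if (Character.isWhitespace(trimmed.charAt(i))) {
                return true; // at least two words separated by white space
            }
        }
        return false;
    }
}
```

An analyzer built on this guess could pick a gram-producing filter chain 
when looksLikePhrase returns true and the usual stop word chain otherwise.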

A "TokenFilter" gets a TokenStream, and can't really know where that 
TokenStream came from -- but it could make some assumptions too (like: 
this token has a position one higher than the previous one, so it must be 
a phrase)
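
The position assumption could be sketched roughly like this. Tok is a 
hypothetical stand-in for a Lucene token carrying its term text and 
position increment; it is not a Lucene class, and a real TokenFilter 
would read the increment from the stream instead of from a list. The 
sketch treats any token that sits exactly one position after its 
predecessor as evidence of a phrase.

```java
import java.util.List;

// Hypothetical model for illustration only; not Lucene or Solr classes.
final class TokenPhraseGuess {
    private TokenPhraseGuess() {}

    /** Stand-in for a token: term text plus its position increment. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Returns true if some token after the first has a position increment
     * of exactly 1, i.e. it directly follows the previous token.
     * Tokens stacked at the same position (increment 0, e.g. synonyms)
     * do not count as evidence of a phrase.
     */
    static boolean looksLikePhrase(List<Tok> tokens) {
        for (int i = 1; i < tokens.size(); i++) {
            if (tokens.get(i).positionIncrement == 1) {
                return true;
            }
        }
        return false;
    }
}
```

Note that if an earlier filter has already dropped stop words and left 
gaps, the following token has an increment larger than 1 and this check 
misses it, so a check like this would have to run before stop word 
removal.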

But like I said: it all depends on what you mean by phrase.


-Hoss