
Posted to solr-user@lucene.apache.org by Joe Calderon <ca...@gmail.com> on 2009/12/31 18:34:23 UTC

analyzer type="query" with NGramTokenFilterFactory forces phrase query

Hello *, I'm trying to build an index that supports spelling errors/fuzzy
matching. I've indexed my document titles with NGramFilterFactory
(minGramSize=2, maxGramSize=3). On the analysis page I can see the
common grams matching between the indexed value and the query value.
However, when I run a query against it, e.g. title_ngram:(family), the
debug output says the query is converted to a phrase query
"f a m i l y fa am mi il ly fam ami mil ily". If this is the expected
behavior, is there a way to override it?

Or should I scrap this approach, use title:(family), and boost on
strdist("family", title, ngram, 3)?
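
To make the gram matching described above concrete, here is a minimal Python sketch. It is not Solr code: the helper names ngrams and shared_grams are hypothetical, and the shortest-size-first ordering is an assumption chosen to mirror the gram order in the debug output quoted in this thread. The misspelling "famly" is an illustrative input, not one from the original post.

```python
def ngrams(text, min_gram=2, max_gram=3):
    """Return all character n-grams of text, shortest size first."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        # Slide a window of the current size across the text.
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


def shared_grams(indexed, query, min_gram=2, max_gram=3):
    """Return the query grams that also occur in the indexed value."""
    indexed_set = set(ngrams(indexed, min_gram, max_gram))
    return [g for g in ngrams(query, min_gram, max_gram) if g in indexed_set]


if __name__ == "__main__":
    # Grams an index-time filter would roughly produce for the title.
    print(ngrams("family"))
    # Grams a misspelled query still has in common with the title.
    print(shared_grams("family", "famly"))
```

A misspelled query can only benefit from these shared grams if the grams are matched independently. A phrase query requires every gram to appear in sequence, which is why the phrase conversion defeats the fuzzy-matching goal.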

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

Posted by Robert Muir <rc...@gmail.com>.

The way the query parser treats whitespace is also a problem for
languages whose words can contain spaces, like Vietnamese.
I think it also causes grief for multi-word synonyms, such that they
don't work correctly at query time:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter

2010/1/18 Wangsheng Mei <ha...@gmail.com>:
> I faced a similar problem when dealing with Chinese word search.
> Simply adding a PositionFilter at the end of the analyzer made the damn
> phrase query disappear; it was replaced by term queries, which is what I
> expected. That's very nice, thank you very much!
>
> Note that Chinese word segmentation is very different from English word
> segmentation, in that the latter uses whitespace as the delimiter.
> So if I search for "中国汉字", Solr (Lucene) treats it as a phrase search
> because it doesn't see any whitespace within the query string. In fact, it
> should be treated as a BooleanQuery (OR) of two term queries in this case.
> Anyway, I am confused by Solr's (Lucene's) behavior here. Is it a bug?

-- 
Robert Muir
rcmuir@gmail.com

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

Posted by Wangsheng Mei <ha...@gmail.com>.

I faced a similar problem when dealing with Chinese word search.
Simply adding a PositionFilter at the end of the analyzer made the damn
phrase query disappear; it was replaced by term queries, which is what I
expected. That's very nice, thank you very much!

Note that Chinese word segmentation is very different from English word
segmentation, in that the latter uses whitespace as the delimiter.
So if I search for "中国汉字", Solr (Lucene) treats it as a phrase search
because it doesn't see any whitespace within the query string. In fact, it
should be treated as a BooleanQuery (OR) of two term queries in this case.
Anyway, I am confused by Solr's (Lucene's) behavior here. Is it a bug?

2010/1/1 AHMET ARSLAN <io...@yahoo.com>

> > "if this is the expected behaviour is
> > there a way to override it?"[1]
> >
> > [1] me
>
>
> Using PositionFilterFactory[1] after NGramFilterFactory can yield parsed
> query:
>
> field:fa field:am field:mi field:il field:ly field:fam field:ami field:mil
> field:ily
>
> [1]
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory


-- 
梅旺生

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

Posted by AHMET ARSLAN <io...@yahoo.com>.

> "if this is the expected behaviour is
> there a way to override it?"[1]
> 
> [1] me


Using PositionFilterFactory[1] after NGramFilterFactory can yield this parsed query, with one term query per gram instead of a single phrase:

field:fa field:am field:mi field:il field:ly field:fam field:ami field:mil field:ily 

[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
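
As a rough illustration of why the position filter changes the parsed query, here is a small Python model. It is a simplified sketch, not Lucene or Solr code: Token, ngram_filter, position_filter, and build_query are hypothetical names, and the rules are assumptions based on the behavior reported in this thread. In this model each gram advances the position by one. The position filter resets the increment to zero for every token after the first. The parser emits a phrase when tokens sit at different positions and an OR of term queries when they all share one position.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    position_increment: int = 1  # assumption: each gram advances one position


def ngram_filter(term, min_gram=2, max_gram=3):
    """Split a single term into character n-grams, shortest size first."""
    return [
        Token(term[start:start + size])
        for size in range(min_gram, max_gram + 1)
        for start in range(len(term) - size + 1)
    ]


def position_filter(tokens, position_increment=0):
    """Keep the first token's increment; set every later token's increment."""
    return [
        Token(tok.text, tok.position_increment if i == 0 else position_increment)
        for i, tok in enumerate(tokens)
    ]


def build_query(field, tokens):
    """Simplified model of the parser's choice between term, OR and phrase."""
    if not tokens:
        return ""
    if len(tokens) == 1:
        return f"{field}:{tokens[0].text}"
    # Turn increments into absolute positions.
    positions, current = [], 0
    for tok in tokens:
        current += tok.position_increment
        positions.append(current)
    if len(set(positions)) == 1:
        # Every token shares one position: OR of independent term queries.
        return " ".join(f"{field}:{tok.text}" for tok in tokens)
    # Tokens at different positions: phrase query (other cases omitted here).
    return f'{field}:"' + " ".join(tok.text for tok in tokens) + '"'


if __name__ == "__main__":
    grams = ngram_filter("family")
    print(build_query("title_ngram", grams))
    print(build_query("title_ngram", position_filter(grams)))
```

In this model the first call prints a phrase query over all nine grams, and the second prints nine separate term queries, matching the shape of the parsed query shown above.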

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

Posted by Joe Calderon <ca...@gmail.com>.

"if this is the expected behaviour is there a way to override it?"[1]

[1] me

On Thu, Dec 31, 2009 at 10:13 AM, AHMET ARSLAN <io...@yahoo.com> wrote:
>> Hello *, I'm trying to build an index that supports spelling errors/fuzzy
>> matching. I've indexed my document titles with NGramFilterFactory
>> (minGramSize=2, maxGramSize=3). On the analysis page I can see the
>> common grams matching between the indexed value and the query value.
>> However, when I run a query against it, e.g. title_ngram:(family), the
>> debug output says the query is converted to a phrase query
>> "f a m i l y fa am mi il ly fam ami mil ily". If this is the expected
>> behavior, is there a way to override it?
>
> "If a single token is split into more tokens during the analysis phase, solr will do a phrase query instead of a term query." [1]
>
> [1]http://www.mail-archive.com/solr-user@lucene.apache.org/msg30055.html

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

Posted by AHMET ARSLAN <io...@yahoo.com>.

> Hello *, I'm trying to build an index that supports spelling errors/fuzzy
> matching. I've indexed my document titles with NGramFilterFactory
> (minGramSize=2, maxGramSize=3). On the analysis page I can see the
> common grams matching between the indexed value and the query value.
> However, when I run a query against it, e.g. title_ngram:(family), the
> debug output says the query is converted to a phrase query
> "f a m i l y fa am mi il ly fam ami mil ily". If this is the expected
> behavior, is there a way to override it?

"If a single token is split into more tokens during the analysis phase, solr will do a phrase query instead of a term query." [1]

[1]http://www.mail-archive.com/solr-user@lucene.apache.org/msg30055.html