Posted to solr-user@lucene.apache.org by AHMET ARSLAN <io...@yahoo.com> on 2009/12/31 19:13:09 UTC

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

> Hello *, I'm trying to make an index to support spelling errors/fuzzy
> matching. I've indexed my document titles with NGramFilterFactory
> (minGramSize=2, maxGramSize=3). Using the analysis page I can see the
> common grams match between the indexed value and the query value.
> However, when I try to query for it, e.g. title_ngram:(family), the
> debug output says the query is converted to a phrase query
> "f a m i l y fa am mi il ly fam ami mil ily". If this is the expected
> behavior, is there a way to override it?

"If a single token is split into more tokens during the analysis phase, solr will do a phrase query instead of a term query." [1]

[1] http://www.mail-archive.com/solr-user@lucene.apache.org/msg30055.html
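To make that rule concrete, here is a minimal Python sketch. It is a simplified model, not Solr or Lucene code: `ngrams` and `build_query` are hypothetical helpers, grams are emitted grouped by size (the order visible in the debug output above), and the parser's decision is reduced to "one token or several".

```python
# Simplified model (hypothetical helpers, not Solr/Lucene source) of the rule:
# one analyzed token -> term query, several tokens -> phrase query.

def ngrams(word, min_gram, max_gram):
    """Return all n-grams of `word`, grouped by size (all 2-grams, then all 3-grams)."""
    grams = []
    for n in range(min_gram, max_gram + 1):
        for i in range(len(word) - n + 1):
            grams.append(word[i:i + n])
    return grams


def build_query(field, tokens):
    """Mimic the parser's choice for the tokens produced from ONE query word."""
    if len(tokens) == 1:
        return f"{field}:{tokens[0]}"          # single token: plain term query
    return f'{field}:"{" ".join(tokens)}"'    # several tokens: phrase query


grams = ngrams("family", 2, 3)
print(grams)   # ['fa', 'am', 'mi', 'il', 'ly', 'fam', 'ami', 'mil', 'ily']
print(build_query("title_ngram", grams))
# title_ngram:"fa am mi il ly fam ami mil ily"
```

With minGramSize=2 the single-character grams from the reported output do not appear; the point is only that one query word becomes nine tokens, so a phrase query is built.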

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

Posted by Robert Muir <rc...@gmail.com>.

The way the query parser treats whitespace (it splits the query on
whitespace before the analyzer runs) is also a problem for languages
that have words containing spaces, like Vietnamese. I think it also
causes grief for multi-word synonyms, so that they don't work correctly
at query time:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter
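A toy Python model of the multi-word synonym problem, under stated assumptions: the rule `new york => ny` and the helpers `expand_synonyms`, `analyze`, `parse_presplit` and `parse_whole` are hypothetical, and the analyzer is a stand-in for SynonymFilter, not its implementation. Because whitespace splitting happens before analysis, each word reaches the analyzer on its own and a two-word rule never fires.

```python
# Toy model of query-time analysis with a multi-word synonym rule.
# The rule and all helper names are hypothetical illustrations.
SYNONYMS = {("new", "york"): ["ny"]}


def expand_synonyms(tokens):
    """Append synonyms for every multi-word key found in `tokens`."""
    out = list(tokens)
    for key, syns in SYNONYMS.items():
        k = len(key)
        for i in range(len(tokens) - k + 1):
            if tuple(tokens[i:i + k]) == key:
                out.extend(syns)
    return out


def analyze(text):
    """Lowercase, split into words, then apply the synonym rule."""
    return expand_synonyms(text.lower().split())


def parse_presplit(query):
    """Split on whitespace FIRST, then analyze each chunk on its own."""
    return [analyze(chunk) for chunk in query.split()]


def parse_whole(query):
    """Analyze the whole query string at once."""
    return [analyze(query)]


print(parse_presplit("New York"))  # [['new'], ['york']]  (rule never fires)
print(parse_whole("New York"))     # [['new', 'york', 'ny']]
```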

2010/1/18 Wangsheng Mei <ha...@gmail.com>:
> I faced a similar problem when I was dealing with Chinese word search.
> By simply adding a PositionFilter at the end of the analyzer, the damn phrase
> query disappeared and was replaced by term queries, which is what I expected.
> That's very nice, thank you very much!
>
> Note that Chinese word segmentation is very different from English word
> segmentation in that the latter uses whitespace as the delimiter.
> So if I search "中国汉字" (中国 "China" + 汉字 "Chinese characters"), Solr (Lucene)
> will treat it as a phrase search because it doesn't see any whitespace within
> the query string. But in fact, in this case it should be treated as a
> BooleanQuery (OR) of two term queries.
> Anyway, I am confused by Solr's (Lucene's) behavior on this. Is it a bug?



-- 
Robert Muir
rcmuir@gmail.com

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

Posted by Wangsheng Mei <ha...@gmail.com>.

I faced a similar problem when I was dealing with Chinese word search.
By simply adding a PositionFilter at the end of the analyzer, the damn phrase
query disappeared and was replaced by term queries, which is what I expected.
That's very nice, thank you very much!

Note that Chinese word segmentation is very different from English word
segmentation in that the latter uses whitespace as the delimiter.
So if I search "中国汉字" (中国 "China" + 汉字 "Chinese characters"), Solr (Lucene)
will treat it as a phrase search because it doesn't see any whitespace within
the query string. But in fact, in this case it should be treated as a
BooleanQuery (OR) of two term queries.
Anyway, I am confused by Solr's (Lucene's) behavior on this. Is it a bug?
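The same mechanism, applied to Chinese, can be sketched in Python. Everything here is a hypothetical illustration: the two-word dictionary, the forward-maximum-matching `segment` function (a simple textbook segmentation technique, not what any particular Chinese analyzer does), and `build_query`. Since "中国汉字" contains no whitespace, the parser hands it to the analyzer as one chunk; segmentation then yields two tokens at consecutive positions, which become a phrase query.

```python
# Hypothetical two-word dictionary; real Chinese analyzers rely on much larger
# lexicons or statistical models.
DICTIONARY = {"中国", "汉字"}
MAX_WORD_LEN = 2


def segment(text):
    """Forward maximum matching: at each position take the longest dictionary
    word, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in DICTIONARY:
                words.append(piece)
                i += n
                break
    return words


def build_query(field, tokens):
    """One token -> term query; several tokens -> phrase query."""
    if len(tokens) == 1:
        return f"{field}:{tokens[0]}"
    return f'{field}:"{" ".join(tokens)}"'


query = "中国汉字"
chunks = query.split()   # no whitespace, so the parser sees a single chunk
print(chunks)                                   # ['中国汉字']
print(build_query("text", segment(chunks[0])))  # text:"中国 汉字"
```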

2010/1/1 AHMET ARSLAN <io...@yahoo.com>

> > "if this is the expected behaviour is
> > there a way to override it?"[1]
> >
> > [1] me
>
>
> Using PositionFilterFactory[1] after NGramFilterFactory can yield parsed
> query:
>
> field:fa field:am field:mi field:il field:ly field:fam field:ami field:mil
> field:ily
>
> [1]
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
>
>
>
>


-- 
梅旺生

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

Posted by AHMET ARSLAN <io...@yahoo.com>.

> "if this is the expected behaviour is
> there a way to override it?"[1]
> 
> [1] me


Using PositionFilterFactory [1] after NGramFilterFactory yields this parsed query:

field:fa field:am field:mi field:il field:ly field:fam field:ami field:mil field:ily 

[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
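A simplified Python model of why this works, assuming PositionFilterFactory's default behavior: the first token keeps its position increment and every later token gets an increment of 0, so all grams stack on one position. The helper names are hypothetical, and `build_query` only mimics the classic parser's choice (one position -> OR of term queries, several positions with stacked terms -> multi-phrase query, otherwise phrase query); it is not Lucene code.

```python
# Simplified model (hypothetical helpers, not Lucene source) of how position
# increments decide the query type. Tokens are (term, position_increment) pairs.

def ngram_tokens(word, min_gram, max_gram):
    """N-grams grouped by size; each one advances the position by 1."""
    return [(word[i:i + n], 1)
            for n in range(min_gram, max_gram + 1)
            for i in range(len(word) - n + 1)]


def position_filter(tokens, increment=0):
    """Keep the first token's increment; give every later token `increment`."""
    return [(term, inc if idx == 0 else increment)
            for idx, (term, inc) in enumerate(tokens)]


def build_query(field, tokens):
    """Pick a query type from the token positions, roughly like the classic parser."""
    terms = [term for term, _ in tokens]
    if len(terms) == 1:
        return f"{field}:{terms[0]}"
    positions = []  # positions[k] = terms stacked at the k-th position
    for term, inc in tokens:
        if inc > 0 or not positions:
            positions.append([term])
        else:
            positions[-1].append(term)
    if len(positions) == 1:
        # Everything at one position: OR of optional term queries.
        return " ".join(f"{field}:{t}" for t in terms)
    if any(len(p) > 1 for p in positions):
        # Some positions hold several terms: multi-phrase query.
        parts = [p[0] if len(p) == 1 else "(" + " ".join(p) + ")" for p in positions]
        return f'{field}:"{" ".join(parts)}"'
    # One term per position: ordinary phrase query.
    return f'{field}:"{" ".join(terms)}"'


grams = ngram_tokens("family", 2, 3)
print(build_query("field", grams))
# field:"fa am mi il ly fam ami mil ily"
print(build_query("field", position_filter(grams)))
# field:fa field:am field:mi field:il field:ly field:fam field:ami field:mil field:ily
```

The trade-off is recall over precision: with every clause optional, a document sharing even one gram with the query matches, and documents sharing more grams tend to score higher.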

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

Posted by Joe Calderon <ca...@gmail.com>.

"if this is the expected behaviour is there a way to override it?"[1]

[1] me
