You are viewing a plain text version of this content. The canonical link for it is here.

Posted to solr-user@lucene.apache.org by Wangsheng Mei <ha...@gmail.com> on 2010/01/18 18:15:39 UTC

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

I faced a similar problem when I was dealing with Chinese words search.
By simply adding a PositionFilter at the end of analyzer, the damn phrase
query disappeared  and replaced by term queries which is what I've expected.
That's very nice, thank you very much!

Note that Chinese words segmentation is very different from English words
segmentation in that the latter use a whitespace as the delimiter.
So if I search "中国汉字", solr(lucene) will treat is as a phrase search because
it doesn't see any whitespace within the query string.But in fact, it should
be considered as BooleanQuery(OR) with two term queries search in this case.
Anyway, I am confused by solr(lucene)'s behavior on this. Is it a bug?

2010/1/1 AHMET ARSLAN <io...@yahoo.com>

> > "if this is the expected behaviour is
> > there a way to override it?"[1]
> >
> > [1] me
>
>
> Using PositionFilterFactory[1] after NGramFilterFactory can yield parsed
> query:
>
> field:fa field:am field:mi field:il field:ly field:fam field:ami field:mil
> field:ily
>
> [1]
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
>
>
>
>


-- 
梅旺生

Re: analyzer type="query" with NGramTokenFilterFactory forces phrase query

Posted by Robert Muir <rc...@gmail.com>.

the way that queryparser treats whitespace is also a problem for
languages that have words that contain spaces, like vietnamese.
i think it also causes grief for multi-word synonyms, such that they
don't work correctly at querytime:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter

2010/1/18 Wangsheng Mei <ha...@gmail.com>:
> I faced a similar problem when I was dealing with Chinese words search.
> By simply adding a PositionFilter at the end of analyzer, the damn phrase
> query disappeared  and replaced by term queries which is what I've expected.
> That's very nice, thank you very much!
>
> Note that Chinese words segmentation is very different from English words
> segmentation in that the latter use a whitespace as the delimiter.
> So if I search "中国汉字", solr(lucene) will treat is as a phrase search because
> it doesn't see any whitespace within the query string.But in fact, it should
> be considered as BooleanQuery(OR) with two term queries search in this case.
> Anyway, I am confused by solr(lucene)'s behavior on this. Is it a bug?
>
> 2010/1/1 AHMET ARSLAN <io...@yahoo.com>
>
>> > "if this is the expected behaviour is
>> > there a way to override it?"[1]
>> >
>> > [1] me
>>
>>
>> Using PositionFilterFactory[1] after NGramFilterFactory can yield parsed
>> query:
>>
>> field:fa field:am field:mi field:il field:ly field:fam field:ami field:mil
>> field:ily
>>
>> [1]
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
>>
>>
>>
>>
>
>
> --
> 梅旺生
>



-- 
Robert Muir
rcmuir@gmail.com