Posted to solr-user@lucene.apache.org by Yonik Seeley <yo...@lucidimagination.com> on 2009/10/27 14:10:38 UTC

Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

On Tue, Oct 27, 2009 at 8:44 AM, Jérôme Etévé <je...@gmail.com> wrote:
> I don't really get why these two tokens are subsequently put together
> in a phrase query.

That's the way the Lucene query parser has always worked... phrase
queries are made if multiple tokens are produced from one field query.

> In solr 1.3, it didn't seem to be a problem though. title:"d affaire"
> matches document where title contains "d'affaire" and all is fine.

This should not have changed between 1.3 and 1.4...
What's the fieldType and its definition for your title field?

-Yonik
http://www.lucidimagination.com
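
The rule Yonik describes (one field query that analyzes to several tokens
becomes a phrase query) can be sketched as below. This is only a toy model:
the analyze() function and the tuple-shaped queries are assumptions made for
illustration, not Lucene's actual classes or API.

```python
import re


def analyze(text):
    """Toy analyzer (assumption): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-zA-Z]+", text.lower()) if t]


def build_field_query(field, text):
    """Return a term query for one token, a phrase query for several."""
    tokens = analyze(text)
    if not tokens:
        return None  # nothing left to search for after analysis
    if len(tokens) == 1:
        return ("term", field, tokens[0])
    # Several tokens came out of a single field query: keep them together
    # and in order, which is what a phrase query expresses.
    return ("phrase", field, tokens)


print(build_field_query("title", "affaire"))
print(build_field_query("title", "d'affaire"))
```

Under this model, title:d'affaire and title:"d affaire" end up as the same
phrase query, since both analyze to the tokens "d" and "affaire".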

Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

Posted by Chris Hostetter <ho...@fucit.org>.

: However, even when it's set to 'true', the highlighting of a field
: continues to work even if the search doesn't.
: Does the highlighter use a different strategy to match the query terms
: in the fields?

if it has term vectors, it uses them, otherwise it re-analyzes the stored
fields.


-Hoss
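
A rough sketch of the second strategy, re-analyzing the stored text, may help
show why highlighting does not depend on positions kept in the index. The
tokenizing regex and the highlight() helper are assumptions for illustration;
this is not the Solr highlighter's real code.

```python
import re


def highlight(stored_text, query_terms, pre="<em>", post="</em>"):
    """Re-tokenize the stored text and wrap tokens that match a query term."""
    terms = {t.lower() for t in query_terms}
    out, last = [], 0
    # Offsets come from re-analyzing the stored value at highlight time,
    # so nothing positional has to be read back from the index.
    for m in re.finditer(r"[0-9A-Za-z]+", stored_text):
        if m.group().lower() in terms:
            out.append(stored_text[last:m.start()])
            out.append(pre + m.group() + post)
            last = m.end()
    out.append(stored_text[last:])
    return "".join(out)


print(highlight("ingenieur d'affaire senior", ["d", "affaire"]))
```

Note that this sketch marks individual terms only and does not check that
they form the phrase.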

Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

Posted by Jérôme Etévé <je...@gmail.com>.

Mea maxima culpa,

I had foolishly set the option omitTermFreqAndPositions="true" in an
attempt to save space.
It works when this is set to 'false'.

However, even when it's set to 'true', the highlighting of a field
continues to work even if the search doesn't.
Does the highlighter use a different strategy to match the query terms
in the fields?

Cheers!

Jerome.
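
To make the cause concrete: a phrase query has to check that the terms sit at
consecutive positions, and omitTermFreqAndPositions="true" drops exactly that
information from the index. Below is a minimal sketch with a toy inverted
index. The function names and the choice to return no hits when positions are
missing are assumptions of this model, not a description of Lucene internals.

```python
import re
from collections import defaultdict


def tokens(text):
    """Toy analyzer (assumption): lowercase alphanumeric runs."""
    return re.findall(r"[0-9a-z]+", text.lower())


def build_index(docs, omit_positions):
    """Map term -> {doc_id: [positions]}; positions stay empty if omitted."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokens(text)):
            plist = index[term].setdefault(doc_id, [])
            if not omit_positions:
                plist.append(pos)
    return index


def phrase_search(index, phrase):
    """Return doc ids where the phrase terms occur at consecutive positions."""
    terms = tokens(phrase)
    if not terms:
        return []
    hits = []
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            if all(start + i in index.get(t, {}).get(doc_id, [])
                   for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits


docs = {1: "ingenieur d'affaire senior", 2: "affaire classee"}
with_pos = build_index(docs, omit_positions=False)
without_pos = build_index(docs, omit_positions=True)
print(phrase_search(with_pos, "d affaire"))
print(phrase_search(without_pos, "d affaire"))
print(sorted(without_pos.get("affaire", {})))
```

Single-term lookups still work on the index without positions; only the
phrase check loses what it needs.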

Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

Posted by Jérôme Etévé <je...@gmail.com>.

Actually here is the difference between the textgen analysis pipeline and ours:

For the phrase "ingenieur d'affaire senior" ,
Our pipeline gives right after our tokenizer:

term position      1          2      3        4
term text          ingenieur  d      affaire  senior

'd' and 'affaire' are separated as different tokens straight away. Our
filters have no later effect for this phrase.

* The textgen pipeline uses a whitespace tokenizer, so it gives first:
term position      1          2          3
term text          ingenieur  d'affaire  senior
term type          word       word       word
source start,end   0,9        10,19      20,26

* Then a word delimiter filter splits the token "d'affaire" (and
generates the concatenation "daffaire"):
term position      1          2      3        4
term text          ingenieur  d      affaire  senior
term type          word       word   word     word
source start,end   0,9        10,11  12,19    20,26
extra token        daffaire (term type word, source start,end 10,19)


Could you see a reason why title:"d affaire" works with textgen but
not with our type?

Thanks!

Jerome.
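
For reference, the two textgen steps described above can be approximated as
follows. This is a simplified sketch, not Solr's WhitespaceTokenizer or
WordDelimiterFilter: the function names are made up, only offsets are
tracked, and the position assigned to the concatenated token is deliberately
left out.

```python
import re


def whitespace_tokenize(text):
    """Return (term, start, end) for each whitespace-separated token."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def word_delimiter(token_list):
    """Split tokens on non-alphanumerics; add the concatenation if split."""
    out = []
    for term, start, end in token_list:
        parts = [(m.group(), start + m.start(), start + m.end())
                 for m in re.finditer(r"[0-9A-Za-z]+", term)]
        out.extend(parts)
        if len(parts) > 1:
            # The concatenated token spans the whole original token.
            out.append(("".join(p[0] for p in parts), start, end))
    return out


first = whitespace_tokenize("ingenieur d'affaire senior")
for tok in word_delimiter(first):
    print(tok)
```

The offsets this produces line up with the "source start,end" rows in the
analysis output above.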





-- 
Jerome Eteve.
http://www.eteve.net
jerome@eteve.net

Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

Posted by Jérôme Etévé <je...@gmail.com>.

Hum,
That's probably because of our own customized types/tokenizers/filters.

I tried reindexing and querying our data using the default solr type
'textgen' and it works fine.

I need to investigate which features of the new lucene 2.9 API are not
implemented in our own tokenizers etc...

Thanks.

Jerome.
