Posted to solr-user@lucene.apache.org by Yonik Seeley <yo...@lucidimagination.com> on 2009/10/27 14:10:38 UTC
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4
On Tue, Oct 27, 2009 at 8:44 AM, Jérôme Etévé <je...@gmail.com> wrote:
> I don't really get why these two tokens are subsequently put together
> in a phrase query.
That's the way the Lucene query parser has always worked... phrase
queries are made if multiple tokens are produced from one field query.
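As a rough illustration of the rule described above, here is a minimal Python sketch (not Lucene's actual code). The names analyze, field_query, TermQuery and PhraseQuery are hypothetical, and the toy analyzer simply lowercases and splits on anything that is not an ASCII letter or digit. It also ignores the case of several tokens sharing one position.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TermQuery:
    field: str
    term: str


@dataclass
class PhraseQuery:
    field: str
    terms: List[str]


def analyze(text: str) -> List[str]:
    """Toy analyzer (an assumption): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def field_query(field: str, text: str) -> Optional[Union[TermQuery, PhraseQuery]]:
    """Build a query for one field from one chunk of query text."""
    tokens = analyze(text)
    if not tokens:
        return None  # nothing left after analysis
    if len(tokens) == 1:
        return TermQuery(field, tokens[0])
    # Several tokens came out of a single field query: treat them as a phrase.
    return PhraseQuery(field, tokens)


print(field_query("title", "d'affaire"))
print(field_query("title", "senior"))
```

With this toy analyzer, the unquoted text d'affaire already yields two tokens, so it ends up as a phrase query on "d" followed by "affaire", while a single-token input stays a plain term query.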
> In solr 1.3, it didn't seem to be a problem though. title:"d affaire"
> matches document where title contains "d'affaire" and all is fine.
This should not have changed between 1.3 and 1.4...
What's the fieldType and its definition for your title field?
-Yonik
http://www.lucidimagination.com
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4
Posted by Chris Hostetter <ho...@fucit.org>.
: However, even when it's set to 'true', the highlighting of a field
: continues to work even if the search doesn't.
: Does the highlighter use a different strategy to match the query terms
: in the fields?
If it has term vectors, it uses them; otherwise it re-analyzes the stored
fields.
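To make the second case concrete, here is a small Python sketch of highlighting by re-analyzing the stored text. It is only an illustration of the idea, not Solr's highlighter; analyze_with_offsets and highlight are hypothetical names, and the tokenizer is a toy that keeps runs of ASCII letters and digits. The point is that the offsets come from the stored text itself, so nothing here depends on positions stored in the index.

```python
import re
from typing import List, Set, Tuple


def analyze_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Toy analyzer (an assumption): lowercase alphanumeric runs with offsets."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"[0-9A-Za-z]+", text)]


def highlight(stored_text: str, query_terms: Set[str]) -> str:
    """Wrap every token of the stored text that matches a query term."""
    out = []
    last = 0
    for term, start, end in analyze_with_offsets(stored_text):
        if term in query_terms:
            out.append(stored_text[last:start])              # text before the hit
            out.append("<em>" + stored_text[start:end] + "</em>")
            last = end
    out.append(stored_text[last:])                            # trailing text
    return "".join(out)


print(highlight("ingenieur d'affaire senior", {"d", "affaire"}))
# prints: ingenieur <em>d</em>'<em>affaire</em> senior
```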
-Hoss
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4
Posted by Jérôme Etévé <je...@gmail.com>.
Mea maxima culpa,
I had foolishly set the option omitTermFreqAndPositions="true" in an
attempt to save space.
Phrase queries work when it is set to 'false'.
However, even when it's set to 'true', the highlighting of a field
continues to work even if the search doesn't.
Does the highlighter use a different strategy to match the query terms
in the fields?
Cheers!
Jerome.
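The effect described in this message can be reproduced with a toy inverted index. The Python sketch below is an illustration under stated assumptions, not Lucene's index format: build_index and phrase_match are hypothetical names, and the omit_positions flag merely stands in for what omitTermFreqAndPositions controls. When positions are not stored, the index still knows which documents contain each term, but it cannot check that the terms are adjacent, so the phrase finds nothing.

```python
from typing import Dict, List, Optional

# term -> doc id -> list of positions (None when positions were omitted)
Postings = Dict[str, Dict[int, Optional[List[int]]]]


def build_index(docs: Dict[int, List[str]], omit_positions: bool) -> Postings:
    index: Postings = {}
    for doc_id, tokens in docs.items():
        for pos, tok in enumerate(tokens):
            per_doc = index.setdefault(tok, {})
            if omit_positions:
                per_doc[doc_id] = None            # only "term occurs in doc"
            else:
                per_doc.setdefault(doc_id, []).append(pos)
    return index


def phrase_match(index: Postings, terms: List[str]) -> List[int]:
    """Return ids of docs where the terms occur at consecutive positions."""
    if not terms or any(t not in index for t in terms):
        return []
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        lists = [index[t][doc_id] for t in terms]
        if any(p is None for p in lists):
            continue                              # no positions: cannot verify adjacency
        if any(all(start + i in lists[i] for i in range(len(terms)))
               for start in lists[0]):
            hits.append(doc_id)
    return hits


docs = {1: ["ingenieur", "d", "affaire", "senior"], 2: ["affaire", "d", "etat"]}
print(phrase_match(build_index(docs, omit_positions=False), ["d", "affaire"]))  # [1]
print(phrase_match(build_index(docs, omit_positions=True), ["d", "affaire"]))   # []
```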
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4
Posted by Jérôme Etévé <je...@gmail.com>.
Actually, here is the difference between the textgen analysis pipeline and ours:
For the phrase "ingenieur d'affaire senior",
our pipeline gives right after our tokenizer:
term position     1          2      3        4
term text         ingenieur  d      affaire  senior
'd' and 'affaire' are separated as different tokens straight away. Our
filters have no later effect for this phrase.
* The textgen pipeline uses a whitespace tokenizer, so it gives first:
term position     1          2          3
term text         ingenieur  d'affaire  senior
term type         word       word       word
source start,end  0,9        10,19      20,26
* Then a word delimiter filter splits the token "d'affaire" (and
generates the concatenation "daffaire"):
term position     1          2      3        4
term text         ingenieur  d      affaire  senior
term type         word       word   word     word
source start,end  0,9        10,11  12,19    20,26
extra token       daffaire (type word, source start,end 10,19)
Could you see a reason why title:"d affaire" works with textgen but
not with our type?
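For reference, the two textgen steps shown above can be mimicked with a short Python sketch. This is a simplified stand-in, not Solr's WhitespaceTokenizer or WordDelimiterFilter; Token, whitespace_tokenize and word_delimiter are hypothetical names. Placing the concatenated token at the position of its last part is an assumption, since the listing above does not show which position it gets. The offsets do reproduce the ones in the tables.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    position: int
    start: int
    end: int


def whitespace_tokenize(text: str) -> List[Token]:
    """Split on whitespace only, so "d'affaire" stays one token."""
    return [Token(m.group(), i + 1, m.start(), m.end())
            for i, m in enumerate(re.finditer(r"\S+", text))]


def word_delimiter(tokens: List[Token]) -> List[Token]:
    """Split each token on non-alphanumerics and add the concatenation."""
    out: List[Token] = []
    pos = 0
    for tok in tokens:
        parts = list(re.finditer(r"[0-9A-Za-z]+", tok.text))
        for m in parts:
            pos += 1
            out.append(Token(m.group(), pos,
                             tok.start + m.start(), tok.start + m.end()))
        if len(parts) > 1:
            joined = "".join(m.group() for m in parts)
            # Assumption: the catenated token shares the last part's position.
            out.append(Token(joined, pos,
                             tok.start + parts[0].start(),
                             tok.start + parts[-1].end()))
    return out


for t in word_delimiter(whitespace_tokenize("ingenieur d'affaire senior")):
    print(t.position, t.text, f"{t.start},{t.end}")
```

Both pipelines end up with "d" and "affaire" at consecutive positions, which is what a phrase query on title:"d affaire" needs to find in the index.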
Thanks!
Jerome.
--
Jerome Eteve.
http://www.eteve.net
jerome@eteve.net
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4
Posted by Jérôme Etévé <je...@gmail.com>.
Hum,
That's probably because of our own customized types/tokenizers/filters.
I tried reindexing and querying our data using the default solr type
'textgen' and it works fine.
I need to investigate which features of the new Lucene 2.9 API are not
implemented in our own tokenizers etc...
Thanks.
Jerome.
--
Jerome Eteve.
http://www.eteve.net
jerome@eteve.net