
Posted to solr-user@lucene.apache.org by Bjarke Buur Mortensen <mo...@eluence.com> on 2019/08/27 12:48:56 UTC

Query-time synonyms without indexing

We have a Solr field of type "string".
It turns out that we need to do synonym expansion at query time in order to
account for some changes over time in the values stored in that field.

So we have tried introducing a custom fieldType that applies the synonym
filter at query time only (see bottom of mail), but that requires us to
change the field's type. Now, when we index new documents, Solr complains:
400 Bad Request
Error: 'Exception writing document id someid to the index; possible analysis error: cannot change field "auth_country_code" from index options=DOCS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS',

Since we are only making query-time changes, I would really like to avoid
reindexing our entire collection. Is that possible somehow?

Thanks,
Bjarke


  <fieldType name="country_codes" class="solr.TextField"
             sortMissingLast="true" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>  <!-- no splitting of input -->
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory"
                synonyms="country-synonyms.txt" ignoreCase="false" expand="true"/>
    </analyzer>
  </fieldType>

Re: Query-time synonyms without indexing

Posted by Erick Erickson <er...@gmail.com>.

Ah, thanks for letting us know. 

Erick

> On Aug 29, 2019, at 9:20 AM, Bjarke Buur Mortensen <mo...@eluence.com> wrote:
> 
> The <analyzer> section without type is the one getting picked up for the
> index-time chain, so that wasn't my problem.
> 
> It turns out that because of
> https://issues.apache.org/jira/browse/LUCENE-8134, I needed to add
> omitTermFreqAndPositions="true" to the <fieldType> declaration.
> This has to do with the defaults for a string field (which omits term
> frequencies and positions) being different from those for a text field
> (which indexes them), and in Solr 8+ indexing fails when the two disagree,
> because of the above ticket. Adding omitTermFreqAndPositions="true" ensures
> that the field as already indexed and the field type in the schema agree on
> these settings, as I understand it.
> 
> Regards,
> Bjarke

Re: Query-time synonyms without indexing

Posted by Bjarke Buur Mortensen <mo...@eluence.com>.

The <analyzer> section without type is the one getting picked up for the
index-time chain, so that wasn't my problem.

It turns out that because of
https://issues.apache.org/jira/browse/LUCENE-8134, I needed to add
omitTermFreqAndPositions="true" to the <fieldType> declaration.
This has to do with the defaults for a string field (which omits term
frequencies and positions) being different from those for a text field
(which indexes them), and in Solr 8+ indexing fails when the two disagree,
because of the above ticket. Adding omitTermFreqAndPositions="true" ensures
that the field as already indexed and the field type in the schema agree on
these settings, as I understand it.
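As a sketch of what that change looks like, here is the field type from the original mail with the attribute added. Only omitTermFreqAndPositions="true" is new; everything else is copied unchanged, and it assumes the rest of the schema (including the field definition for auth_country_code) stays as it was. Because positions are no longer indexed, phrase queries against this field will not work, which should not matter for single-token values like country codes.

```xml
<!-- Field type from the original mail, with omitTermFreqAndPositions="true" added so the
     new type indexes only docs (no freqs/positions), matching the existing "string" field -->
<fieldType name="country_codes" class="solr.TextField"
           sortMissingLast="true" positionIncrementGap="100"
           omitTermFreqAndPositions="true">
  <!-- untyped analyzer: used for indexing, since a query-time analyzer is given below -->
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>  <!-- no splitting of input -->
  </analyzer>
  <!-- query-time analyzer: same tokenization plus synonym expansion -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="country-synonyms.txt" ignoreCase="false" expand="true"/>
  </analyzer>
</fieldType>
```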

Regards,
Bjarke



On Wed, Aug 28, 2019 at 13:26, Erick Erickson <erickerickson@gmail.com> wrote:

> Not sure. You have an <analyzer> section and an <analyzer type="query">
> section. Frankly, I’m not sure which one will be used for the index-time
> chain.
>
> Why don’t you just try it? Change <analyzer> to <analyzer type="index">,
> reload, and go. It’d take you 5 minutes and you’d have your answer.
>
> Best,
> Erick

Re: Query-time synonyms without indexing

Posted by Erick Erickson <er...@gmail.com>.

Not sure. You have an <analyzer> section and an <analyzer type="query">
section. Frankly, I’m not sure which one will be used for the index-time chain.

Why don’t you just try it? Change <analyzer> to <analyzer type="index">,
reload, and go. It’d take you 5 minutes and you’d have your answer.

Best,
Erick


> On Aug 28, 2019, at 1:57 AM, Bjarke Buur Mortensen <mo...@eluence.com> wrote:
> 
> Yes, but isn't that what I am already doing in this case (look at the
> fieldType in the original mail)?
> Is there some other way to specify that field type and achieve what I want?
> 
> Thanks,
> Bjarke

Re: Query-time synonyms without indexing

Posted by Bjarke Buur Mortensen <mo...@eluence.com>.

Yes, but isn't that what I am already doing in this case (look at the
fieldType in the original mail)?
Is there some other way to specify that field type and achieve what I want?

Thanks,
Bjarke

On Tue, Aug 27, 2019, 17:32 Erick Erickson <er...@gmail.com> wrote:

> You can have separate index-time and query-time analysis chains; there are
> many examples in the stock Solr schemas.
>
> Best,
> Erick

Re: Query-time synonyms without indexing

Posted by Erick Erickson <er...@gmail.com>.

You can have separate index-time and query-time analysis chains; there are many examples in the stock Solr schemas.
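For illustration, a minimal sketch of that pattern, loosely modeled on the text_general type in Solr's default configset but simplified (the stock version also applies stop words). The type name text_query_synonyms is hypothetical, and synonyms.txt stands in for whatever synonym file your configset provides. The index-time chain leaves synonyms out while the query-time chain expands them, so editing the synonym file later needs only a core reload rather than a reindex.

```xml
<!-- Hypothetical field type: synonyms are applied only when analyzing queries -->
<fieldType name="text_query_synonyms" class="solr.TextField" positionIncrementGap="100">
  <!-- index-time chain: tokenize and lowercase, no synonyms -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- query-time chain: same tokenization, plus synonym expansion -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```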

Best,
Erick

> On Aug 27, 2019, at 8:48 AM, Bjarke Buur Mortensen <mo...@eluence.com> wrote:
> 
> We have a Solr field of type "string".
> It turns out that we need to do synonym expansion at query time in order to
> account for some changes over time in the values stored in that field.
>
> So we have tried introducing a custom fieldType that applies the synonym
> filter at query time only (see bottom of mail), but that requires us to
> change the field's type. Now, when we index new documents, Solr complains:
> 400 Bad Request
> Error: 'Exception writing document id someid to the index; possible analysis error: cannot change field "auth_country_code" from index options=DOCS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS',
>
> Since we are only making query-time changes, I would really like to avoid
> reindexing our entire collection. Is that possible somehow?
> 
> Thanks,
> Bjarke
> 
> 
>  <fieldType name="country_codes" class="solr.TextField"
>             sortMissingLast="true" positionIncrementGap="100">
>    <analyzer>
>        <tokenizer class="solr.KeywordTokenizerFactory"/>  <!-- no splitting of input -->
>    </analyzer>
>    <analyzer type="query">
>        <tokenizer class="solr.KeywordTokenizerFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory"
>                synonyms="country-synonyms.txt" ignoreCase="false" expand="true"/>
>    </analyzer>
>  </fieldType>