You are viewing a plain text version of this content. The canonical link for it is here.

Posted to solr-user@lucene.apache.org by "damian.pawski" <dp...@gmail.com> on 2018/03/05 09:56:48 UTC

Solr SynonymGraphFilterFactory error on import

After upgrading to Solr 7.2 import started to log errors for some documents.

Field that returns errors:

   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" /> 
    <filter class="solr.WordDelimiterGraphFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
During the import below error is returned for some of the records:

org.apache.solr.common.SolrException: Exception writing document id XXXXX to
the index; possible analysis error: startOffset must be non-negative, and
endOffset must be >= startOffset, and offsets must not go backwards
startOffset=2874,endOffset=2878,lastStartOffset=2879 for field 'XXXXX'
at
g.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:226)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:936)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:616)
at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:80)


It is related to the:
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.FlattenGraphFilterFactory"/>

If I remove this it works fine, previously we were using:
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>

and it was working fine, but the SynonymFilterFactory is not longer
supported on the Solr 7.X., it has been replaced with
SynonymGraphFilterFactory, I have added FlattenGraphFilterFactory as
suggested.

I am not sure why Solr returns those errors?

Thank you in advance for suggestions.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr SynonymGraphFilterFactory error on import

Posted by "damian.pawski" <dp...@gmail.com>.

"/You probably want to call  solr.FlattenGraphFilterFactory  after the call 
to  WordDelimiterGraphFilterFactory. I put it at the end/ "  

That solved my issue 

Thank you 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr SynonymGraphFilterFactory error on import

Posted by Webster Homer <we...@sial.com>.

You probably want to call  solr.FlattenGraphFilterFactory  after the call
to  WordDelimiterGraphFilterFactory. I put it at the end

Also there is an issue calling more than one graph filter in an analysis
chain so you may need to remove one of them. I think that there is a Jira
about that

Personally I prefer to do synonyms at query time.



On Mon, Mar 5, 2018 at 3:56 AM, damian.pawski <dp...@gmail.com> wrote:

> After upgrading to Solr 7.2 import started to log errors for some
> documents.
>
> Field that returns errors:
>
>    <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>     <filter class="solr.FlattenGraphFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>     <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>     <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>     <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
> During the import below error is returned for some of the records:
>
> org.apache.solr.common.SolrException: Exception writing document id XXXXX
> to
> the index; possible analysis error: startOffset must be non-negative, and
> endOffset must be >= startOffset, and offsets must not go backwards
> startOffset=2874,endOffset=2878,lastStartOffset=2879 for field 'XXXXX'
> at
> g.apache.solr.update.DirectUpdateHandler2.addDoc(
> DirectUpdateHandler2.java:226)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(
> RunUpdateProcessorFactory.java:67)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(
> UpdateRequestProcessor.java:55)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(
> DistributedUpdateProcessor.java:936)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(
> DistributedUpdateProcessor.java:616)
> at
> org.apache.solr.update.processor.LogUpdateProcessorFactory$
> LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
> at org.apache.solr.handler.dataimport.SolrWriter.upload(
> SolrWriter.java:80)
>
>
> It is related to the:
> <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> <filter class="solr.FlattenGraphFilterFactory"/>
>
> If I remove this it works fine, previously we were using:
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>
> and it was working fine, but the SynonymFilterFactory is not longer
> supported on the Solr 7.X., it has been replaced with
> SynonymGraphFilterFactory, I have added FlattenGraphFilterFactory as
> suggested.
>
> I am not sure why Solr returns those errors?
>
> Thank you in advance for suggestions.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.