Posted to solr-user@lucene.apache.org by gnandre <ar...@gmail.com> on 2020/07/03 01:16:39 UTC
Re: Solr 8.5.2 indexing issue
It seems that the issue is not with the reference_url field itself. There is
a copy field that has reference_url as its source and another field,
url_path, as its destination.
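For context, the copy field arrangement described above would look roughly like the sketch below. Only the field names, the copy direction, and the url_path_text type name come from this thread; the type of reference_url and the indexed/stored attributes are assumptions made for illustration, not copied from the actual schema.

```xml
<!-- Source field. type="string" and the indexed/stored flags are assumed. -->
<field name="reference_url" type="string" indexed="true" stored="true"/>

<!-- Destination field, analyzed with the url_path_text type.
     The indexed/stored flags are assumed. -->
<field name="url_path" type="url_path_text" indexed="true" stored="false"/>

<!-- Copies the raw reference_url value into url_path at index time, so the
     url_path_text index analyzer runs on every reference_url value. -->
<copyField source="reference_url" dest="url_path"/>
```

Because copyField passes the original input value to the destination, an analysis failure in url_path can surface while indexing a document that only sets reference_url.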
This destination field url_path has the following field type definition.
<fieldType name="url_path_text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="(https?://(www\.([^/]+)?)?|/([^/]+\.[^/]+$)?|\.?organization\.[^/]+|[?#].*$)"
               group="-1"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            protected="protect.txt" preserveOriginal="1" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc"
            mode="compose"/>
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="synonyms_en.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            protected="protwords.txt" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            protected="protect.txt" preserveOriginal="1" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc"
            mode="compose"/>
    <filter class="solr.SnowballPorterFilterFactory"
            protected="protwords.txt" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
If I remove SynonymGraphFilterFactory and FlattenGraphFilterFactory from the
field type definition above, indexing works; otherwise it throws the
same error (IndexOutOfBoundsException).
On Sun, Jun 28, 2020 at 9:06 AM Erick Erickson <er...@gmail.com>
wrote:
> How are you sending this to Solr? I just tried 8.5, submitting that doc
> through the admin UI and it works fine.
> I defined “asset_id” as the same type as your reference_url field.
>
> And does the log on the Solr node that tries to index this give any more
> info?
>
> Best,
> Erick
>
> > On Jun 27, 2020, at 10:45 PM, gnandre <ar...@gmail.com> wrote:
> >
> > {
> > "asset_id":"add-ons:576deefef7453a9189aa039b66500eb2",
> > "reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"
> > }
>
>