Posted to solr-user@lucene.apache.org by gnandre <ar...@gmail.com> on 2020/07/03 01:16:39 UTC

Re: Solr 8.5.2 indexing issue

It seems that the issue is not with the reference_url field itself. There is
a copyField that has reference_url as its source and another field,
url_path, as its destination.
The destination field url_path uses the following field type definition:

  <fieldType name="url_path_text" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.PatternTokenizerFactory"
                 pattern="(https?://(www\.([^/]+)?)?|/([^/]+\.[^/]+$)?|\.?organization\.[^/]+|[?#].*$)"
                 group="-1"/>
      <filter class="solr.WordDelimiterGraphFilterFactory"
              protected="protect.txt" preserveOriginal="1" generateWordParts="1"
              generateNumberParts="1" catenateWords="1" catenateNumbers="1"
              catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc"
              mode="compose"/>
      <filter class="solr.SynonymGraphFilterFactory"
              synonyms="synonyms_en.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory"
              protected="protwords.txt" language="English"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt"/>
      <filter class="solr.WordDelimiterGraphFilterFactory"
              protected="protect.txt" preserveOriginal="1" generateWordParts="1"
              generateNumberParts="1" catenateWords="1" catenateNumbers="1"
              catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc"
              mode="compose"/>
      <filter class="solr.SnowballPorterFilterFactory"
              protected="protwords.txt" language="English"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
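
For reference, the copy-field wiring described above would look roughly
like the sketch below. Only the field names and the url_path_text type come
from this schema; the type of reference_url and the indexed/stored
attributes are assumptions made for illustration.

  <!-- Source field; type="string" is an assumption -->
  <field name="reference_url" type="string" indexed="true" stored="true"/>
  <!-- Destination field analyzed with url_path_text; indexed/stored are assumptions -->
  <field name="url_path" type="url_path_text" indexed="true" stored="false"/>
  <!-- Every reference_url value is also fed through url_path's index analyzer -->
  <copyField source="reference_url" dest="url_path"/>

Because of the copyField, a document containing only reference_url still
runs through the url_path_text index analyzer at index time.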

If I remove SynonymGraphFilterFactory and FlattenGraphFilterFactory from the
above field type definition, indexing works; otherwise it throws the
same error (IndexOutOfBoundsException).
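
As a sketch, the index analyzer with those two filters removed would look
like the following. It assumes the FlattenGraphFilterFactory in question is
the one directly after SynonymGraphFilterFactory (the definition contains
two); everything else is unchanged from the definition above.

    <analyzer type="index">
      <tokenizer class="solr.PatternTokenizerFactory"
                 pattern="(https?://(www\.([^/]+)?)?|/([^/]+\.[^/]+$)?|\.?organization\.[^/]+|[?#].*$)"
                 group="-1"/>
      <filter class="solr.WordDelimiterGraphFilterFactory"
              protected="protect.txt" preserveOriginal="1" generateWordParts="1"
              generateNumberParts="1" catenateWords="1" catenateNumbers="1"
              catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc"
              mode="compose"/>
      <!-- SynonymGraphFilterFactory and the FlattenGraphFilterFactory after it
           were here (assumed to be the pair that was removed) -->
      <filter class="solr.SnowballPorterFilterFactory"
              protected="protwords.txt" language="English"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>

Note that this variant no longer applies synonyms_en.txt at index time, so
it is a way to narrow down the failure rather than an equivalent
configuration.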

On Sun, Jun 28, 2020 at 9:06 AM Erick Erickson <er...@gmail.com>
wrote:

> How are you sending this to Solr? I just tried 8.5, submitting that doc
> through the admin UI and it works fine.
> I defined “asset_id” as the same type as your reference_url field.
>
> And does the log on the Solr node that tries to index this give any more
> info?
>
> Best,
> Erick
>
> > On Jun 27, 2020, at 10:45 PM, gnandre <ar...@gmail.com> wrote:
> >
> > {
> >        "asset_id":"add-ons:576deefef7453a9189aa039b66500eb2",
> >        "reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"}
>
>