Posted to users@solr.apache.org by Anthony Fernandes <an...@slalom.com.INVALID> on 2021/12/27 18:23:03 UTC

Solr Cloud not indexing a field containing HTML (Production only)

Hello,

I have a field whose type is set to Rich Text in Sitecore. In the lower environments (local and PreProduction-OnPrem), the values are indexed correctly and the HTML is stored in Solr as expected.
In Production (Solr Cloud), the HTML is stripped from this field completely.
One difference is that the lower environments run Solr on-premises, while Production runs Solr Cloud.
I have checked the CM and CD servers, and all of them have the field reader configured for the Body Copy field.
What could the issue be? It only happens in Production.
I have validated that the config is as expected. The field is Body Copy:
   <fieldReaders type="Sitecore.ContentSearch.FieldReaders.FieldReaderMap, Sitecore.ContentSearch">
     <param desc="id">defaultFieldReaderMap</param>
     <mapFieldByTypeName hint="raw:AddFieldReaderByFieldTypeName">
       <fieldReader fieldTypeName="html|rich text" fieldReaderType="Sitecore.ContentSearch.FieldReaders.RichTextFieldReader, Sitecore.ContentSearch" />
     </mapFieldByTypeName>
     <mapFieldByFieldName hint="raw:AddFieldReaderByFieldName">
       <fieldReader fieldName="Body Copy" fieldReaderType="Sitecore.ContentSearch.FieldReaders.DefaultFieldReader, Sitecore.ContentSearch" />
     </mapFieldByFieldName>
   </fieldReaders>
It only happens for some content items, not for all of them.
I have fixed the HTML errors in the fields that reported them, but that did not fix the problem either.
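A minimal sketch of how the fieldReaders map above is intended to pick a reader for a field. This is a hypothetical model, not Sitecore's code, and it rests on assumptions the thread does not confirm: that a mapFieldByFieldName entry takes precedence over a mapFieldByTypeName entry, that the "|" in fieldTypeName registers each type name separately, and that lookups ignore case. Check all three against your Sitecore version.

```python
# Hypothetical model of the <fieldReaders> map above; not Sitecore's code.
# ASSUMPTIONS: name mappings win over type mappings, the "|" in fieldTypeName
# registers each type name separately, and lookups ignore case.

TYPE_MAP = {
    type_name: "RichTextFieldReader" for type_name in "html|rich text".split("|")
}
NAME_MAP = {"body copy": "DefaultFieldReader"}


def resolve_reader(field_name, field_type):
    """Return the reader this map would pick, or None if nothing is mapped."""
    name_key = field_name.lower()
    if name_key in NAME_MAP:
        # mapFieldByFieldName entry for this specific field.
        return NAME_MAP[name_key]
    # Otherwise fall back to the mapFieldByTypeName entries.
    return TYPE_MAP.get(field_type.lower())
```

Under these assumptions, resolve_reader("Body Copy", "Rich Text") returns "DefaultFieldReader", while a hypothetical "Summary" rich-text field returns "RichTextFieldReader". If the mapping resolves this way on the server that actually performs the indexing, the next place to look is the Solr configuration.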

Thanks,
Anthony


Re: Solr Cloud not indexing a field containing HTML (Production only)

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Compare your local solrconfig.xml with the production one, ideally by
retrieving both through the Config API. That way you avoid comparing the
wrong files or, for production, a local copy that does not match the one
actually stored in ZooKeeper.
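A minimal Python sketch of that comparison, using only the standard library. It fetches each environment's effective configuration from the Config API endpoint /solr/<collection>/config (the JSON response carries it under a "config" key), flattens it into dotted paths, and lists every setting that differs. Empty sections produce no paths, so they are not reported; authentication, if your production cluster requires it, is not handled here.

```python
import json
import urllib.request

_MISSING = object()  # sentinel so "absent" and "present but None" differ


def fetch_config(base_url, collection):
    """GET the effective config for a collection via Solr's Config API."""
    url = f"{base_url.rstrip('/')}/solr/{collection}/config"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["config"]


def flatten(obj, path=()):
    """Flatten nested dicts/lists into {"a.b.0.c": leaf_value} pairs."""
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            out.update(flatten(value, path + (str(key),)))
        return out
    if isinstance(obj, list):
        out = {}
        for index, value in enumerate(obj):
            out.update(flatten(value, path + (str(index),)))
        return out
    return {".".join(path): obj}


def diff_configs(local, prod):
    """Return (path, local_value, prod_value) for every setting that differs."""
    a, b = flatten(local), flatten(prod)
    return [
        (path, a.get(path), b.get(path))
        for path in sorted(a.keys() | b.keys())
        if a.get(path, _MISSING) != b.get(path, _MISSING)
    ]
```

For example, diff_configs(fetch_config("http://localhost:8983", "my_index"), fetch_config("https://prod-solr.example.com", "my_index")) would list the differences; the host names and the collection name here are placeholders for your own.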

Look at which endpoints are used (/update? /extract?), which request
handlers are defined, and possibly the update request processor chains.
Also check whether the Tika libraries are defined properly; I assume that
is what is used for HTML extraction.
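A small sketch for the request handler check. It assumes the Config API lists handlers under "requestHandler", keyed by path, each with a "class" attribute (for example solr.extraction.ExtractingRequestHandler for the Tika-based Solr Cell handler). Matching on the class-name suffix is an assumption; adjust it if your handler is registered differently.

```python
def request_handlers(config):
    """Map each handler path to its class, from a Config API "config" object."""
    handlers = config.get("requestHandler", {})
    return {
        name: spec.get("class")
        for name, spec in handlers.items()
        if isinstance(spec, dict)
    }


def extraction_handlers(config):
    """Return the handler paths whose class looks like Solr Cell's extractor."""
    # ASSUMPTION: Tika extraction is wired up via ExtractingRequestHandler.
    return sorted(
        name
        for name, cls in request_handlers(config).items()
        if isinstance(cls, str) and cls.endswith("ExtractingRequestHandler")
    )
```

Run it on the "config" object returned by GET /solr/<collection>/config in each environment; an extraction handler that exists on-premises but not in Production, or the reverse, would be worth following up.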

Regards,
    Alex

