Posted to solr-user@lucene.apache.org by Iain Lopata <ia...@ameritech.net> on 2013/05/30 19:03:04 UTC

Continue Indexing Documents when single doc does not match schema

I am using Nutch 1.6 and Solr 1.4.1 on Ubuntu in local mode and using
Nutch's solrindex to index documents into Solr.

 

When indexing documents, I hit an occasional document that does not match
the Solr schema.  For example, a document which has two address fields when
my Solr schema.xml does not specify address as being multi-valued (and I do
not want it to be).  Ideally, I would like this document to be skipped, an
error written to the log file for later investigation, and the indexing of
the remainder of the parsed documents to continue.  Instead the job fails.

 

I have tried setting

<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

in solrconfig.xml and restarting tomcat, but that does not seem to make a
difference.

 

Where else should I be looking?

 


Re: Continue Indexing Documents when single doc does not match schema

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/30/2013 11:03 AM, Iain Lopata wrote:
> When indexing documents, I hit an occasional document that does not match
> the Solr schema.  For example, a document which has two address fields when
> my Solr schema.xml does not specify address as being multi-valued (and I do
> not want it to be).  Ideally, I would like this document to be skipped, an
> error written to the log file for later investigation, and the indexing of
> the remainder of the parsed documents to continue.  Instead the job fails.
>
> I have tried setting
>
> <abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>
>
> in solrconfig.xml and restarting tomcat, but that does not seem to make a
> difference.

That config option just tells Solr whether or not initial startup should 
fail if there's a configuration error in config files like 
solrconfig.xml.  In most cases, you want it to be true.

I don't think anything currently exists to do what you want.  The 
feature request issue has been around for a long time, and it's had some 
relatively recent activity, at least compared to its creation date:

https://issues.apache.org/jira/browse/SOLR-445

I haven't looked at the patch, but I would imagine that it just needs to 
be updated for the many source code changes since it was created, then 
examined to make sure it's correctly implemented.
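
Until something like that is available on the server side, one possible
workaround is to handle it in whatever client sends the documents: try the
whole batch, and if it is rejected, resend the documents one at a time,
logging and skipping the ones that fail. The following is only a rough sketch
of that idea in Python, not how Nutch's solrindex works. The send_batch
callable is hypothetical (it stands for any function that posts a list of
documents to Solr and raises an exception on rejection), and the sketch
assumes each document has a unique "id" so that resending one is harmless.

```python
import logging

logger = logging.getLogger("indexer")


def index_skipping_bad_docs(docs, send_batch):
    """Send docs as one batch; on failure, retry one by one and skip rejects.

    send_batch is a hypothetical callable (an assumption, not a Solr or Nutch
    API): it posts a list of documents and raises if the request is rejected.
    Returns (indexed_count, skipped_docs).
    """
    docs = list(docs)
    try:
        send_batch(docs)  # fast path: the whole batch is accepted
        return len(docs), []
    except Exception as exc:
        logger.warning("batch of %d rejected (%s); retrying one by one",
                       len(docs), exc)

    indexed, skipped = 0, []
    for doc in docs:
        try:
            send_batch([doc])
            indexed += 1
        except Exception as exc:
            # Record the rejected document for later investigation, then
            # carry on with the rest instead of failing the whole job.
            logger.error("skipping document %r: %s", doc.get("id"), exc)
            skipped.append(doc)
    return indexed, skipped
```

The one-by-one pass is slower, but it only runs for batches that contain a
bad document, so the common case still uses a single request.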

Thanks,
Shawn