Posted to solr-user@lucene.apache.org by tedsolr <ts...@sciquest.com> on 2014/07/01 22:37:12 UTC

Continue indexing doc after error

I need to index documents from a CSV file that will have thousands of rows and
100+ columns. To help the user loading the file, I must return useful errors
when indexing fails (schema violations). I'm using SolrJ to read the file
line by line, build each document, and index/commit. This approach lets me
index the docs that have no schema validation errors and skip over the
docs that do.

However, I really want to report errors field by field. As the
user makes corrections to the file, this would prevent the same doc from
failing multiple times when several of its fields are bad. I have
not seen a configuration setting that tells Solr to keep processing the doc
after it encounters the first error and report back all the field errors
(multiple exceptions). Does anyone know if that's possible?

Using Solr 4.8.1
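
One way to get field-by-field errors without server-side support is to check each row on the client before it is sent. The sketch below is only an illustration under assumptions: the class RowValidator, its methods, and the per-field checks are hypothetical names, not Solr or SolrJ APIs, and the checks would have to be kept in sync with the schema by hand.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical client-side checker: collects every field error for one CSV row. */
final class RowValidator {
    // Field name -> check that mirrors the schema's field type (assumed, hand-maintained).
    private final Map<String, Predicate<String>> checks = new LinkedHashMap<>();

    /** Registers a check for a field; returns this so calls can be chained. */
    RowValidator require(String field, Predicate<String> check) {
        checks.put(field, check);
        return this;
    }

    /** Returns one message per failing field instead of stopping at the first failure. */
    List<String> validate(Map<String, String> row) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Predicate<String>> e : checks.entrySet()) {
            String value = row.get(e.getKey());
            if (value == null) {
                errors.add(e.getKey() + ": missing value");
            } else if (!e.getValue().test(value)) {
                errors.add(e.getKey() + ": invalid value '" + value + "'");
            }
        }
        return errors;
    }

    /** True if the text parses as a 32-bit integer. */
    static boolean isInt(String s) {
        try {
            Integer.parseInt(s.trim());
            return true;
        } catch (NumberFormatException ex) {
            return false;
        }
    }

    /** True if the text is an ISO-8601 UTC instant such as 2014-07-01T22:37:12Z. */
    static boolean isDate(String s) {
        try {
            Instant.parse(s.trim());
            return true;
        } catch (DateTimeParseException ex) {
            return false;
        }
    }
}
```

A row would be sent to Solr only when validate returns an empty list; otherwise the whole list goes back to the user in one pass. This does not replace the server's own validation, so a document can still be rejected for reasons the client-side checks do not cover.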



--
View this message in context: http://lucene.472066.n3.nabble.com/Continue-indexing-doc-after-error-tp4145081.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Continue indexing doc after error

Posted by tedsolr <ts...@sciquest.com>.

Thank you. That's a useful link. It may not be quite what I'm looking for, as it
appears to deal with bulk loads of docs, returning an error for each bad doc.
My question is more about getting all the errors for a single doc. I'm
probably taking a performance hit by adding docs one at a time. I haven't
tested very large files yet (1M+ rows).
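
One possible way to reduce the cost of adding docs one at a time is to send them in batches and fall back to single-doc adds only when a batch is rejected. This is a minimal sketch, not something proposed in this thread: BatchWithFallback and the sender callback are hypothetical. The callback is assumed to wrap whatever call actually sends documents (for example SolrJ's SolrServer.add with a collection) and to rethrow failures as unchecked exceptions. It also assumes that resending a document is harmless, for instance because documents carry a unique key and a re-add overwrites the earlier copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: batch sends with a one-by-one retry for rejected batches. */
final class BatchWithFallback {

    /**
     * Sends docs in batches through 'sender'. When a batch throws, its docs are
     * resent individually so the failing ones can be identified.
     * Returns the docs that still failed when sent on their own.
     */
    static <D> List<D> send(List<D> docs, int batchSize, Consumer<List<D>> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<D> failed = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            List<D> batch = docs.subList(start, end);
            try {
                sender.accept(batch);
            } catch (RuntimeException batchError) {
                // The batch was rejected: retry each doc alone to isolate the bad ones.
                for (D doc : batch) {
                    try {
                        sender.accept(Collections.singletonList(doc));
                    } catch (RuntimeException docError) {
                        failed.add(doc);
                    }
                }
            }
        }
        return failed;
    }
}
```

With mostly clean input, nearly all batches go through in a single call, and only batches containing a bad doc pay the one-at-a-time price. This still reports one error per bad doc, not every bad field within it.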






Re: Continue indexing doc after error

Posted by Tomás Fernández Löbbe <to...@gmail.com>.

I think what you want is what’s described in
https://issues.apache.org/jira/browse/SOLR-445. This has not been committed
because it still doesn’t work with SolrCloud. Hoss gave me the hint to look
at DistributingUpdateProcessorFactory to solve the problem described in the
last comments, but I haven’t had time to get back to this yet.

