Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2012/06/07 17:03:23 UTC

[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents

    [ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291055#comment-13291055 ] 

Yonik Seeley commented on SOLR-445:
-----------------------------------

I imagine a maxErrors parameter might be useful (and more readable than abortOnFirstBatchIndexError)

 - maxErrors=0: the current behavior; stop processing further updates when we hit an error
 - maxErrors=10: allow up to 10 documents to fail before aborting the update. Useful for true bulk uploading, where you want to tolerate an isolated failure or two but still want to stop if every single update is failing because something is misconfigured.
 - maxErrors=-1: allow an unlimited number of documents to fail
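For illustration, the three settings above amount to a simple error counter in the update path. This is only a hypothetical sketch of those semantics; the class and method names are invented, not actual Solr API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of maxErrors semantics; not actual Solr code.
public class MaxErrorsSketch {

    /** Thrown once the error budget is exhausted. */
    static class TooManyErrors extends RuntimeException {
        TooManyErrors(String msg) { super(msg); }
    }

    /**
     * Processes a batch, tolerating up to maxErrors failed documents.
     * maxErrors=0 aborts on the first failure (the current behavior);
     * maxErrors=-1 never aborts. Returns the ids of the failed docs.
     */
    static List<String> addBatch(List<String> docIds, int maxErrors) {
        List<String> failed = new ArrayList<>();
        for (String id : docIds) {
            boolean ok = !id.startsWith("bad");  // stand-in for the real index call
            if (!ok) {
                failed.add(id);
                if (maxErrors >= 0 && failed.size() > maxErrors) {
                    throw new TooManyErrors("aborted after " + failed.size() + " errors");
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // maxErrors=10: the lone bad doc is tolerated and the batch completes.
        System.out.println(addBatch(List.of("1", "bad2", "3"), 10));  // prints [bad2]
    }
}
```

With maxErrors=0 the same loop throws on the first bad document, which matches what Solr does today.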

Making updates transactional seems really tough in cloud mode since we don't keep old versions of documents around... although it might be possible for a short time with the transaction log.  Anyway, that should definitely be a separate issue.

A couple of other notes:
 - structured error responses were recently added in 4.0-dev that should make this issue easier in general.  Example:
{code}
{"responseHeader":{"status":400,"QTime":0},"error":{"msg":"ERROR: [doc=mydoc] unknown field 'foo'","code":400}}
{code}
 - Per did some error-handling work that's included in a patch attached to SOLR-3178
                
> Update Handlers abort with bad documents
> ----------------------------------------
>
>                 Key: SOLR-445
>                 URL: https://issues.apache.org/jira/browse/SOLR-445
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Will Johnson
>             Fix For: 4.1
>
>         Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml
>
>
> Has anyone run into the problem of handling bad documents / failures mid-batch? I.e.:
> <add>
>   <doc>
>     <field name="id">1</field>
>   </doc>
>   <doc>
>     <field name="id">2</field>
>     <field name="myDateField">I_AM_A_BAD_DATE</field>
>   </doc>
>   <doc>
>     <field name="id">3</field>
>   </doc>
> </add>
> Right now Solr adds the first doc and then aborts.  It would seem like it should either fail the entire batch, or log a message / return a code and then continue on to add doc 3.  Option 1 seems much harder to accomplish and possibly requires more memory, while Option 2 would require more information to come back from the API.  I'm about to dig into this, but I thought I'd ask to see if anyone had any suggestions, thoughts or comments.
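Option 2 in the quoted report amounts to continuing past the bad document and reporting per-document status back to the client. A minimal sketch of that behavior, with invented names and a stand-in validation check (this is not Solr's actual response format):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: continue past bad documents and report per-doc status.
public class PerDocStatus {

    /** Indexes every document it can; returns id -> "OK" or an error message. */
    static Map<String, String> addAll(List<Map<String, String>> docs) {
        Map<String, String> status = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            String id = doc.get("id");
            // Stand-in validation: a real handler would fail on e.g. an unparseable date.
            boolean bad = "I_AM_A_BAD_DATE".equals(doc.get("myDateField"));
            status.put(id, bad ? "ERROR: bad value for myDateField" : "OK");
        }
        return status;
    }

    public static void main(String[] args) {
        Map<String, String> result = addAll(List.of(
            Map.of("id", "1"),
            Map.of("id", "2", "myDateField", "I_AM_A_BAD_DATE"),
            Map.of("id", "3")));
        // Docs 1 and 3 succeed; doc 2 reports an error but does not abort the batch.
        System.out.println(result);
    }
}
```

The cost, as the report notes, is that the API has to carry this extra per-document information back to the client, which the structured error responses in 4.0-dev should make easier.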

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org