Posted to solr-dev@lucene.apache.org by "Will Johnson (JIRA)" <ji...@apache.org> on 2007/12/26 19:37:43 UTC

[jira] Created: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
--------------------------------------------------------------------

                 Key: SOLR-445
                 URL: https://issues.apache.org/jira/browse/SOLR-445
             Project: Solr
          Issue Type: Bug
          Components: update
    Affects Versions: 1.3
            Reporter: Will Johnson


Has anyone run into the problem of handling bad documents / failures mid-batch? For example:

<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>

Right now Solr adds the first doc and then aborts.  It would seem like it should either (1) fail the entire batch, or (2) log a message / return a code and then continue on to add doc 3.  Option 1 would seem to be much harder to accomplish and possibly require more memory, while Option 2 would require more information to come back from the API.  I'm about to dig into this, but I thought I'd ask to see if anyone had any suggestions, thoughts or comments.
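
For illustration (a rough sketch, not part of the original report, assuming the stock SolrJ client of the 1.3 era): the whole batch goes out in a single add() call, and when doc 2 fails the call simply throws, with no per-document status telling the client that doc 1 made it in and doc 3 was never attempted.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchAddExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    for (String id : new String[] {"1", "2", "3"}) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", id);
      if (id.equals("2")) {
        doc.addField("myDateField", "I_AM_A_BAD_DATE");  // the bad document
      }
      docs.add(doc);
    }

    try {
      server.add(docs);   // one request carrying the whole batch
      server.commit();
    } catch (Exception e) {
      // All the client sees is a single failure for the whole request;
      // there is no per-document report of which docs were indexed.
      System.err.println("batch failed: " + e.getMessage());
    }
  }
}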



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554452 ] 

Ryan McKinley commented on SOLR-445:
------------------------------------

Yup... as it is now, it keeps anything it had added before the failure.

Check http://wiki.apache.org/solr/UpdateRequestProcessor for a way to intercept the standard processing.  One could easily write an UpdateRequestProcessor to make sure *all* documents are valid before trying to commit them (that is, queue everything up until finish()).

Another approach may be to keep track of the docs that were added correctly, then remove them if something goes wrong....
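
To make the first suggestion concrete, a minimal sketch of such a processor (illustrative only, not a patch attached to this issue; a real version would likely need to copy each document, since the loader may reuse the command object):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class BufferingUpdateProcessor extends UpdateRequestProcessor {

  private final List<AddUpdateCommand> buffered = new ArrayList<AddUpdateCommand>();

  public BufferingUpdateProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    // Validate the document here (e.g. required fields, field formats)
    // and throw before anything has reached the index; otherwise buffer it.
    buffered.add(cmd);
  }

  @Override
  public void finish() throws IOException {
    // Only once the whole batch has parsed cleanly, forward the adds.
    for (AddUpdateCommand cmd : buffered) {
      super.processAdd(cmd);
    }
    super.finish();
  }
}

In practice this would be wrapped in an UpdateRequestProcessorFactory and wired into the update chain in solrconfig.xml.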


[jira] Commented: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557681#action_12557681 ] 

Grant Ingersoll commented on SOLR-445:
--------------------------------------

Is it reasonable to use the AddUpdateCommand to communicate out of the UpdateHandler that a given document failed?  For instance, the update handler could catch any exception and attach it to the command (the next reuse would have to reset it); the various request handlers (XML/CSV) could then check whether the exception is set, add the document to a list of failed docs, and add that list to the response, which the writers can then write out as needed.
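
Roughly the shape being described, as a toy model in plain Java (the failure field and all names below are hypothetical, not existing Solr API):

import java.util.ArrayList;
import java.util.List;

public class FailedDocCollector {

  // Stand-in for AddUpdateCommand with the hypothetical failure slot.
  static class Command {
    String id;
    Exception failure;                  // set by the update handler, reset on reuse
    Command(String id) { this.id = id; }
  }

  public static void main(String[] args) {
    List<String> failedDocs = new ArrayList<String>();

    for (String id : new String[] {"1", "2", "3"}) {
      Command cmd = new Command(id);
      try {
        if (id.equals("2")) {           // simulate the bad date on doc 2
          throw new RuntimeException("I_AM_A_BAD_DATE is not a valid date");
        }
        // ... the update handler would add the document here ...
      } catch (Exception e) {
        cmd.failure = e;                // park the failure on the command instead of aborting
      }

      // The request handler (XML/CSV) checks each command afterwards.
      if (cmd.failure != null) {
        failedDocs.add(cmd.id + ": " + cmd.failure.getMessage());
      }
    }

    // The handler would then hand the list to the response
    // (e.g. rsp.add("failed_docs", failedDocs)) for the writer to serialize.
    System.out.println(failedDocs);
  }
}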


[jira] Assigned: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned SOLR-445:
------------------------------------

    Assignee:     (was: Grant Ingersoll)


[jira] Commented: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556898#action_12556898 ] 

Grant Ingersoll commented on SOLR-445:
--------------------------------------

{quote}
Option 2 seems like a good approach... return a list of documents that failed, why they failed, etc, and that could be logged for a human to check later.
{quote}

+1

And it should not just be for the XML version, right?  Anything that can take a batch of documents should be able to return a list of those that failed along with the reason they failed.
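
To make that concrete, one possible shape for such a response section, sketched with Solr's generic NamedList structure (the "failed_docs" key and the per-document layout are invented for this sketch, not an existing format):

import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;

public class FailedDocsSection {

  // Build a response section listing each failed document and why it failed.
  public static NamedList buildFailedDocsSection(String[] ids, String[] reasons) {
    NamedList failed = new SimpleOrderedMap();
    for (int i = 0; i < ids.length; i++) {
      SimpleOrderedMap entry = new SimpleOrderedMap();
      entry.add("id", ids[i]);
      entry.add("error", reasons[i]);
      failed.add("doc", entry);
    }
    return failed;
  }

  // A handler would then do something like rsp.add("failed_docs", failed);
  // every response writer (XML, JSON, ...) already knows how to serialize a
  // NamedList, regardless of which update format sent the batch.
}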


[jira] Assigned: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned SOLR-445:
------------------------------------

    Assignee: Grant Ingersoll


[jira] Commented: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554466 ] 

Yonik Seeley commented on SOLR-445:
-----------------------------------

Option 2 seems like a good approach... return a list of documents that failed, why they failed, etc, and that could be logged for a human to check later.


[jira] Updated: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-445:
---------------------------------------

    Fix Version/s: 1.4
