Posted to dev@lucene.apache.org by "Kevin Osborn (Created) (JIRA)" <ji...@apache.org> on 2012/04/17 19:29:18 UTC

[jira] [Created] (SOLR-3366) Restart of Solr during data import causes an empty index to be generated on restart

Restart of Solr during data import causes an empty index to be generated on restart
-----------------------------------------------------------------------------------

                 Key: SOLR-3366
                 URL: https://issues.apache.org/jira/browse/SOLR-3366
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler, replication (java)
    Affects Versions: 3.4
            Reporter: Kevin Osborn


We use the DataImportHandler and Java replication in a fairly simple setup of a single master and 4 slaves. We had an operating index of about 16,000 documents. The DataImportHandler is invoked periodically by an external service, which uses the "command=full-import&clean=false" command to perform what is effectively a delta import.
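
For illustration, the request the external service sends can be sketched as follows. This is only a minimal sketch: the base URL (host, port, webapp path) and the "/dataimport" handler path are assumptions that depend on how the handler is registered in solrconfig.xml, and the helper names are hypothetical. Only the "command=full-import&clean=false" parameters come from the report above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_dih_url(base_url, command, **params):
    """Build a DataImportHandler request URL.

    base_url and the "/dataimport" path are assumptions; adjust them to
    match the handler registration in solrconfig.xml.
    """
    query = {"command": command}
    query.update(params)
    return base_url.rstrip("/") + "/dataimport?" + urlencode(query)


def trigger_import(base_url):
    """Send the same request the report describes (hypothetical helper).

    Not called here, because it needs a running Solr instance.
    """
    url = build_dih_url(base_url, "full-import", clean="false")
    with urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical master address, for illustration only.
    print(build_dih_url("http://localhost:8983/solr", "full-import", clean="false"))
```

With clean=false, the import does not begin by deleting the existing documents, which is why it can serve as a delta import.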

While processing one of these commands, we did a deployment which required us to restart the application server (Tomcat 7). So, the import was interrupted. Prior to this deployment, the full index of 16,000 documents had been replicated to all slaves and was working correctly.

Upon restart, the master restarted with an empty index and then this empty index was replicated across all slaves. So, our search index was now empty.

I expected to lose only the uncommitted changes from the interrupted delta import. However, I was not expecting to lose all data. Perhaps this is because I am using the full-import method, for performance reasons, even though the import is really a delta? Or does the data import just put the index in some sort of invalid state?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3366) Restart of Solr during data import causes an empty index to be generated on restart

Posted by "James Dyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255827#comment-13255827 ] 

James Dyer commented on SOLR-3366:
----------------------------------

I don't see how this would be related to DIH.  Even if you had "clean=true", it doesn't commit the deletes until the entire update is complete.  So, like you say, we should expect to only lose the changes from the current import, not the entire index.

I wonder if this is a side effect of using replication.  Sometimes, replication copies an entire new index to the slaves in a new directory, then writes the name of this new directory to "index.properties".  On restart, Solr reads "index.properties" to find the appropriate index directory.  If this file had been touched or removed, possibly Solr restarted, didn't find the correct directory, and then created a new, empty index?  Of course, this would have affected the slaves only.
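
The lookup described above can be sketched roughly as follows. This is a simplified illustration, not Solr's actual code: it assumes "index.properties" is a Java-style properties file in the data directory with an "index" key naming the live index directory, and that the fallback is a directory called "index". The function name is hypothetical.

```python
import os


def resolve_index_dir(data_dir):
    """Return the index directory a restart would pick (simplified sketch).

    Assumption: index.properties holds a line like "index=<dirname>".
    If the file is missing, or names a directory that does not exist,
    fall back to the default "index" directory.
    """
    default_dir = os.path.join(data_dir, "index")
    props_path = os.path.join(data_dir, "index.properties")
    if not os.path.isfile(props_path):
        return default_dir

    with open(props_path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines and properties-file comments.
            if not line or line.startswith(("#", "!")):
                continue
            key, sep, value = line.partition("=")
            if sep and key.strip() == "index":
                name = value.strip()
                candidate = os.path.join(data_dir, name)
                if name and os.path.isdir(candidate):
                    return candidate
    return default_dir
```

Under these assumptions, a missing or stale "index.properties" silently resolves to the default directory, and if that directory is empty or absent a fresh index would be created there.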

I vaguely remember there being a bug some releases back where index corruption could occur if the system is ungracefully shut down, and I see you're on 3.4.  But then again, maybe my memory is failing me because I didn't see this in the release notes.
                
