Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2018/02/14 22:57:00 UTC

[jira] [Commented] (SOLR-11988) FullSolrCloudDistribCmdsTest failures due to SolrCore initializating incorrectly thinking index directory already exists?

    [ https://issues.apache.org/jira/browse/SOLR-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364894#comment-16364894 ] 

Hoss Man commented on SOLR-11988:
---------------------------------


Below is an example of the type of exception that gets logged when this test fails, from the log.txt file I just attached (where 2 _different_ replicas had this same problem).

Note that this log.txt file was produced using the attached SOLR-11988_nocommit_logging.patch, which increases the logging verbosity in {{SolrCore.initIndex(...)}} -- hence the "nocommit" log line, and the line numbers not matching up exactly with current master -- but the net result is the same: in {{SolrCore.initIndex(...)}} the DirectoryFactory claims that the index directory for this brand new, never-before-existing SolrCore already exists and doesn't need to be initialized.  This then causes a problem when we try to open the "real" IndexWriter against it (using OpenMode.APPEND because we expect it to already exist)...

{noformat}
$ ant test  -Dtestcase=FullSolrCloudDistribCmdsTest -Dtests.method=test -Dtests.seed=E6FD3BCDEA5D2094 -Dtests.slow=true -Dtests.locale=ar-JO -Dtests.timezone=Asia/Aqtobe -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
...
   [junit4]   2> 33926 INFO  (qtp1926432793-173) [n:127.0.0.1:60391_kg_fmt c:collection2 s:shard6  x:collection2_shard6_replica_n32] o.a.s.c.SolrCore [collection2_shard6_replica_n32] nocommit: skipping creation of '/home/hossman/lucene/dev/solr/build/solr-core/test/J0/../../../../../../../../../home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.cloud.FullSolrCloudDistribCmdsTest_E6FD3BCDEA5D2094-001/shard-4-001/cores/collection2_shard6_replica_n32/data/index/' (aka: '/home/hossman/lucene/dev/solr/build/solr-core/test/J0/../../../../../../../../../home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.cloud.FullSolrCloudDistribCmdsTest_E6FD3BCDEA5D2094-001/shard-4-001/cores/collection2_shard6_replica_n32/data/index') because dirFac (org.apache.solr.core.MockDirectoryFactory@768117c) says it exists
...
   [junit4]   2> 34763 ERROR (qtp1926432793-173) [n:127.0.0.1:60391_kg_fmt c:collection2 s:shard6  x:collection2_shard6_replica_n32] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing SolrCore 'collection2_shard6_replica_n32': Unable to create core [collection2_shard6_replica_n32] Caused by: no segments* file found in LockValidatingDirectoryWrapper(MockDirectoryWrapper(RAMDirectory@63c826a3 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@17cdc1)): files: []
   [junit4]   2>        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
   [junit4]   2>        at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:91)
...
   [junit4]   2> Caused by: org.apache.solr.common.SolrException: Unable to create core [collection2_shard6_replica_n32]
   [junit4]   2>        at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1059)
   [junit4]   2>        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:954)
   [junit4]   2>        ... 39 more
   [junit4]   2> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
   [junit4]   2>        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1013)
   [junit4]   2>        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:868)
   [junit4]   2>        at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1043)
   [junit4]   2>        ... 40 more
   [junit4]   2> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
   [junit4]   2>        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2100)
   [junit4]   2>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2220)
   [junit4]   2>        at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1096)
   [junit4]   2>        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:985)
   [junit4]   2>        ... 42 more
   [junit4]   2> Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in LockValidatingDirectoryWrapper(MockDirectoryWrapper(RAMDirectory@63c826a3 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@17cdc1)): files: []
   [junit4]   2>        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1072)
   [junit4]   2>        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:119)
   [junit4]   2>        at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:94)
   [junit4]   2>        at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:257)
   [junit4]   2>        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:131)
   [junit4]   2>        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2061)
   [junit4]   2>        ... 45 more
{noformat}
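
To make the sequence above easier to follow, here is a minimal, self-contained sketch of the suspected failure mode. It does not use the real Solr or Lucene classes: {{FakeDirectory}}, {{FakeDirectoryFactory}}, {{openWriter}} and {{createCore}} are hypothetical stand-ins invented for illustration, and the only thing modeled is the "check exists, skip creation, then open in APPEND mode" flow described in this comment.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class InitIndexSketch {

    /** Hypothetical stand-in for an index directory: just a set of file names. */
    static final class FakeDirectory {
        final Set<String> files = new HashSet<>();
    }

    /** Hypothetical stand-in for a DirectoryFactory whose exists() answer can be forced to "lie". */
    static final class FakeDirectoryFactory {
        private final Map<String, FakeDirectory> dirs = new HashMap<>();
        private final boolean alwaysClaimsExists;

        FakeDirectoryFactory(boolean alwaysClaimsExists) {
            this.alwaysClaimsExists = alwaysClaimsExists;
        }

        boolean exists(String path) {
            return alwaysClaimsExists || dirs.containsKey(path);
        }

        FakeDirectory get(String path) {
            return dirs.computeIfAbsent(path, p -> new FakeDirectory());
        }
    }

    /** Simplified open modes for this sketch only. */
    enum OpenMode { CREATE, APPEND }

    /** Assumption: APPEND fails when no segments* file is present; CREATE writes an empty index. */
    static void openWriter(FakeDirectory dir, OpenMode mode) {
        boolean hasSegments = dir.files.stream().anyMatch(f -> f.startsWith("segments"));
        if (mode == OpenMode.APPEND && !hasSegments) {
            throw new IllegalStateException("no segments* file found: files: " + dir.files);
        }
        if (mode == OpenMode.CREATE) {
            dir.files.clear();
            dir.files.add("segments_1");
        }
    }

    /** Sketch of the init flow: create an empty index only if the factory says none exists. */
    static String createCore(FakeDirectoryFactory factory, String indexPath) {
        try {
            if (!factory.exists(indexPath)) {
                openWriter(factory.get(indexPath), OpenMode.CREATE);
            }
            // The "real" writer is opened in APPEND mode because an index is expected by now.
            openWriter(factory.get(indexPath), OpenMode.APPEND);
            return "ok";
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(createCore(new FakeDirectoryFactory(false), "data/index"));
        System.out.println(createCore(new FakeDirectoryFactory(true), "data/index"));
    }
}
```

With an honest factory the core is created normally. With a factory that wrongly reports the directory as existing, the creation step is skipped and the APPEND open fails against an empty directory, which mirrors the {{files: []}} seen in the exception above.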






> FullSolrCloudDistribCmdsTest failures due to SolrCore initializating incorrectly thinking index directory already exists?
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11988
>                 URL: https://issues.apache.org/jira/browse/SOLR-11988
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: SOLR-11988_nocommit_logging.patch, log.txt
>
>
> There have been quite a few jenkins failures from FullSolrCloudDistribCmdsTest that all seem to follow a similar pattern:
>  * Failure manifests as "Could not find collection:collection2"
>  * Failing seeds _frequently_ reproduce, but aren't guaranteed to
>  * Root cause can be traced back to the collection creation failing because one or more replica cores failed because the brand new (Solr)IndexWriter expects to find an existing segments file
>  ** SolrCore should have already created an (empty) index in {{SolrCore.initIndex(...)}}
>  ** The fact that the {{SolrIndexWriter}} throws this exception in its constructor suggests that the earlier call to {{SolrCore.initIndex(...)}} is not functioning reliably
>  ** Based on some experimenting I've done, it seems like the underlying problem is that in {{SolrCore.initIndex(...)}} the DirectoryFactory can "lie" about whether a directory already exists.
> More details to follow in comments.


