Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2017/03/07 00:16:32 UTC

[jira] [Created] (SOLR-10234) "Too many open files" in distrib tests due to fixed HandleLimitFS (regardless of num nodes in test)

Hoss Man created SOLR-10234:
-------------------------------

             Summary: "Too many open files" in distrib tests due to fixed HandleLimitFS (regardless of num nodes in test)
                 Key: SOLR-10234
                 URL: https://issues.apache.org/jira/browse/SOLR-10234
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Hoss Man


I just got a failure from BasicDistributedZkTest on master (acb185b2dc7522e6a4fa55d54e82910736668f8d) that caught my attention -- the reported failure was "Remote error message: Exception writing document id 57 to the index; possible analysis error.", but digging into the logs, the root cause was "Too many open files" coming from the mock
{{HandleLimitFS}} class we have...

{noformat}

   [junit4]   2> 495598 ERROR (qtp155652658-4405) [    ] o.a.s.h.RequestHandlerBase java.nio.file.FileSystemException: /home/jenkins/lucene-solr/solr/build/solr-core/test/J1/temp/solr.cloud.BasicDistributedZkTest_8D04773C07230D3B-001/index-NIOFSDirectory-002/_o_Memory_0.mdvm: Too many open files
   [junit4]   2> 	at org.apache.lucene.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:48)
   [junit4]   2> 	at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
   [junit4]   2> 	at org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:160)
   [junit4]   2> 	at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
   [junit4]   2> 	at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
   [junit4]   2> 	at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
   [junit4]   2> 	at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
   [junit4]   2> 	at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
...
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BasicDistributedZkTest -Dtests.method=test -Dtests.seed=8D04773C07230D3B -Dtests.slow=true -Dtests.locale=en-ER -Dtests.timezone=Europe/Volgograd -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR    259s J1 | BasicDistributedZkTest.test <<<
{noformat}

...what concerns me in particular about this is that it's coming from a distributed test, involving multiple "nodes" (all using the same randomized similarity) writing to the same "file://" filesystem in the same JVM -- but {{TestRuleTemporaryFilesCleanup}} seems to be initializing the filesystem with a fixed {{MAX_OPEN_FILES = 2048}}.
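
For context, here's a simplified sketch of the kind of check involved (an illustration of the mechanism only, not the actual {{HandleLimitFS}} source): every open bumps a single shared counter, and once the fixed cap is exceeded the open fails with the same "Too many open files" {{FileSystemException}} seen in the stack trace above -- so every node in the JVM is drawing from the same pool of 2048 handles.

{noformat}
// Simplified illustration of a shared handle-limit hook -- NOT the real
// org.apache.lucene.mockfile.HandleLimitFS code, just the shape of the check.
import java.nio.file.FileSystemException;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;

class HandleLimitSketch {
  private final int maxOpenFiles;                        // e.g. the fixed 2048
  private final AtomicInteger open = new AtomicInteger();

  HandleLimitSketch(int maxOpenFiles) {
    this.maxOpenFiles = maxOpenFiles;
  }

  // hook invoked for every newly opened stream/channel on the mock filesystem
  void onOpen(Path path) throws FileSystemException {
    if (open.incrementAndGet() > maxOpenFiles) {
      open.decrementAndGet();
      throw new FileSystemException(path.toString(), null, "Too many open files");
    }
  }

  // hook invoked when a tracked handle is closed again
  void onClose(Path path) {
    open.decrementAndGet();
  }
}
{noformat}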

So perhaps all (distributed/cloud) Solr tests should use {{SuppressFileSystems}} to ensure we don't get false failures like this?
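
If we went that route, the opt-out would presumably look something like the following on the affected test classes (hypothetical sketch -- assuming the existing {{@SuppressFileSystems}} annotation from the Lucene test framework is given the mock filesystem's simple class name, the way other tests use it):

{noformat}
// Hypothetical: suppress only the handle-limiting mock filesystem for
// multi-node tests, rather than suppressing all mock filesystems with "*".
@LuceneTestCase.SuppressFileSystems("HandleLimitFS")
public class BasicDistributedZkTest extends AbstractFullDistribZkTestBase {
  // ... existing test body unchanged ...
}
{noformat}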

Or perhaps we should enhance the way we use {{HandleLimitFS}} in our test scaffolding so that we can give each Solr node its own mock filesystem (with its own {{MAX_OPEN_FILES}} limit)?
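
For the second idea, the gist would be something like the following (very rough, and all of the naming here is hypothetical -- the only point is that the handle budget would be tracked per node base dir instead of one fixed number shared by every node in the JVM):

{noformat}
// Hypothetical sketch of a per-node handle budget, not a patch against the
// real test-framework classes.
import java.nio.file.FileSystemException;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class PerNodeHandleBudget {
  private static final int MAX_OPEN_FILES_PER_NODE = 2048; // same default, but per node
  private final Map<Path, AtomicInteger> openByNode = new ConcurrentHashMap<>();

  // nodeBaseDir = the per-node temp dir the test scaffolding already creates
  void onOpen(Path nodeBaseDir, Path file) throws FileSystemException {
    AtomicInteger open = openByNode.computeIfAbsent(nodeBaseDir, dir -> new AtomicInteger());
    if (open.incrementAndGet() > MAX_OPEN_FILES_PER_NODE) {
      open.decrementAndGet();
      throw new FileSystemException(file.toString(), null, "Too many open files");
    }
  }

  void onClose(Path nodeBaseDir, Path file) {
    AtomicInteger open = openByNode.get(nodeBaseDir);
    if (open != null) {
      open.decrementAndGet();
    }
  }
}
{noformat}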



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org