Posted to solr-user@lucene.apache.org by Austin Rasmussen <AR...@directs.com> on 2013/09/05 19:21:39 UTC

Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

Hello,

I currently have Solr 4.3 set up with about 400 cores set to load upon start up.  When starting Solr with an empty index for each core, Solr is able to load all of the cores and start up normally as expected.  However, after running a dataimport on all cores and restarting Solr, it hangs at "org.apache.solr.core.CoreContainer; registering core: ..." without any type of error message in the log.  The process still exists at this point, but doesn't make any progress even if left for a period of time.  Prior to the restart, Solr continues to function normally, and is searchable.

Solr is currently running in master-slave replication, and this exact same behavior occurs on the master and both slaves.

I've checked all of the system log files and am also unable to find any errors or messages that would point to a particular problem.  Originally, I had thought it may have been related to an open file limit, but I also tried raising the limit to 65k, and Solr continued to hang at the same spot.  It does appear to be related to files to an extent, since removing the index/"data" directory of half of the cores does allow Solr to start up normally.
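
One way to double-check that a raised limit actually applies to the running Solr process is to compare its open descriptor count with the limit the kernel reports for it. A minimal sketch, assuming a Linux host with /proc mounted; the function name fd_usage is made up for illustration:

```shell
#!/bin/sh
# fd_usage PID: print the number of open file descriptors of a process
# next to its soft "Max open files" limit.
# Assumes Linux (/proc); fd_usage is an illustrative name, not a Solr tool.
fd_usage() {
    pid="$1"
    # Each entry in /proc/PID/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    # In /proc/PID/limits the 4th field of the "Max open files" row is the soft limit.
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits" 2>/dev/null)
    echo "pid=$pid open_fds=$open soft_limit=$limit"
}
```

Calling fd_usage with the PID of the servlet container (as the same user or as root) shows whether the process is anywhere near its limit at the moment it hangs.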

Any help or suggestions are appreciated.

Thanks!

Re: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

Posted by Jonatan Fournier <jo...@gmail.com>.
Hello,

I still have this issue using Solr 4.4; removing the firstSearcher queries
did make the problem go away.

Note that I'm using Tomcat 7. If I instead launch an EmbeddedSolrServer from
my own Java application, pointing to the same Solr configuration, the server
starts fully with no hang.

What is the XML tag syntax to set spellcheck=false for the firstSearcher
listener, as discussed above?

Cheers,

/jonatan

--- HANG with Tomcat 7 (firstSearcher queries on) ---
<...>
2409 [coreLoadExecutor-3-thread-3] INFO  org.apache.solr.handler.component.SpellCheckComponent  – No queryConverter defined, using default converter
2409 [coreLoadExecutor-3-thread-3] INFO  org.apache.solr.handler.component.QueryElevationComponent  – Loading QueryElevation from: /var/lib/myapp/conf/elevate.xml
2415 [coreLoadExecutor-3-thread-3] INFO  org.apache.solr.handler.ReplicationHandler  – Commits will be reserved for  10000
2415 [searcherExecutor-16-thread-1] INFO  org.apache.solr.core.SolrCore  – QuerySenderListener sending requests to Searcher@5c43ecf0main{StandardDirectoryReader(segments_3:23 _9(4.4):C57862)}
2417 [searcherExecutor-16-thread-1] INFO  org.apache.solr.core.SolrCore  – [foo-20130912] webapp=null path=null params={event=firstSearcher&q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false} hits=0 status=0 QTime=1
2417 [searcherExecutor-16-thread-1] INFO  org.apache.solr.core.SolrCore  – QuerySenderListener done.
2417 [searcherExecutor-16-thread-1] INFO  org.apache.solr.handler.component.SpellCheckComponent  – Loading spell index for spellchecker: default
2417 [searcherExecutor-16-thread-1] INFO  org.apache.solr.handler.component.SpellCheckComponent  – Loading spell index for spellchecker: wordbreak
2418 [searcherExecutor-16-thread-1] INFO  org.apache.solr.core.SolrCore  – [foo-20130912] Registered new searcher Searcher@5c43ecf0main{StandardDirectoryReader(segments_3:23 _9(4.4):C57862)}
2420 [coreLoadExecutor-3-thread-3] INFO  org.apache.solr.core.CoreContainer  – registering core: foo-20130912

--- NO HANG EmbeddedSolrServer (firstSearcher queries on) ---
<...>
1797 [coreLoadExecutor-3-thread-1] INFO  org.apache.solr.handler.component.SpellCheckComponent  – No queryConverter defined, using default converter
1797 [coreLoadExecutor-3-thread-1] INFO  org.apache.solr.handler.component.QueryElevationComponent  – Loading QueryElevation from: /var/lib/myapp/conf/elevate.xml
1800 [coreLoadExecutor-3-thread-1] INFO  org.apache.solr.handler.ReplicationHandler  – Commits will be reserved for  10000
1801 [searcherExecutor-15-thread-1] INFO  org.apache.solr.core.SolrCore  – QuerySenderListener sending requests to Searcher@27b104d7main{StandardDirectoryReader(segments_3:23 _9(4.4):C57862)}
1801 [searcherExecutor-15-thread-1] INFO  org.apache.solr.core.SolrCore  – QuerySenderListener done.
1801 [searcherExecutor-15-thread-1] INFO  org.apache.solr.handler.component.SpellCheckComponent  – Loading spell index for spellchecker: default
1801 [coreLoadExecutor-3-thread-1] INFO  org.apache.solr.core.CoreContainer  – registering core: foo-20130912
1801 [searcherExecutor-15-thread-1] INFO  org.apache.solr.handler.component.SpellCheckComponent  – Loading spell index for spellchecker: wordbreak
1801 [searcherExecutor-15-thread-1] INFO  org.apache.solr.core.SolrCore  – [foo-20130912] Registered new searcher Searcher@27b104d7main{StandardDirectoryReader(segments_3:23 _9(4.4):C57862)}



RE: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

Posted by Austin Rasmussen <AR...@directs.com>.
: Do all of your cores have "newSearcher" event listeners configured or just
: 2 (i'm trying to figure out if it's a timing fluke that these two are stalled, or if it's something special about the configs)?

All of my cores have both the "newSearcher" and "firstSearcher" event listeners configured. (The firstSearcher actually doesn't have any queries configured against it, so it probably should just be removed altogether)

: Can you try removing the newSearcher listeners to confirm that that does in fact make the problem go away?

Removing the "newSearcher" listeners does not make the problem go away; however, removing the "firstSearcher" listener (even if the "newSearcher" listener is still configured) does make the problem go away.

: With the newSearcher listeners in place, can you try setting "spellcheck=false" as a query param on the newSearcher listeners you have configured and
: see if that works around the problem?

Adding the "spellcheck=false" param to the "firstSearcher" listener does appear to work around the problem.

: Assuming it's just 2 cores using these listeners: can you reproduce this problem with a simpler setup where only one of the affected cores is in use?

Since it's not just these two cores, I'm not sure how to produce much of a simpler setup.  I did attempt to limit how many cores are loaded in the solr.xml, and found that if I cut it down to 56, it was able to load successfully (without any of the above config changed).

If I cut it down to 57 cores, it no longer hangs at "registering core"; it actually gets as far as "QuerySenderListener sending requests to Searcher@2f28849 main{StandardDirectoryReader(..."

If 58+ cores are loaded at start up, that's when it begins to hang at "registering core".  However, it always hangs on the *last* core configured in the solr.xml, regardless of how many cores are being loaded.


: can you reproduce using Solr 4.4?
: It would be helpful if you could create a jira and attach...
: * your complete configs -- or at least some configs similar to yours that are complete enough to reproduce the startup problem.  
: * some sample data (based on your initial description, i'm guessing there at least needs to be a handful of docs in the index -- and most likely they need to match your warming query) -- but we don't need your actual indexes, just some docs that will work with your configs that we can index & restart to see the problem.
: * these thread dumps.

I can likely get to this early next week, both checking into how this behaves using Solr 4.4 and submitting a JIRA with your requested info.

RE: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

Posted by Chris Hostetter <ho...@fucit.org>.
: Sorry for the multi-post, seems like the .tdump files didn't get 
: attached.  I've tried attaching them as .txt files this time.

Interesting ... it looks like 2 of your cores are blocked in loading while 
waiting for the searchers to open ... not clear if it's a deadlock or why 
though - in both cases the coreLoaderThread is trying to register stuff 
with JMX, which is asking for stats right off the bat (not sure why), 
which requires accessing the searcher and is waiting for that to be 
available.  But then you also have "newSearcher" listener events which 
are using the spellcheck component, which is blocked waiting for that 
searcher as well.
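
To see at a glance which core-loading and searcher threads are stuck, the dumps can be filtered by thread name. A rough sketch, assuming jstack-style text dumps and the coreLoadExecutor / searcherExecutor thread-name prefixes that show up in Solr's startup log; summarize_threads is a made-up helper name:

```shell
#!/bin/sh
# summarize_threads FILE: list core-loading and searcher threads in a
# jstack-style dump together with their java.lang.Thread.State line.
# summarize_threads is an illustrative helper, not part of Solr or the JDK.
summarize_threads() {
    awk '
        # A thread block starts with the quoted thread name.
        /^"/ {
            name = $0
            sub(/^"/, "", name)
            sub(/".*/, "", name)
        }
        # Report the state only for the thread pools of interest.
        /java\.lang\.Thread\.State:/ && name ~ /^(coreLoadExecutor|searcherExecutor)/ {
            state = $0
            sub(/^[ \t]*java\.lang\.Thread\.State:[ \t]*/, "", state)
            print name ": " state
        }
    ' "$1"
}
```

Running it against two dumps taken some seconds apart and comparing the output gives a quick hint: threads that sit in the same waiting state in both dumps are the likely candidates for the stall.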

Do all of your cores have "newSearcher" event listeners configured or just 
2 (i'm trying to figure out if it's a timing fluke that these two are 
stalled, or if it's something special about the configs)?

Can you try removing the newSearcher listeners to confirm that that does in 
fact make the problem go away?

With the newSearcher listeners in place, can you try setting 
"spellcheck=false" as a query param on the newSearcher listeners you have 
configured and see if that works around the problem?

Assuming it's just 2 cores using these listeners: can you reproduce this 
problem with a simpler setup where only one of the affected cores is in 
use?

can you reproduce using Solr 4.4?


It would be helpful if you could create a jira and attach...

* your complete configs -- or at least some configs similar to 
yours that are complete enough to reproduce the startup problem.  
* some sample data (based on 
your initial description, i'm guessing there at least needs to be a 
handful of docs in the index -- and most likely they need to match your 
warming query) -- but we don't need your actual indexes, just some docs 
that will work with your configs that we can index & restart to see the 
problem.
* these thread dumps.


-Hoss

RE: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

Posted by Austin Rasmussen <AR...@directs.com>.
Sorry for the multi-post, seems like the .tdump files didn't get attached.  I've tried attaching them as .txt files this time.

RE: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

Posted by Austin Rasmussen <AR...@directs.com>.
Thanks for clearing that up Erick.  The updateLog XML element isn't present in any of the solrconfig.xml files, so I don't believe this is enabled.  

I posted the directory listing of all of the core data directories in a prior post, but none of the files or folders there have "tlog" in their names.


Re: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

Posted by Erick Erickson <er...@gmail.com>.
bq: I'm actually not using the transaction log (or the
NRTCachingDirectoryFactory); it's currently set up to use the
MMapDirectoryFactory,

This isn't relevant to whether you're using the update log or not; the
directory factory only determines how the index is handled. Look for
something in your solrconfig.xml
like:
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

The other thing to check is if you have files in a "tlog" directory that's
a sibling to your index directory as Hoss suggested.

You may well NOT have any transaction log, but it's something to check.
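
Both checks can be run across all cores in one pass. A minimal sketch, assuming each core lives in its own directory under the Solr home with conf/solrconfig.xml and data/ beneath it; that layout and the function name check_update_logs are assumptions made for illustration:

```shell
#!/bin/sh
# check_update_logs SOLR_HOME: report cores that declare <updateLog> in
# solrconfig.xml or that have a tlog directory next to their index.
# Assumed layout: SOLR_HOME/<core>/conf/solrconfig.xml and SOLR_HOME/<core>/data/tlog.
check_update_logs() {
    solr_home="${1:-.}"
    for core in "$solr_home"/*/; do
        core="${core%/}"
        [ -d "$core" ] || continue
        conf="$core/conf/solrconfig.xml"
        # Plain text match: a commented-out <updateLog> element is reported too.
        if [ -f "$conf" ] && grep -q '<updateLog' "$conf"; then
            echo "updateLog configured: $core"
        fi
        if [ -d "$core/data/tlog" ]; then
            count=$(find "$core/data/tlog" -type f | wc -l | tr -d ' ')
            echo "tlog directory present: $core ($count files)"
        fi
    done
}
```

No output for a Solr home means neither sign of a transaction log was found under the assumed layout.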


RE: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

Posted by Austin Rasmussen <AR...@directs.com>.
Thanks for the reply Hoss.

I'm actually not using the transaction log (or the NRTCachingDirectoryFactory); it's currently set up to use the MMapDirectoryFactory, and I'm not using the near-real time aspect of Solr at this time.  All of the data imports do pass in the "commit" query parameter, so there should be a hard commit done after each one of the data imports.

I've attached the two thread dumps (taken with jvisualvm over an RMI connection to the server, since the server doesn't have the JDK on it), as well as the 'du' output, as text files.

If I can get you any more information about the configuration or the environment, please ask!

Thanks for your help.


Re: Solr 4.3 Startup with Multiple Cores Hangs on "Registering Core"

Posted by Chris Hostetter <ho...@fucit.org>.
: I currently have Solr 4.3 set up with about 400 cores set to load upon 
: start up.  When starting Solr with an empty index for each core, Solr is 
: able to load all of the cores and start up normally as expected.  
: However, after running a dataimport on all cores and restarting Solr, it 
: hangs at "org.apache.solr.core.CoreContainer; registering core: ..." 
: without any type of error message in the log.  The process still exists 
: at this point, but doesn't make any progress even if left for a period 
: of time.  Prior to the restart, Solr continues to function normally, and 
: is searchable.

When Solr gets into this state, can you generate a thread dump, wait 20-30 
seconds, generate another thread dump, and then send both to the list so 
we can see what's going on at this point?

The easiest way to generate a thread dump is with jstack on the same 
machine...

	jstack <pid> >> threaddumps.log
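
The two-dump procedure can be scripted so the interval is consistent. A minimal sketch, assuming jstack from a JDK is on the PATH of the machine running Solr; the function name, the DUMP_CMD override, and the default 30-second interval are illustrative choices:

```shell
#!/bin/sh
# take_thread_dumps PID [OUTFILE] [INTERVAL_SECONDS]
# Appends two thread dumps of the given JVM to OUTFILE, INTERVAL seconds apart.
# DUMP_CMD is a hypothetical override for the dump tool; it defaults to jstack
# and must be a single command name that accepts the PID as its only argument.
take_thread_dumps() {
    pid="$1"
    out="${2:-threaddumps.log}"
    interval="${3:-30}"
    dump_cmd="${DUMP_CMD:-jstack}"

    for n in 1 2; do
        # Marker line so the two dumps are easy to tell apart in one file.
        printf '=== dump %s for pid %s at %s ===\n' "$n" "$pid" "$(date)" >> "$out"
        "$dump_cmd" "$pid" >> "$out" 2>&1
        # Pause only between the first and the second dump.
        if [ "$n" -eq 1 ]; then
            sleep "$interval"
        fi
    done
}
```

For example, take_thread_dumps <pid> threaddumps.log 30 leaves both dumps in one file, ready to attach to a reply.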


: hang at the same spot.  It does appear to be related to files to an 
: extent, since removing the index/"data" directory of half of the cores 
: does allow Solr to start up normally.

wild shot in the dark -- is it possible you have really large transaction 
logs that are being replayed on startup, because you never did a hard 
commit after indexing?

can you also include in your next email a listing of all the files in all 
the data dirs of the affected solr instance, including file sizes?

something along the lines of this command output from your solr home 
dir...

	du -ab */data

?


-Hoss