Posted to java-user@lucene.apache.org by "drazen.nis" <d....@youngculture.com> on 2011/08/16 10:00:30 UTC

[SPATIAL] Spatial search runs forever

Hello,

We recently introduced distance searching/sorting into our existing
Lucene index, using the Spatial contrib for Lucene 2.9.4. There are 100K+
documents in the index, of which only 20K had latitude/longitude and
_tier_* fields. Spatial queries ran quite OK.

After enriching the index with geo coordinates for most of the documents,
all queries that use the spatial distance filter plus sorting started to run
forever. The details of the implementation are below.
Do you have any idea what could cause this problem?


Environment Details
------------------
Lucene 2.9

Java 1.6.0_14
JAVA_OPTS=-Xms8000M -Xmx8000M -server -XX:-UseParallelOldGC
-XX:+PrintCommandLineFlags -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -XX:+DisableExplicitGC -Xloggc:gc.log

CentOS release 5.5 (Final)
8 cores server (physical machine)
18GB RAM
RAID5 HDD
(on this machine only Apache Web Server is running at the moment) 


Implementation Details
------------------
The implementation is based on the blog post
http://develop.nydi.ch/2010/10/lucene-spatial-example/. While a spatial
query executes, processor usage rises to the maximum and stays there
for hours. A thread dump shows the following:

"searchers-thread-63" prio=10 tid=0x00000000488e4800 nid=0x3dab runnable [0x0000000046789000]
   java.lang.Thread.State: RUNNABLE
	at java.util.HashMap.put(HashMap.java:374)
	at org.apache.lucene.spatial.tier.LatLongDistanceFilter$1.match(LatLongDistanceFilter.java:97)
	at org.apache.lucene.search.FilteredDocIdSet$1.match(FilteredDocIdSet.java:73)
	at org.apache.lucene.search.FilteredDocIdSetIterator.advance(FilteredDocIdSetIterator.java:87)
	at org.apache.lucene.util.OpenBitSetDISI.inPlaceAnd(OpenBitSetDISI.java:66)
	at org.apache.lucene.misc.ChainedFilter.doChain(ChainedFilter.java:253)
	at org.apache.lucene.misc.ChainedFilter.getDocIdSet(ChainedFilter.java:177)
	at org.apache.lucene.misc.ChainedFilter.getDocIdSet(ChainedFilter.java:104)
	at org.apache.lucene.search.IndexSearcher.searchWithFilter(IndexSearcher.java:277)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:258)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:240)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:181)
	at org.apache.lucene.search.Searcher.search(Searcher.java:90)
	at com.yc.cyclone.connector.lucene.NewLuceneConnector.executeSearch(NewLuceneConnector.java:730)
	at com.yc.cyclone.connector.lucene.NewLuceneConnector.access$000(NewLuceneConnector.java:33)
	at com.yc.cyclone.connector.lucene.NewLuceneConnector$2.run(NewLuceneConnector.java:884)
	at javolution.context.ConcurrentContext$Default.executeAction(ConcurrentContext.java:358)
	at javolution.context.ConcurrentContext.execute(ConcurrentContext.java:271)
	at com.yc.cyclone.connector.lucene.NewLuceneConnector.newSearchByGroupsImpl(NewLuceneConnector.java:879)
	at com.yc.cyclone.connector.lucene.NewLuceneConnector.newSearchByGroupsImpl(NewLuceneConnector.java:782)
	at com.yc.cyclone.isystem.search.grouping.TopicGroupingSearch$1.call(TopicGroupingSearch.java:667)
	at com.yc.cyclone.isystem.search.grouping.TopicGroupingSearch$1.call(TopicGroupingSearch.java:662)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
	at com.yc.cyclone.services.concurrency.WorkerThread.run(WorkerThread.java:49)

It's interesting, though, that even though the processor was at 100% usage
the whole time, other (non-spatial) searches and indexing tasks were
processed by Lucene without any problem and without a noticeable
performance decrease.

We execute multiple queries in parallel (they differ in one search
parameter), and all of them reuse the same filter, which in this case is:
new ChainedFilter(new Filter[] {nonSpatialQueryFilter,
distanceQueryBuilder.getFilter()}, ChainedFilter.AND);

For sorting we use:
new DistanceFieldComparatorSource(distanceQueryBuilder.getDistanceFilter());


Here is one entry from the index (spatial fields):
field       value
_tier_10    0.0
_tier_11    1.0001
_tier_12    2.0003
_tier_13    4.0006
_tier_14    9.00013
_tier_15    18.00027
_tier_7     0.0
_tier_8     0.0
_tier_9     0.0
lat         47.61242
lng         8.54002

Note that these fields are indexed as numeric fields; I used
NumericUtils.prefixCodedToDouble(field.stringValue()) to print the values.

There are also documents which do not have those fields indexed.


Thank you.

Best Regards,
Drazen

--
View this message in context: http://lucene.472066.n3.nabble.com/SPATIAL-Spatial-search-runs-forever-tp3258018p3258018.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: [SPATIAL] Spatial search runs forever

Posted by "drazen.nis" <d....@youngculture.com>.

In the end I found the problem: DistanceFilter uses Map implementations
that are not thread-safe. If you execute the searches with the same
DistanceFilter instance from a single thread, everything works as expected.
But when they run on multiple threads in parallel, access to the HashMap and
WeakHashMap fields (DistanceFilter.distances and
DistanceFilter.distanceLookupCache) can get stuck in infinite loops (e.g. see
http://lightbody.net/blog/2005/07/hashmapget_can_cause_an_infini.html and
http://old.nabble.com/DO-NOT-REPLY--Bug-50078--New%3A-Concurrent-access-to-WeakHashMap-in-ConcurrentCache-causes-infinite-loop,-100--CPU-usage-p29940263.html)

So, after changing those Map implementations (HashMap ->
ConcurrentHashMap and new WeakHashMap() -> Collections.synchronizedMap(new
WeakHashMap())), the problem disappeared. I did not notice any performance
penalty; if anything the opposite, since I can now run spatial searches
using multiple threads. I have not run reliable tests to prove that, though;
this is just my subjective impression from executing the searches.
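
For illustration, here is a minimal sketch of that change using only the
JDK. SharedDistanceMaps, record() and fillConcurrently() are hypothetical
names made up for this example, not Lucene code; only the two map
replacements mirror what was described above. Several threads write distinct
document ids into one shared instance, the way parallel searches share one
filter.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the two maps held by a shared filter; not Lucene code.
public class SharedDistanceMaps {
    // Was: new HashMap<Integer, Double>()
    final Map<Integer, Double> distances = new ConcurrentHashMap<Integer, Double>();
    // Was: new WeakHashMap<String, Double>()
    final Map<String, Double> distanceLookupCache =
            Collections.synchronizedMap(new WeakHashMap<String, Double>());

    // Roughly what a per-document match step would do: remember a computed distance.
    void record(int docId, String latLngKey, double distance) {
        distanceLookupCache.put(latLngKey, distance);
        distances.put(docId, distance);
    }

    // Starts `threads` workers; each records `docsPerThread` distinct doc ids
    // into the same instance. Returns the number of distances stored.
    static int fillConcurrently(int threads, final int docsPerThread)
            throws InterruptedException {
        final SharedDistanceMaps maps = new SharedDistanceMaps();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int base = t * docsPerThread;
            pool.execute(new Runnable() {
                public void run() {
                    for (int i = 0; i < docsPerThread; i++) {
                        int docId = base + i;
                        maps.record(docId, "key-" + docId, docId * 0.1);
                    }
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return maps.distances.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // 8 threads x 10000 distinct ids each
        System.out.println(fillConcurrently(8, 10000));
    }
}
```

With the thread-safe maps every worker finishes and all 80000 entries are
present in distances. The weak cache may hold fewer entries, since its keys
can be garbage collected at any time, so the sketch does not check its size.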

Despite this issue with concurrent usage, I still think the spatial contrib
and the idea behind it are great, and they have helped me a lot.
Maybe a JIRA issue should be created for this use case.

--
View this message in context: http://lucene.472066.n3.nabble.com/SPATIAL-Spatial-search-runs-forever-tp3258018p3261103.html