Posted to dev@lucene.apache.org by Jean-Francois Halleux <ha...@skynet.be> on 2004/01/07 21:17:29 UTC

Parallel search in MultiSearcher

Hello,

	as suggested, here is a patch that should parallelize searches among the
various searchables in a MultiSearcher. I only have a single-HDD machine, but
I guess that spreading index files over multiple HDDs can greatly improve
search time when using this MultiSearcher.

This passes the Ant tests, but be careful, as it's been a while since I've
programmed and I don't know Lucene well.

KR,

Jean-Francois Halleux

----

Index: MultiSearcher.java
===================================================================
RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/MultiSearcher.java,v
retrieving revision 1.12
diff -u -r1.12 MultiSearcher.java
--- MultiSearcher.java	16 Sep 2003 20:06:32 -0000	1.12
+++ MultiSearcher.java	7 Jan 2004 20:14:57 -0000
@@ -141,29 +141,45 @@
     return maxDoc;
   }

-  public TopDocs search(Query query, Filter filter, int nDocs)
-      throws IOException {
-    HitQueue hq = new HitQueue(nDocs);
-    int totalHits = 0;
-
-    for (int i = 0; i < searchables.length; i++) { // search each searcher
-      TopDocs docs = searchables[i].search(query, filter, nDocs);
-      totalHits += docs.totalHits;		  // update totalHits
-      ScoreDoc[] scoreDocs = docs.scoreDocs;
-      for (int j = 0; j < scoreDocs.length; j++) { // merge scoreDocs into hq
-	ScoreDoc scoreDoc = scoreDocs[j];
-        scoreDoc.doc += starts[i];                      // convert doc
-        if(!hq.insert(scoreDoc))
-            break;                                      // no more scores > minScore
-      }
-    }
-
-    ScoreDoc[] scoreDocs = new ScoreDoc[hq.size()];
-    for (int i = hq.size()-1; i >= 0; i--)	  // put docs in array
-      scoreDocs[i] = (ScoreDoc)hq.pop();
+  /**
+  * A search implementation which spawns a new thread for each
+  * Searchable, waits for each search to complete and merges
+  * the results together
+  */
+ public TopDocs search(Query query, Filter filter, int nDocs)
+	 throws IOException {
+   HitQueue hq = new HitQueue(nDocs);
+   int totalHits = 0;
+   MultiSearcherThread[] msta=new MultiSearcherThread[searchables.length];
+
+   for (int i = 0; i < searchables.length; i++) { // search each searcher
+	 msta[i]=new MultiSearcherThread(searchables[i], query, filter,
+									 nDocs, hq, i, starts , "MultiSearcher thread #"+(i+1));
+	 msta[i].start();
+   }
+
+   for (int i=0; i < searchables.length; i++) {
+	   try {
+		   msta[i].join();
+	   }
+	   catch (InterruptedException ie) {
+		   ; // TODO: what should we do with this???
+	   }
+	   IOException ioe=msta[i].getIOException();
+	   if (ioe==null) {
+		   totalHits+=msta[i].hits();
+	   } else {
+		   // if one search produced an IOException, rethrow it
+		   throw ioe;
+	   }
+   }
+
+   ScoreDoc[] scoreDocs = new ScoreDoc[hq.size()];
+   for (int i = hq.size()-1; i >= 0; i--)	  // put docs in array
+	 scoreDocs[i] = (ScoreDoc)hq.pop();

-    return new TopDocs(totalHits, scoreDocs);
-  }
+   return new TopDocs(totalHits, scoreDocs);
+ }


   /** Lower-level search API.
@@ -208,4 +224,64 @@
     return searchables[i].explain(query,doc-starts[i]); // dispatch to searcher
   }

+}
+
+/**
+ * A thread subclass for searching a single searchable
+ */
+class MultiSearcherThread extends Thread {
+
+	private Searchable searchable;
+	private Query query;
+	private Filter filter;
+	private int nDocs;
+	private int hits;
+	private TopDocs docs;
+	private int i;
+	private HitQueue hq;
+	private int[] starts;
+	private IOException ioe;
+
+	public MultiSearcherThread(Searchable searchable, Query query, Filter filter, int nDocs,
+								HitQueue hq, int i, int[] starts,
+								String name) {
+		super(name);
+		this.searchable=searchable;
+		this.query=query;
+		this.filter=filter;
+		this.nDocs=nDocs;
+		this.hq=hq;
+		this.i=i;
+		this.starts=starts;
+	}
+
+	public void run() {
+		try {
+			docs = searchable.search(query, filter, nDocs);
+		}
+		// Store the IOException for later use by the caller of this thread
+		catch (IOException ioe) {
+			this.ioe = ioe;
+		}
+		if (ioe==null) {
+			ScoreDoc[] scoreDocs = docs.scoreDocs;
+			for (int j = 0; j < scoreDocs.length; j++) { // merge scoreDocs into hq
+			   ScoreDoc scoreDoc = scoreDocs[j];
+			   scoreDoc.doc += starts[i]; // convert doc
+			   //it would be so nice if we had a thread-safe insert
+			   synchronized (MultiSearcherThread.class) {
+				   if(!hq.insert(scoreDoc)) break;
+			   }                              // no more scores > minScore
+			}
+		}
+	}
+
+	public int hits() {
+		return docs.totalHits;
+	}
+
+	public IOException getIOException() {
+		return ioe;
+	}
+
 }





-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org]
Sent: mercredi 7 janvier 2004 18:16
To: Lucene Developers List
Subject: Re: normalization BAD DESIGN ?


Robert Engels wrote:
> The design & implementation of the document/field normalization is very
> poor.

Thank you.

> It requires a byte[] with as (number of documents * number of fields)
> elements!

That's correct and has been mentioned many times on this list.  I've
never had anyone complain about the amount of memory this uses.

> With a document store of 100 million documents, with multiple fields, the
> memory required is staggering.

Is this a hypothetical case, or do you really have 100M documents that
you hope to efficiently search from a machine with less than a few
hundred megabytes of RAM?  100M is a very large collection, larger than
any web search engine until just a few years ago.  And today, a few
hundred megabytes is not really that much RAM, hardly staggering.  Most
laptops now ship with enough RAM to do this.
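
(For the arithmetic: the norms take one byte per document per indexed field,
so 100M documents with, say, three indexed fields come to roughly
100M x 3 bytes, or about 300 MB.)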

Collections with more than around 10M documents can become rather slow
to search (> 1 second on average).  So most folks who have 100M document
collections will implement a parallel distributed solution.  (This is,
e.g., what web search engines do.)  Multiple indexes are created, each
for a subset of the entire collection, and all are searched in parallel.
  Unfortunately such distributed systems (like, e.g., Nutch's) tend not
to be generic and none is included in Lucene.  It's almost possible to
build a parallel distributed search system by combining MultiSearcher
with RemoteSearchable, except that MultiSearcher searches serially, not
in parallel.  But it would not be hard to modify MultiSearcher to
implement parallel searching.  (Volunteers?)
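
Roughly, the wiring would look like the sketch below (illustrative only: the
host names, RMI registry names, index paths, and field name are made up, and
error handling is omitted).

   // SearchServer.java -- run on each index host; assumes an rmiregistry is running
   import java.rmi.Naming;
   import org.apache.lucene.search.IndexSearcher;
   import org.apache.lucene.search.RemoteSearchable;
   import org.apache.lucene.search.Searchable;

   public class SearchServer {
     public static void main(String[] args) throws Exception {
       Searchable local = new IndexSearcher("/indexes/part1");   // hypothetical path
       RemoteSearchable server = new RemoteSearchable(local);    // exported over RMI
       Naming.rebind("//localhost/Searchable", server);          // hypothetical name
     }
   }

   // SearchClient.java -- searches all remote indexes through one MultiSearcher
   import java.rmi.Naming;
   import org.apache.lucene.index.Term;
   import org.apache.lucene.search.*;

   public class SearchClient {
     public static void main(String[] args) throws Exception {
       Searchable a = (Searchable)Naming.lookup("//hostA/Searchable");  // hypothetical hosts
       Searchable b = (Searchable)Naming.lookup("//hostB/Searchable");
       Searcher searcher = new MultiSearcher(new Searchable[] { a, b });
       Hits hits = searcher.search(new TermQuery(new Term("contents", "lucene")));
       System.out.println(hits.length() + " total matching documents");
     }
   }

Today MultiSearcher would run those two sub-searches one after the other; with
a parallel version they would run concurrently, one thread per remote index.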

> IndexReader has the following method definition,
>
> public abstract byte[] norms(String field) throws IOException;
>
> which is the source of the problem.
>
> Even returning null from this method does not help, as the PhraseScorer and
> derived classes maintain a reference, and do not perform a null check.
>
> I have modified line 105 of PhraseScorer to be
>
> if(norms!=null)
>     score *= Similarity.decodeNorm(norms[first.doc]); // normalize

That's a reasonable approach.  If you don't want length normalization,
then it should be possible to disable it.

> Would it not be a better design to define a method in IndexReader
>
> float getNorm(String fieldname, int docnum);
>
> so an implementation could cache this information in some fashion, or always
> return 1.0 if it didn't care?

The problem with this approach is that the innermost search loops would
have to perform a hash table lookup on the field name, which would
significantly impact performance.  So the first approach, permitting
IndexReader.norms() to return null, is preferable.  If that winds up
being useful to you, please submit a patch.
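
To make the difference concrete, here is a rough sketch (illustrative names
only, not code from Lucene or from the patch) of the pattern the scorers rely
on: the norms array is fetched once per query, the per-document work is a
plain array access, and a null return would simply skip normalization.

   package org.apache.lucene.search;            // same package as the scorers

   import java.io.IOException;
   import org.apache.lucene.index.IndexReader;

   class NormSketch {
     /** Length-normalizes raw scores for the given docs.  The norms array is
      *  looked up once per query; a reader returning null from norms(field)
      *  would disable normalization without any per-document field lookup. */
     static void normalize(IndexReader reader, String field,
                           int[] docs, float[] scores) throws IOException {
       byte[] norms = reader.norms(field);      // one lookup, outside the loop
       if (norms == null)                       // reader opted out of norms
         return;
       for (int i = 0; i < docs.length; i++)
         scores[i] *= Similarity.decodeNorm(norms[docs[i]]);
     }
   }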

Cheers,

Doug




Re: Parallel search in MultiSearcher

Posted by Doug Cutting <cu...@apache.org>.
Your patch was garbled a bit in transmission and I cannot apply it.  Can 
you please file a bug report and attach the patch to that?  Thanks.

Doug

Jean-Francois Halleux wrote:
> Hello,
> 
> The following patch includes the two mentioned changes, plus the unit test
> class and a light modification to TestMultiSearcher to allow subclassing it.
> 
> I would be interested to know whether the performance improvement is
> noticeable if somebody tries this on a multiple-HDD machine or with a
> distributed index architecture.





RE: Parallel search in MultiSearcher

Posted by Jean-Francois Halleux <ha...@skynet.be>.
Hello,

The following patch includes the two mentioned changes, plus the unit test
class and a light modification to TestMultiSearcher to allow subclassing it.

I would be interested to know whether the performance improvement is
noticeable if somebody tries this on a multiple-HDD machine or with a
distributed index architecture.


KR,

Jeff Halleux

----

Index: java/org/apache/lucene/search/ParallelMultiSearcher.java
===================================================================
RCS file: java/org/apache/lucene/search/ParallelMultiSearcher.java
diff -N java/org/apache/lucene/search/ParallelMultiSearcher.java
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ java/org/apache/lucene/search/ParallelMultiSearcher.java	8 Jan 2004 18:00:00 -0000
@@ -0,0 +1,247 @@
+package org.apache.lucene.search;
+
+/* ====================================================================
+ * The Apache Software License, Version 1.1
+ *
+ * Copyright (c) 2001 The Apache Software Foundation.  All rights
+ * reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *
+ * 3. The end-user documentation included with the redistribution,
+ *    if any, must include the following acknowledgment:
+ *       "This product includes software developed by the
+ *        Apache Software Foundation (http://www.apache.org/)."
+ *    Alternately, this acknowledgment may appear in the software itself,
+ *    if and wherever such third-party acknowledgments normally appear.
+ *
+ * 4. The names "Apache" and "Apache Software Foundation" and
+ *    "Apache Lucene" must not be used to endorse or promote products
+ *    derived from this software without prior written permission. For
+ *    written permission, please contact apache@apache.org.
+ *
+ * 5. Products derived from this software may not be called "Apache",
+ *    "Apache Lucene", nor may "Apache" appear in their name, without
+ *    prior written permission of the Apache Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
+ * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ * ====================================================================
+ *
+ * This software consists of voluntary contributions made by many
+ * individuals on behalf of the Apache Software Foundation.  For more
+ * information on the Apache Software Foundation, please see
+ * <http://www.apache.org/>.
+ */
+
+import java.io.IOException;
+
+import org.apache.lucene.index.Term;
+
+/** Implements parallel search over a set of <code>Searchables</code>.
+ *
+ * <p>Applications usually need only call the inherited {@link #search(Query)}
+ * or {@link #search(Query,Filter)} methods.
+ */
+public class ParallelMultiSearcher extends MultiSearcher {
+	private Searchable[] searchables;
+	private int[] starts;
+	private int maxDoc = 0;
+
+	/** Creates a searcher which searches <i>searchables</i>. */
+	public ParallelMultiSearcher(Searchable[] searchables) throws IOException {
+		super(searchables);
+	}
+
+	/**
+	 * TODO: parallelize this one too
+	 */
+	public int docFreq(Term term) throws IOException {
+		int docFreq = 0;
+		for (int i = 0; i < searchables.length; i++)
+			docFreq += searchables[i].docFreq(term);
+		return docFreq;
+	}
+
+	/**
+	* A search implementation which spawns a new thread for each
+	* Searchable, waits for each search to complete and merges
+	* the results back together.
+	*/
+	public TopDocs search(Query query, Filter filter, int nDocs)
+		throws IOException {
+		HitQueue hq = new HitQueue(nDocs);
+		int totalHits = 0;
+		MultiSearcherThread[] msta =
+			new MultiSearcherThread[searchables.length];
+
+		for (int i = 0; i < searchables.length; i++) { // search each searcher
+			// Assume there are not too many searchables and that the cost of creating a thread is far lower than the cost of a search
+			msta[i] =
+				new MultiSearcherThread(
+					searchables[i],
+					query,
+					filter,
+					nDocs,
+					hq,
+					i,
+					starts,
+					"MultiSearcher thread #" + (i + 1));
+			msta[i].start();
+		}
+
+		for (int i = 0; i < searchables.length; i++) {
+			try {
+				msta[i].join();
+			} catch (InterruptedException ie) {
+				; // TODO: what should we do with this???
+			}
+			IOException ioe = msta[i].getIOException();
+			if (ioe == null) {
+				totalHits += msta[i].hits();
+			} else {
+				// if one search produced an IOException, rethrow it
+				throw ioe;
+			}
+		}
+
+		ScoreDoc[] scoreDocs = new ScoreDoc[hq.size()];
+		for (int i = hq.size() - 1; i >= 0; i--) // put docs in array
+			scoreDocs[i] = (ScoreDoc) hq.pop();
+
+		return new TopDocs(totalHits, scoreDocs);
+	}
+
+	/** Lower-level search API.
+	 *
+	 * <p>{@link HitCollector#collect(int,float)} is called for every non-zero
+	 * scoring document.
+	 *
+	 * <p>Applications should only use this if they need <i>all</i> of the
+	 * matching documents.  The high-level search API ({@link
+	 * Searcher#search(Query)}) is usually more efficient, as it skips
+	 * non-high-scoring hits.
+	 *
+	 * @param query to match documents
+	 * @param filter if non-null, a bitset used to eliminate some documents
+	 * @param results to receive hits
+	 *
+	 * TODO: parallelize this one too
+	 */
+	public void search(Query query, Filter filter, final HitCollector results)
+		throws IOException {
+		for (int i = 0; i < searchables.length; i++) {
+
+			final int start = starts[i];
+
+			searchables[i].search(query, filter, new HitCollector() {
+				public void collect(int doc, float score) {
+					results.collect(doc + start, score);
+				}
+			});
+
+		}
+	}
+
+	/*
+	 * TODO: this one could be parallelized too
+	 * @see org.apache.lucene.search.Searchable#rewrite(org.apache.lucene.search.Query)
+	 */
+	public Query rewrite(Query original) throws IOException {
+		Query[] queries = new Query[searchables.length];
+		for (int i = 0; i < searchables.length; i++) {
+			queries[i] = searchables[i].rewrite(original);
+		}
+		return original.combine(queries);
+	}
+
+}
+
+/**
+ * A thread subclass for searching a single searchable
+ */
+class MultiSearcherThread extends Thread {
+
+	private Searchable searchable;
+	private Query query;
+	private Filter filter;
+	private int nDocs;
+	private int hits;
+	private TopDocs docs;
+	private int i;
+	private HitQueue hq;
+	private int[] starts;
+	private IOException ioe;
+
+	public MultiSearcherThread(
+		Searchable searchable,
+		Query query,
+		Filter filter,
+		int nDocs,
+		HitQueue hq,
+		int i,
+		int[] starts,
+		String name) {
+		super(name);
+		this.searchable = searchable;
+		this.query = query;
+		this.filter = filter;
+		this.nDocs = nDocs;
+		this.hq = hq;
+		this.i = i;
+		this.starts = starts;
+	}
+
+	public void run() {
+		try {
+			docs = searchable.search(query, filter, nDocs);
+		}
+		// Store the IOException for later use by the caller of this thread
+		catch (IOException ioe) {
+			this.ioe = ioe;
+		}
+		if (ioe == null) {
+			ScoreDoc[] scoreDocs = docs.scoreDocs;
+			for (int j = 0;
+				j < scoreDocs.length;
+				j++) { // merge scoreDocs into hq
+				ScoreDoc scoreDoc = scoreDocs[j];
+				scoreDoc.doc += starts[i]; // convert doc
+				//it would be so nice if we had a thread-safe insert
+				synchronized (hq) {
+					if (!hq.insert(scoreDoc))
+						break;
+				} // no more scores > minScore
+			}
+		}
+	}
+
+	public int hits() {
+		return docs.totalHits;
+	}
+
+	public IOException getIOException() {
+		return ioe;
+	}
+
+}
Index: test/org/apache/lucene/search/TestMultiSearcher.java
===================================================================
RCS file: /home/cvspublic/jakarta-lucene/src/test/org/apache/lucene/search/TestMultiSearcher.java,v
retrieving revision 1.4
diff -u -r1.4 TestMultiSearcher.java
--- test/org/apache/lucene/search/TestMultiSearcher.java	26 Nov 2002 17:31:43 -0000	1.4
+++ test/org/apache/lucene/search/TestMultiSearcher.java	8 Jan 2004 18:00:02 -0000
@@ -81,6 +81,14 @@
         super(name);
     }

+	/**
+	 * Return a new instance of the concrete MultiSearcher class
+	 * used in this test
+	 */
+	protected MultiSearcher getMultiSearcherInstance(Searcher[] searchers) throws IOException {
+		return new MultiSearcher(searchers);
+	}
+
     public void testEmptyIndex()
         throws Exception
     {
@@ -134,7 +142,7 @@
         searchers[0] = new IndexSearcher(indexStoreB);
         searchers[1] = new IndexSearcher(indexStoreA);
         // creating the multiSearcher
-        Searcher mSearcher = new MultiSearcher(searchers);
+        Searcher mSearcher = getMultiSearcherInstance(searchers);
         // performing the search
         Hits hits = mSearcher.search(query);

@@ -171,7 +179,7 @@
         searchers2[0] = new IndexSearcher(indexStoreB);
         searchers2[1] = new IndexSearcher(indexStoreA);
         // creating the mulitSearcher
-        Searcher mSearcher2 = new MultiSearcher(searchers2);
+        Searcher mSearcher2 = getMultiSearcherInstance(searchers2);
         // performing the same search
         Hits hits2 = mSearcher2.search(query);

@@ -213,7 +221,7 @@
         searchers3[0] = new IndexSearcher(indexStoreB);
         searchers3[1] = new IndexSearcher(indexStoreA);
         // creating the mulitSearcher
-        Searcher mSearcher3 = new MultiSearcher(searchers3);
+        Searcher mSearcher3 = getMultiSearcherInstance(searchers3);
         // performing the same search
         Hits hits3 = mSearcher3.search(query);

Index: test/org/apache/lucene/search/TestParallelMultiSearcher.java
===================================================================
RCS file: test/org/apache/lucene/search/TestParallelMultiSearcher.java
diff -N test/org/apache/lucene/search/TestParallelMultiSearcher.java
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ test/org/apache/lucene/search/TestParallelMultiSearcher.java	8 Jan 2004 18:00:02 -0000
@@ -0,0 +1,73 @@
+package org.apache.lucene.search;
+
+/* ====================================================================
+ * The Apache Software License, Version 1.1
+ *
+ * Copyright (c) 2001 The Apache Software Foundation.  All rights
+ * reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *
+ * 3. The end-user documentation included with the redistribution,
+ *    if any, must include the following acknowledgment:
+ *       "This product includes software developed by the
+ *        Apache Software Foundation (http://www.apache.org/)."
+ *    Alternately, this acknowledgment may appear in the software itself,
+ *    if and wherever such third-party acknowledgments normally appear.
+ *
+ * 4. The names "Apache" and "Apache Software Foundation" and
+ *    "Apache Lucene" must not be used to endorse or promote products
+ *    derived from this software without prior written permission. For
+ *    written permission, please contact apache@apache.org.
+ *
+ * 5. Products derived from this software may not be called "Apache",
+ *    "Apache Lucene", nor may "Apache" appear in their name, without
+ *    prior written permission of the Apache Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
+ * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ * ====================================================================
+ *
+ * This software consists of voluntary contributions made by many
+ * individuals on behalf of the Apache Software Foundation.  For more
+ * information on the Apache Software Foundation, please see
+ * <http://www.apache.org/>.
+ */
+
+import java.io.IOException;
+
+/**
+ * Unit tests for the ParallelMultiSearcher
+ */
+public class TestParallelMultiSearcher extends TestMultiSearcher {
+
+	public TestParallelMultiSearcher(String name) {
+		super(name);
+	}
+
+	protected MultiSearcher getMultiSearcherInstance(Searcher[] searchers)
+		throws IOException {
+		return new ParallelMultiSearcher(searchers);
+	}
+
+}
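
For anyone who wants to try it, a minimal usage sketch of the new
ParallelMultiSearcher (the index paths and field names below are made up; it
is meant as a drop-in replacement for MultiSearcher):

   import org.apache.lucene.index.Term;
   import org.apache.lucene.search.*;

   public class ParallelSearchDemo {
     public static void main(String[] args) throws Exception {
       // one Searchable per index; ideally each index lives on its own disk
       Searchable[] searchables = {
         new IndexSearcher("/indexes/part1"),      // hypothetical paths
         new IndexSearcher("/indexes/part2")
       };
       // each sub-search runs in its own thread; results are merged as in MultiSearcher
       Searcher searcher = new ParallelMultiSearcher(searchables);
       Hits hits = searcher.search(new TermQuery(new Term("contents", "lucene")));
       for (int i = 0; i < hits.length(); i++)
         System.out.println(hits.doc(i).get("path") + " score=" + hits.score(i));
       searcher.close();
     }
   }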


-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org]
Sent: mercredi 7 janvier 2004 23:49
To: Lucene Developers List
Subject: Re: Parallel search in MultiSearcher


Jean-Francois Halleux wrote:
> 	as suggested, here is a patch that should parallelize searches among the
> various searchables in a MultiSearcher. I only have a single-HDD machine, but
> I guess that spreading index files over multiple HDDs can greatly improve
> search time when using this MultiSearcher.
>
> This passes the Ant tests, but be careful, as it's been a while since I've
> programmed and I don't know Lucene well.

That was quick!  I have two suggestions.

First, I think it might be best if this is a subclass of MultiSearcher,
so that the old sequential version is still available.

Second, I think the following synchronization goes too far:

> +			   synchronized (MultiSearcherThread.class) {
> +				   if(!hq.insert(scoreDoc)) break;
> +			   }                              // no more scores > minScore

Wouldn't it be sufficient to synchronize on 'hq', e.g.:

   synchronized (hq) {
     if(!hq.insert(scoreDoc)) break;
   }

Doug




Re: Parallel search in MultiSearcher

Posted by Doug Cutting <cu...@apache.org>.
Jean-Francois Halleux wrote:
> 	as suggested, here is a patch that should parallelize searches among the
> various searchables in a MultiSearcher. I only have a single-HDD machine, but
> I guess that spreading index files over multiple HDDs can greatly improve
> search time when using this MultiSearcher.
> 
> This passes the Ant tests, but be careful, as it's been a while since I've
> programmed and I don't know Lucene well.

That was quick!  I have two suggestions.

First, I think it might be best if this is a subclass of MultiSearcher, 
so that the old sequential version is still available.

Second, I think the following synchronization goes too far:

> +			   synchronized (MultiSearcherThread.class) {
> +				   if(!hq.insert(scoreDoc)) break;
> +			   }                              // no more scores > minScore

Wouldn't it be sufficient to synchronize on 'hq', e.g.:

   synchronized (hq) {
     if(!hq.insert(scoreDoc)) break;
   }

Doug

