Posted to java-user@lucene.apache.org by Winton Davies <wd...@overture.com> on 2001/11/29 06:09:18 UTC
Parallelising a query...
Hi,
Let's say I want to retrieve all relevant listings for a query (just
suppose)...
I have 4 million documents... I could:
Split these into 4 indexes of 1 million documents each and then send
the query to 4 Lucene processes ? At the end I would have to merge the
results and sort them by relevance.
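A minimal sketch of the split described above, in plain Java. The names ShardedSearch, Shard and Hit are hypothetical and are not Lucene classes; each Shard stands in for one Lucene process holding one sub-index. The sketch also assumes scores from different sub-indexes are directly comparable, which may not hold if each sub-index computes its term statistics independently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ShardedSearch {

    /** One scored result; docId is local to the shard that produced it. */
    static final class Hit {
        final int shard;
        final int docId;
        final float score;

        Hit(int shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for one Lucene process searching one sub-index. */
    interface Shard {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /** Sends the query to every shard in parallel, then merges by descending score. */
    static List<Hit> search(List<Shard> shards, String query, int topN) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query, topN);
                pending.add(pool.submit(task));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> future : pending) {
                merged.addAll(future.get()); // total wait is set by the slowest shard
            }
            // The final "sort by relevance" step: highest score first.
            merged.sort((a, b) -> Float.compare(b.score, a.score));
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

Each shard only needs to return its own top N hits, so the merge step handles at most 4 x N entries regardless of how many documents matched.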
Question for Doug or any other Search Engine guru -- would this
reduce the time to find these results by 75% ?
I know it is probably a hard question to answer (i.e. all the
documents that match, might just be in one process...) but I'm more
getting at the average length of the inverted indexes that have to be
joined being reduced by 75%, hence the join should take only 25% of
the time...
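The arithmetic behind that 25% figure can be written out as follows. This is only a back-of-the-envelope model: it assumes the join cost grows linearly with the average posting-list length n, and the symbols T_1, T_k, n_i and T_merge are introduced here for illustration.

```latex
% Single index: join cost proportional to the average posting-list length n
T_1 \approx c \, n

% k sub-indexes searched in parallel: wall time is set by the slowest one,
% plus the cost of merging the k result lists
T_k \approx c \, \max_{1 \le i \le k} n_i + T_{\mathrm{merge}}

% Even split (n_i = n / k) with k = 4
T_4 \approx \tfrac{1}{4} \, c \, n + T_{\mathrm{merge}} = 0.25 \, T_1 + T_{\mathrm{merge}}
```

So the 75% saving is the best case under this model: it needs the matching documents to be spread evenly across the 4 processes and the merge cost to be small. If all the matches sit in one process, n_i = n for that process and nothing is gained.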
Any thoughts on this idiocy ? The reason I ask: let's say I
can't fit a 4 million document RamDir index into 1GB of heap space, but
I could if I split it up :) ?
Cheers,
Winton
Winton Davies
Lead Engineer, Overture (NSDQ: OVER)
1820 Gateway Drive, Suite 360
San Mateo, CA 94404
work: (650) 403-2259
cell: (650) 867-1598
http://www.overture.com/
--
To unsubscribe, e-mail: <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>
GCJ and Lucene ?
Posted by Winton Davies <wd...@overture.com>.
Hi,
Another maybe quick question:
Has anyone tried using GCJ with Lucene ?
http://www.gnu.org/software/gcc/java/
As far as I can tell, this tries to compile Java directly to native code.
I think it is restricted to 1.1 classes, which might be a gotcha
(does Lucene use any 1.2 classes ?)
Winton
Re: Parallelising a query...
Posted by Winton Davies <wd...@overture.com>.
Hi again....
Another dumb question :) (actually I'm too busy to look at the code :) )
In the index, is the data structure of termDocs (is that the right
term ?) sorted by anything, or is it just in insertion order ? I could
see how one might want to sort by the doc with the highest term
frequency, but I can also see why it might not help.
e.g. Token1 -> doc1 (2) [occurrences] -> doc2 (6) -> doc3 (3)
or is it like this ?
Token1 -> doc2 (6) -> doc3 (3) -> doc1 (2) ?
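To make the two candidate layouts concrete, here is a small Java sketch that produces both orderings for the Token1 example. PostingOrder and Posting are hypothetical names used only for illustration, not Lucene classes, and the sketch assumes documents are numbered in the order they were added, so that document-number order and insertion order coincide. It says nothing about which layout Lucene actually stores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PostingOrder {

    /** One entry in a term's list: a document number and the term's frequency in it. */
    static final class Posting {
        final int doc;
        final int freq;

        Posting(int doc, int freq) {
            this.doc = doc;
            this.freq = freq;
        }

        @Override
        public String toString() {
            return "doc" + doc + " (" + freq + ")";
        }
    }

    /** First layout: ascending document number. */
    static List<Posting> byDocNumber(List<Posting> postings) {
        List<Posting> copy = new ArrayList<>(postings);
        copy.sort((a, b) -> Integer.compare(a.doc, b.doc));
        return copy;
    }

    /** Second layout: highest term frequency first. */
    static List<Posting> byFrequencyDescending(List<Posting> postings) {
        List<Posting> copy = new ArrayList<>(postings);
        copy.sort((a, b) -> Integer.compare(b.freq, a.freq));
        return copy;
    }

    public static void main(String[] args) {
        List<Posting> token1 = Arrays.asList(
                new Posting(1, 2), new Posting(2, 6), new Posting(3, 3));
        System.out.println(byDocNumber(token1));           // [doc1 (2), doc2 (6), doc3 (3)]
        System.out.println(byFrequencyDescending(token1)); // [doc2 (6), doc3 (3), doc1 (2)]
    }
}
```

One trade-off worth noting: joining two lists with a single linear merge pass relies on both being sorted by the same key, which document-number order provides and frequency order does not.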
I have an idea for an optimization I want to make, but I'm not sure
whether it warrants investigation.
Winton