Posted to java-user@lucene.apache.org by Sameer Shisodia <ge...@gmail.com> on 2006/04/11 06:12:04 UTC

hit.doc, hit.score and FSDir performance

Hi All.

I am using Lucene as the backbone of a 'Smart Search'.

I have a layer over search that extensively analyzes the results at runtime to
bucket them. I do trim the result set, but only after this processing, since
there are non-document weights that are combined with the result scores, and
the hits are then reordered/modified.
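
To make that post-processing step concrete, here is a minimal sketch of the
"combine, reorder, then trim" idea in plain Java. It is only an illustration
under stated assumptions, not the actual layer: the names ScoredHit,
ExternalWeight and rescoreAndTrim are hypothetical, and multiplying the score
by the weight is just one possible way to combine them. It deliberately uses
no Lucene classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class Rescore {
    /** One search hit: a document id plus the raw score reported by the search. */
    static final class ScoredHit {
        final int docId;
        final double score;

        ScoredHit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical source of non-document weights (an assumption for this sketch). */
    interface ExternalWeight {
        double weightFor(int docId);
    }

    /**
     * Combines each raw score with its external weight, sorts by the combined
     * score in descending order, and keeps at most 'limit' hits.
     * Assumes limit is non-negative.
     */
    static List<ScoredHit> rescoreAndTrim(List<ScoredHit> hits, ExternalWeight weights, int limit) {
        List<ScoredHit> combined = new ArrayList<ScoredHit>(hits.size());
        for (ScoredHit h : hits) {
            // Assumed combination rule: simple multiplication.
            combined.add(new ScoredHit(h.docId, h.score * weights.weightFor(h.docId)));
        }
        Collections.sort(combined, new Comparator<ScoredHit>() {
            public int compare(ScoredHit a, ScoredHit b) {
                return Double.compare(b.score, a.score); // highest combined score first
            }
        });
        // Trimming happens only after every hit has been rescored and ordered.
        return new ArrayList<ScoredHit>(combined.subList(0, Math.min(limit, combined.size())));
    }
}
```

The point of the sketch is that the trim cannot happen before the full set of
hits and scores is available, which is why every hit has to be touched first.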

This essentially needs to fetch every doc (because there is some field-level
analysis) and the score for each one up front, and it seems to take forever
for a large number of hits. The documents themselves are tiny, usually less
than half a KB. Hit.doc() and .score() seem to be where the time is going,
much as cautioned in the javadocs.

Another peculiarity: the query is basically "keywords" matched against all
fields, and you can additionally make it more precise by restricting certain
fields with field:value. In the latter case, for a similar number of hits, the
same iteration as above is much quicker than when a similar number of hits is
found by the keywords matching all fields. The query itself is NOT visibly
slower, but the iteration is. Is this something to do with how spread out
across the index the hits are?

Is there a possible workaround for the .doc()/.score() access? Can
RAMDirectory be used just for searching over a "regular" FSDirectory index,
and is it usable when the index size is a multiple of available RAM (this is
on RH9 or Fedora Core)?

Thanks in advance,
Sameer

--
Sameer Shisodia  Bangalore
get.sameer@gmail.com