Posted to dev@lucene.apache.org by Hanjo Riege <ha...@cataloom.com> on 2004/01/19 16:35:25 UTC
Performance for search when I need all Hits
Hi all,
I have a question about performance:
If I need all the results of a search (about 2000 hits) and read them
from first to last, it takes about 3000 ms.
After a quick look at the method Hits.getMoreDocs(int i), I decided to
read the last doc first. Now it takes only 1000 ms to read them all.
It seems that the method is not optimized for retrieving all the results.
Is this a candidate for improvement, or am I missing something?
Regards,
Hanjo
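One way to see why the access order could matter is a small simulation. It assumes that Hits caches an initial batch of results and, whenever a requested index lies beyond the cache, re-runs the search for twice the requested index. The batch size of 100, the doubling rule, and all names in the code (HitsFetchSimulation, countSearches, order) are assumptions for illustration and are not taken from the Lucene source.

```java
class HitsFetchSimulation {

    // Assumed size of the first batch of cached hits (hypothetical value).
    static final int INITIAL_BATCH = 100;

    /**
     * Counts how many searches run while hits are read in the given order,
     * assuming the cache is refilled by a fresh search for twice the
     * requested index whenever an uncached hit is touched.
     */
    static int countSearches(int totalHits, int[] accessOrder) {
        int cached = Math.min(INITIAL_BATCH, totalHits);
        int searches = 1; // the initial search that fills the first batch
        for (int index : accessOrder) {
            if (index >= cached) {
                int wanted = index * 2; // assumed doubling rule
                cached = Math.min(wanted, totalHits);
                searches++;
            }
        }
        return searches;
    }

    /** Builds the index sequence 0..n-1, or n-1..0 when reversed. */
    static int[] order(int n, boolean reversed) {
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = reversed ? n - 1 - i : i;
        }
        return result;
    }

    public static void main(String[] args) {
        int total = 2000;
        System.out.println("first to last: " + countSearches(total, order(total, false)));
        System.out.println("last to first: " + countSearches(total, order(total, true)));
    }
}
```

Under these assumptions, reading 2000 hits from first to last triggers 6 searches, while asking for the last hit first triggers 2. Real timings also depend on how expensive each re-run is, so this only illustrates the direction of the effect.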
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: Performance for search when I need all Hits
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 11, 2004, at 2:36 AM, Anand Stephen wrote:
> I am trying to convert the using Hits to HitCollector. Does anyone
> have any
> examples?
> This is what am trying to do, how do I get the score?
>
> <code>
> Searcher searcher = new IndexSearcher(indexDir);
> IndexReader indexReader = IndexReader.open(indexDir);
> final BitSet bits = new BitSet(indexReader.maxDoc());
> searcher.search(lquery, new HitCollector() {
> public void collect(int doc, float score) {
> bits.set(doc);
> }
> });
You should "collect" the score within your HitCollector too. The point
of using a HitCollector is to get access to all the documents returned,
so I suggest you set up a data structure that collects all the documents
and their scores within the collect() method rather than setting bits.
You could simply move the code you have below into the collect()
method to extract the pieces you want there.
Erik
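A minimal sketch of what collecting the score inside collect() could look like. So that it compiles on its own, it declares a stand-in HitCollector with the same collect(int doc, float score) signature as in the code above. ScoredHit, ScoreCollector and ScoreCollectorDemo are hypothetical names, and the hits fed in by main are made-up values standing in for a real search.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Lucene's HitCollector so this sketch compiles without Lucene.
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// One collected hit: the document number and its score.
final class ScoredHit {
    final int doc;
    final float score;

    ScoredHit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Remembers every hit together with its score instead of only setting a bit.
class ScoreCollector extends HitCollector {
    private final List<ScoredHit> hits = new ArrayList<>();

    @Override
    public void collect(int doc, float score) {
        hits.add(new ScoredHit(doc, score));
    }

    int size() {
        return hits.size();
    }

    // Returns a copy of the collected hits, highest score first.
    List<ScoredHit> sortedByScore() {
        List<ScoredHit> copy = new ArrayList<>(hits);
        copy.sort((a, b) -> Float.compare(b.score, a.score));
        return copy;
    }
}

class ScoreCollectorDemo {
    public static void main(String[] args) {
        ScoreCollector collector = new ScoreCollector();
        // In real code the searcher would call collect() once per matching document.
        collector.collect(4, 0.2f);
        collector.collect(9, 0.9f);
        collector.collect(1, 0.5f);
        for (ScoredHit hit : collector.sortedByScore()) {
            System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }
    }
}
```

With Lucene on the classpath, the stand-in class would be dropped and the collector passed to searcher.search(lquery, collector); afterwards each hit's fields can be read with searcher.doc(hit.doc). Iterating the collected hits also avoids a problem in the quoted loop below, which calls searcher.doc(i) for every i below bits.length() rather than only for documents that matched. As far as I remember, the HitCollector Javadoc also advises against calling Searcher.doc(int) from inside collect(), because that method runs in the inner search loop, so collecting first and loading documents afterwards is in line with that advice.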
>
> int size = bits.length();
>
> String[] values = new String[size];
> String type = null;
> ArrayList results = new ArrayList(size);
>
> for (int i = 0,j = size; i < j; i++) {
> final Document document = searcher.doc(i);
> type = document.get("TYPE");
> //hits.doc(i).get("TYPE");
> values[i] = document.get("ID");
> // System.out.println("score = " + score);
> // how do i get the score???
> // resultsMap.put(hits.doc(i).get("ID"), new Float(score));
>
> }
Re: Performance for search when I need all Hits
Posted by Anand Stephen <an...@sonic.net>.
I am trying to convert code that uses Hits to use a HitCollector. Does
anyone have any examples?
This is what I am trying to do; how do I get the score?
<code>
Searcher searcher = new IndexSearcher(indexDir);
IndexReader indexReader = IndexReader.open(indexDir);
final BitSet bits = new BitSet(indexReader.maxDoc());
searcher.search(lquery, new HitCollector() {
    public void collect(int doc, float score) {
        bits.set(doc);
    }
});
int size = bits.length();
String[] values = new String[size];
String type = null;
ArrayList results = new ArrayList(size);
for (int i = 0, j = size; i < j; i++) {
    final Document document = searcher.doc(i);
    type = document.get("TYPE");
    //hits.doc(i).get("TYPE");
    values[i] = document.get("ID");
    // System.out.println("score = " + score);
    // how do i get the score???
    // resultsMap.put(hits.doc(i).get("ID"), new Float(score));
}
</code>
thank you,
--a
----- Original Message -----
From: "Erik Hatcher" <er...@ehatchersolutions.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>
Sent: Monday, January 19, 2004 9:21 AM
Subject: Re: Performance for search when I need all Hits
> If you need all search results, use a HitCollector rather than Hits.
> Look at the variants of the IndexSearcher.search method. I suspect you
> will see the speed with the HitCollector match what you see with your
> Hits tricks if not even faster.
>
> Erik
Re: Performance for search when I need all Hits
Posted by Hanjo Riege <ha...@cataloom.com>.
Hi,
Thanks for the hint. I should have read the Javadoc.
Sorry,
Hanjo
Erik Hatcher wrote:
> If you need all search results, use a HitCollector rather than Hits.
> Look at the variants of the IndexSearcher.search method. I suspect you
> will see the speed with the HitCollector match what you see with your
> Hits tricks if not even faster.
>
> Erik
Re: Performance for search when I need all Hits
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
If you need all search results, use a HitCollector rather than Hits.
Look at the variants of the IndexSearcher.search method. I suspect the
HitCollector will be at least as fast as your Hits trick, if not
faster.
Erik
RE: Performance for search when I need all Hits
Posted by Robert Engels <re...@ix.netcom.com>.
I have thought that as well, but nobody responded. Also, it seems that the
hits size should dynamically increase to avoid multiple calls into
SearchTermDocs for the same term.
R