Posted to dev@lucene.apache.org by Hanjo Riege <ha...@cataloom.com> on 2004/01/19 16:35:25 UTC

Performance for search when I need all Hits

Hi all,

I have a question about performance.

If I need all the results of a search (about 2000 hits) and read them
from first to last, it takes about 3000 ms.

After a short look at the method Hits.getMoreDocs(int i), I decided to
read the last doc first. Now it takes only about 1000 ms to read them all.

It seems that the method is not optimized for retrieving all the results.

Is this a candidate for improvement, or am I missing something?

regards

Hanjo
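
Editorial sketch, not part of the original message: one possible explanation for the timing difference. It assumes (from memory of the Lucene 1.x source, so treat it as an assumption) that Hits fetches 100 hits up front and that getMoreDocs(min) re-runs the whole query asking for twice as many hits whenever a document beyond the cached range is requested. HitsCacheModel and its methods are hypothetical names; the class only counts how often the query would be re-run and performs no real search.

```java
// Toy model of the caching strategy assumed above; it runs no real search.
class HitsCacheModel {
    private final int totalHits;
    private int cached = 0;    // number of top hits currently cached
    private int searches = 0;  // number of times the query was (re)run

    HitsCacheModel(int totalHits) {
        this.totalHits = totalHits;
        getMoreDocs(50); // assumed initial request: 2 * 50 = 100 hits
    }

    // Assumed behaviour: re-run the query, asking for twice as many hits.
    private void getMoreDocs(int min) {
        if (cached > min) {
            min = cached;
        }
        int n = min * 2;
        searches++;
        cached = Math.min(n, totalHits);
    }

    // Simulates asking for hit number i.
    void doc(int i) {
        if (i < 0 || i >= totalHits) {
            throw new IndexOutOfBoundsException("no hit " + i);
        }
        if (i >= cached) {
            getMoreDocs(i);
        }
    }

    int searches() {
        return searches;
    }

    // Read hits 0, 1, 2, ... in order and report how often the query ran.
    static int firstToLast(int totalHits) {
        HitsCacheModel hits = new HitsCacheModel(totalHits);
        for (int i = 0; i < totalHits; i++) {
            hits.doc(i);
        }
        return hits.searches();
    }

    // Read the last hit first, then the rest, and report how often the query ran.
    static int lastFirst(int totalHits) {
        HitsCacheModel hits = new HitsCacheModel(totalHits);
        if (totalHits > 0) {
            hits.doc(totalHits - 1);
        }
        for (int i = 0; i < totalHits; i++) {
            hits.doc(i);
        }
        return hits.searches();
    }

    public static void main(String[] args) {
        System.out.println("first to last: " + firstToLast(2000) + " searches");
        System.out.println("last first:    " + lastFirst(2000) + " searches");
    }
}
```

Under these assumptions, reading 2000 hits from first to last runs the query 6 times, while reading the last hit first runs it twice. That 3:1 ratio is at least consistent with the 3000 ms versus 1000 ms reported above, though the model ignores the cost of loading the documents themselves.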


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Performance for search when I need all Hits

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 11, 2004, at 2:36 AM, Anand Stephen wrote:
> I am trying to convert code that uses Hits to use a HitCollector. Does
> anyone have any examples?
> This is what I am trying to do; how do I get the score?
>
> <code>
>         Searcher searcher = new IndexSearcher(indexDir);
>         IndexReader indexReader = IndexReader.open(indexDir);
>         final BitSet bits = new BitSet(indexReader.maxDoc());
>         searcher.search(lquery, new HitCollector() {
>             public void collect(int doc, float score) {
>                 bits.set(doc);
>             }
>         });

You should "collect" the score within your HitCollector too.  The point 
of using a HitCollector is that you want access to all the documents 
returned, so I suggest you set up a data structure that collects all 
the documents and their scores within the collect() method rather than 
setting bits.

You could simply move the code you have below into the collect() 
method to extract the pieces you want there.

	Erik
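
Editorial sketch, not code from the thread: one way the suggestion above could look. So that it compiles without Lucene on the classpath, HitCollector here is a local stand-in with the same collect(int doc, float score) callback used in the quoted code. ScoreCollector, ScoreCollectorDemo, fakeSearch and the sample scores are all hypothetical. With Lucene itself, the collector instance would be passed to searcher.search(lquery, collector) as in the quoted code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ScoreCollectorDemo {
    // Hypothetical replacement for searcher.search(query, collector):
    // reports three made-up hits to the collector.
    static void fakeSearch(HitCollector collector) {
        collector.collect(3, 0.5f);
        collector.collect(7, 1.0f);
        collector.collect(42, 0.25f);
    }

    public static void main(String[] args) {
        ScoreCollector collector = new ScoreCollector();
        fakeSearch(collector);
        for (Map.Entry<Integer, Float> hit : collector.scores().entrySet()) {
            // With Lucene, searcher.doc(hit.getKey()) would load the stored
            // fields (for example "ID" and "TYPE") at this point.
            System.out.println("doc=" + hit.getKey() + " score=" + hit.getValue());
        }
    }
}

// Local stand-in for Lucene's HitCollector so the sketch compiles on its own.
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Remembers every matching document number together with its score.
class ScoreCollector extends HitCollector {
    private final Map<Integer, Float> scores = new LinkedHashMap<>();

    @Override
    public void collect(int doc, float score) {
        scores.put(doc, score);
    }

    Map<Integer, Float> scores() {
        return scores;
    }
}
```

Iterating over the collected entries visits only documents that matched. The loop in the quoted code runs from 0 to bits.length() and so would also load documents whose bit was never set.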


>
>         int size = bits.length();
>
>         String[] values = new String[size];
>         String type = null;
>         ArrayList results = new ArrayList(size);
>
>         for (int i = 0,j = size; i < j; i++) {
>             final Document document = searcher.doc(i);
>             type = document.get("TYPE");
>             //hits.doc(i).get("TYPE");
>             values[i] = document.get("ID");
> //            System.out.println("score = " + score);
> // how do i get the score???
> //            resultsMap.put(hits.doc(i).get("ID"), new Float(score));
>
>         }




Re: Performance for search when I need all Hits

Posted by Anand Stephen <an...@sonic.net>.
I am trying to convert code that uses Hits to use a HitCollector. Does anyone
have any examples?
This is what I am trying to do; how do I get the score?

<code>
        Searcher searcher = new IndexSearcher(indexDir);
        IndexReader indexReader = IndexReader.open(indexDir);
        final BitSet bits = new BitSet(indexReader.maxDoc());
        searcher.search(lquery, new HitCollector() {
            public void collect(int doc, float score) {
                bits.set(doc);
            }
        });

        int size = bits.length();

        String[] values = new String[size];
        String type = null;
        ArrayList results = new ArrayList(size);

        for (int i = 0,j = size; i < j; i++) {
            final Document document = searcher.doc(i);
            type = document.get("TYPE");
            //hits.doc(i).get("TYPE");
            values[i] = document.get("ID");
//            System.out.println("score = " + score);
// how do i get the score???
//            resultsMap.put(hits.doc(i).get("ID"), new Float(score));

        }
</code>

thank you,
--a

----- Original Message ----- 
From: "Erik Hatcher" <er...@ehatchersolutions.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>
Sent: Monday, January 19, 2004 9:21 AM
Subject: Re: Performance for search when I need all Hits


> If you need all search results, use a HitCollector rather than Hits.
> Look at the variants of the IndexSearcher.search method.  I suspect you
> will see the speed with the HitCollector match what you see with your
> Hits tricks if not even faster.
>
> Erik
>
>
> On Jan 19, 2004, at 10:35 AM, Hanjo Riege wrote:
>
> > Hi all,
> >
> > i have a question about the performance:
> >
> > if i need all the results (about 2000 Hits) of a search and read them
> > from first to last then it needs about 3000 ms.
> >
> > After a short look at the method Hits.getMoreDocs(int i) i decided to
> > read the last doc first. Now it needs only 1000 ms to read them all.
> >
> > It seems that the Method is not optimized for getting all the results.
> >
> > Is this a point for improvement or do i miss something?
> >
> > regards
> >
> > Hanjo




Re: Performance for search when I need all Hits

Posted by Hanjo Riege <ha...@cataloom.com>.
Hi,

Thanks for the hint. I should have read the Javadoc.

Sorry,

Hanjo

Erik Hatcher wrote:
> If you need all search results, use a HitCollector rather than Hits.  
> Look at the variants of the IndexSearcher.search method.  I suspect you 
> will see the speed with the HitCollector match what you see with your 
> Hits tricks if not even faster.
> 
>     Erik
> 
> 
> On Jan 19, 2004, at 10:35 AM, Hanjo Riege wrote:
> 
>> Hi all,
>>
>> i have a question about the performance:
>>
>> if i need all the results (about 2000 Hits) of a search and read them 
>> from first to last then it needs about 3000 ms.
>>
>> After a short look at the method Hits.getMoreDocs(int i) i decided to 
>> read the last doc first. Now it needs only 1000 ms to read them all.
>>
>> It seems that the Method is not optimized for getting all the results.
>>
>> Is this a point for improvement or do i miss something?
>>
>> regards
>>
>> Hanjo





Re: Performance for search when I need all Hits

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
If you need all search results, use a HitCollector rather than Hits.  
Look at the variants of the IndexSearcher.search method.  I suspect the 
HitCollector will be at least as fast as what you get with your Hits 
trick, if not faster.

	Erik


On Jan 19, 2004, at 10:35 AM, Hanjo Riege wrote:

> Hi all,
>
> i have a question about the performance:
>
> if i need all the results (about 2000 Hits) of a search and read them 
> from first to last then it needs about 3000 ms.
>
> After a short look at the method Hits.getMoreDocs(int i) i decided to 
> read the last doc first. Now it needs only 1000 ms to read them all.
>
> It seems that the Method is not optimized for getting all the results.
>
> Is this a point for improvement or do i miss something?
>
> regards
>
> Hanjo




RE: Performance for search when I need all Hits

Posted by Robert Engels <re...@ix.netcom.com>.
I have thought that as well, but nobody responded. Also, it seems that the
hits size should dynamically increase to avoid multiple calls into
SearchTermDocs for the same term.

R

-----Original Message-----
From: Hanjo Riege [mailto:hanri@cataloom.com]
Sent: Monday, January 19, 2004 9:35 AM
To: lucene-dev@jakarta.apache.org
Subject: Performance for search when I need all Hits


Hi all,

i have a question about the performance:

if i need all the results (about 2000 Hits) of a search and read them
from first to last then it needs about 3000 ms.

After a short look at the method Hits.getMoreDocs(int i) i decided to
read the last doc first. Now it needs only 1000 ms to read them all.

It seems that the Method is not optimized for getting all the results.

Is this a point for improvement or do i miss something?

regards

Hanjo

