You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Brisbart Franck <Fr...@kelkoo.net> on 2003/02/04 09:12:33 UTC

how to get an extra count

Hi all,

I'm trying to gather information about my non-searched (ie not used for 
the search) fields.
Let's take an index with 2 fields: 'artist' (for the artist name) an 
'type' (for his type of music).
I need to perform a search on the 'artist' field to retrieve a list of 
artists matching the query. But I also need to have the number of 
artists per 'type'.
My first solution was to write a HitCollector which do the work:
     public void collect(int doc, float score) {
     Document document = indexReader_.document(doc);
     if(document.get("type").equals("Rock"))
         nbRock++;
     ...
     }
But, as I first get the document to analyze my non-searched field, the 
treatment can be very long for searches with lots of results.

Is there any better(=faster) method to have this extra info ??

Thanks a lot for your help.
Franck


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: how to get an extra count

Posted by Brisbart Franck <Fr...@kelkoo.net>.

Ype Kingma wrote:
> Franck,
> 
> On Monday 07 April 2003 05:59, Brisbart Franck wrote:
> 
>>Ype Kingma wrote:
>>
>>>On Tuesday 04 February 2003 09:12, you wrote:
>>
> 
> <flashing back>
> 
>>>>Hi all,
>>>>
>>>>I'm trying to gather information about my non-searched (ie not used for
>>>>the search) fields.
>>>>Let's take an index with 2 fields: 'artist' (for the artist name) an
>>>>'type' (for his type of music).
>>>>I need to perform a search on the 'artist' field to retrieve a list of
>>>>artists matching the query. But I also need to have the number of
>>>>artists per 'type'.
>>>>My first solution was to write a HitCollector which do the work:
>>>>    public void collect(int doc, float score) {
>>>>    Document document = indexReader_.document(doc);
>>>>    if(document.get("type").equals("Rock"))
>>>>        nbRock++;
>>>>    ...
>>>>    }
>>>>But, as I first get the document to analyze my non-searched field, the
>>>>treatment can be very long for searches with lots of results.
>>>>
>>>>Is there any better(=faster) method to have this extra info ??
>>>
>>>It's best not to retrieve document fields during search.
>>>You can do this after collecting, possibly in the order
>>>that you collected the documents.
>>
>>You're right. Using document fields during a search wasn't a good idea.
>>In fact, I would like to avoid using document fields at all. Because, it
>>takes too much time. If a search returns 10000 results, I need to
>>analyse all of them to get my extra info (nb of artist per type). And
>>even if I have only 10 results to display, my search is very slow.
>>It'll be great to treat my 'type' field the same way as the 'freq' is
>>used. I think this will be much faster.
>>How can I store the content of this field in a lucene file (such as
>><segment>.frq) to use it during my search ??
> 
> 
> Couldn't you extend your query with
> 
>  +type:rock
> 
> and use the high level search API and ask for the size of the results?
> You can do that when you index the 'type' field.
> 
If I do that, I should run 1 query for each 'type'. And, as I have 
something like 2000 types, I don't think it is better to have 2000 fast 
requests than 1 big request.
I could do 1 query for each type found by the first query. But it only 
shifts the problem, because I should first retrieve all these types, and 
to do that I have to analyse all my results.
My real problem is to find the best way to use extra data (not search 
fields) to compute some meta data on a given query.

Franck


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: how to get an extra count

Posted by Ype Kingma <yk...@xs4all.nl>.

Franck,

On Monday 07 April 2003 05:59, Brisbart Franck wrote:
> Ype Kingma wrote:
> > On Tuesday 04 February 2003 09:12, you wrote:

<flashing back>

> >>Hi all,
> >>
> >>I'm trying to gather information about my non-searched (ie not used for
> >>the search) fields.
> >>Let's take an index with 2 fields: 'artist' (for the artist name) an
> >>'type' (for his type of music).
> >>I need to perform a search on the 'artist' field to retrieve a list of
> >>artists matching the query. But I also need to have the number of
> >>artists per 'type'.
> >>My first solution was to write a HitCollector which do the work:
> >>     public void collect(int doc, float score) {
> >>     Document document = indexReader_.document(doc);
> >>     if(document.get("type").equals("Rock"))
> >>         nbRock++;
> >>     ...
> >>     }
> >>But, as I first get the document to analyze my non-searched field, the
> >>treatment can be very long for searches with lots of results.
> >>
> >>Is there any better(=faster) method to have this extra info ??
> >
> > It's best not to retrieve document fields during search.
> > You can do this after collecting, possibly in the order
> > that you collected the documents.
>
> You're right. Using document fields during a search wasn't a good idea.
> In fact, I would like to avoid using document fields at all. Because, it
> takes too much time. If a search returns 10000 results, I need to
> analyse all of them to get my extra info (nb of artist per type). And
> even if I have only 10 results to display, my search is very slow.
> It'll be great to treat my 'type' field the same way as the 'freq' is
> used. I think this will be much faster.
> How can I store the content of this field in a lucene file (such as
> <segment>.frq) to use it during my search ??

Couldn't you extend your query with

 +type:rock

and use the high level search API and ask for the size of the results?
You can do that when you index the 'type' field.

Kind regards,
Ype


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: how to get an extra count

Posted by Brisbart Franck <Fr...@kelkoo.net>.

Ype Kingma wrote:
> On Tuesday 04 February 2003 09:12, you wrote:
> 
>>Hi all,
>>
>>I'm trying to gather information about my non-searched (ie not used for
>>the search) fields.
>>Let's take an index with 2 fields: 'artist' (for the artist name) an
>>'type' (for his type of music).
>>I need to perform a search on the 'artist' field to retrieve a list of
>>artists matching the query. But I also need to have the number of
>>artists per 'type'.
>>My first solution was to write a HitCollector which do the work:
>>     public void collect(int doc, float score) {
>>     Document document = indexReader_.document(doc);
>>     if(document.get("type").equals("Rock"))
>>         nbRock++;
>>     ...
>>     }
>>But, as I first get the document to analyze my non-searched field, the
>>treatment can be very long for searches with lots of results.
>>
>>Is there any better(=faster) method to have this extra info ??
> 
> 
> It's best not to retrieve document fields during search.
> You can do this after collecting, possibly in the order
> that you collected the documents.
> 
You're right. Using document fields during a search wasn't a good idea.
In fact, I would like to avoid using document fields at all. Because, it 
takes too much time. If a search returns 10000 results, I need to 
analyse all of them to get my extra info (nb of artist per type). And 
even if I have only 10 results to display, my search is very slow.
It'll be great to treat my 'type' field the same way as the 'freq' is 
used. I think this will be much faster.
How can I store the content of this field in a lucene file (such as 
<segment>.frq) to use it during my search ??

thx,
Franck



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: how to get an extra count

Posted by Ype Kingma <yk...@xs4all.nl>.

On Tuesday 04 February 2003 09:12, you wrote:
> Hi all,
>
> I'm trying to gather information about my non-searched (ie not used for
> the search) fields.
> Let's take an index with 2 fields: 'artist' (for the artist name) an
> 'type' (for his type of music).
> I need to perform a search on the 'artist' field to retrieve a list of
> artists matching the query. But I also need to have the number of
> artists per 'type'.
> My first solution was to write a HitCollector which do the work:
>      public void collect(int doc, float score) {
>      Document document = indexReader_.document(doc);
>      if(document.get("type").equals("Rock"))
>          nbRock++;
>      ...
>      }
> But, as I first get the document to analyze my non-searched field, the
> treatment can be very long for searches with lots of results.
>
> Is there any better(=faster) method to have this extra info ??

It's best not to retrieve document fields during search.
You can do this after collecting, possibly in the order
that you collected the documents.

Have fun,
Ype

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org