
Posted to dev@lucene.apache.org by DM Smith <dm...@gmail.com> on 2009/10/04 00:04:20 UTC

Searcher javadoc problem

I'm working on migrating my code to 2.9. And I'm trying to figure out  
what to do. Along the way I found a circular argument in the JavaDoc  
for Searcher. BTW, this is not a user question.

My current code calls:
                 Hits hits = searcher.search(query);

The JavaDoc for it says:
   /** Returns the documents matching <code>query</code>.
    * @throws BooleanQuery.TooManyClauses
    * @deprecated Hits will be removed in Lucene 3.0. Use
    * {@link #search(Query, Filter, int)} instead.
    */
   public final Hits search(Query query) throws IOException {
     return search(query, (Filter)null);
   }

However, search(Query, Filter, int) is not quite appropriate as I need
all hits. I guess I could pass null for the filter and Integer.MAX_VALUE
for the count.

So, I found search(Query, Collector), which seems most appropriate.  
(Not sure though, but I'll figure it out.) However, the JavaDoc for it  
says:
   /** Lower-level search API.
   *
   * <p>{@link Collector#collect(int)} is called for every matching document.
   *
   * <p>Applications should only use this if they need <i>all</i> of the
   * matching documents.  The high-level search API ({@link
   * Searcher#search(Query)}) is usually more efficient, as it skips
   * non-high-scoring hits.
   * <p>Note: The <code>score</code> passed to this method is a raw score.
   * In other words, the score will not necessarily be a float whose value is
   * between 0 and 1.
   * @throws BooleanQuery.TooManyClauses
   */
  public void search(Query query, Collector results)
    throws IOException {
    search(createWeight(query), null, results);
  }

But Searcher.search(Query) is deprecated.

So what is the appropriate documentation for getting all "hits"? It seems
to say, "Don't do that."

-- DM

Re: Searcher javadoc problem

Posted by DM Smith <dm...@gmail.com>.

On Oct 3, 2009, at 9:23 PM, Mark Miller <ma...@gmail.com> wrote:

> Gotchya - that clears up my mind. I know you're an advanced user, so it
> threw me for a loop that you would be using Hits like a Collector. I've
> just been seeing that a lot lately.

Is there enough interest to add a new search method? (Hiterator??
Maybe a parameter on a Collector??) It would return a stream of hits,
one per call to next(). I guess it should take a Filter. The abstraction
would make no assumption about order; an implementation can define one.
In my case it would be doc order when not parallel.

Btw the use case parallels a lookup on an RDBMS table: Find all  
matching records and let the app handle the ordering and slicing.
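
A rough sketch of the iterator idea, in plain Java with no Lucene
dependency. The class name Hiterator is borrowed from the suggestion above;
the BitSet of matching doc IDs is an assumption, standing in for whatever a
collector has gathered. Unlike the proposal, this is not a lazy stream: it
walks hits that were already collected, in ascending doc order.

```java
import java.util.BitSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical "Hiterator": walks collected doc IDs in ascending (index) order. */
final class Hiterator implements Iterator<Integer> {
    private final BitSet hits;
    private int next;

    Hiterator(BitSet hits) {
        this.hits = hits;
        this.next = hits.nextSetBit(0); // -1 when there are no hits at all
    }

    public boolean hasNext() {
        return next >= 0;
    }

    public Integer next() {
        if (next < 0) {
            throw new NoSuchElementException();
        }
        int doc = next;
        next = hits.nextSetBit(doc + 1); // advance to the following hit, or -1
        return doc;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```

Because it implements java.util.Iterator, the caller decides how far to read,
which matches the "let the app handle the ordering and slicing" use case.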

--DM



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Searcher javadoc problem

Posted by Mark Miller <ma...@gmail.com>.

Gotchya - that clears up my mind. I know you're an advanced user, so it
threw me for a loop that you would be using Hits like a Collector. I've
just been seeing that a lot lately.

I just read too much into: So what is the appropriate documentation for
getting all "hits"?

Another option (of course) is to maintain your own Hits class. Sounds
like working up something with a Collector on your own would be better
though - why compute the score if you don't need it? Hits caching was
rarely that useful either.



-- 
- Mark

http://www.lucidimagination.com





Re: Searcher javadoc problem

Posted by DM Smith <dm...@gmail.com>.

It makes sense if you understand the context. We make each verse of a
Bible a document. There are about 36000 docs in a Bible. We want a
user to find all the verses that match their search, to give the count
of total hits. We then show slices of the hits, from first hit to last
in document order, typically about 100 at a time. Scoring is unimportant.

The user can also choose to prioritize and limit the results. This
uses scoring and the top docs. This is not the user's preferred search.

So I don't mind being nasty. But having looked at it, I think it would
be better to have a non-scoring collector that runs as a co-process and,
through an iterator interface, gets the next doc on demand, from first
doc in index to last.
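
The count-then-slice pattern described above might be sketched as follows.
HitPager and page are hypothetical names, and the sketch assumes the matching
doc IDs have already been recorded in a java.util.BitSet (for example by a
non-scoring collector), so it does not depend on any Lucene API.

```java
import java.util.BitSet;

/** Hypothetical helper: pages through collected doc IDs in index order. */
final class HitPager {
    /** Returns up to pageSize doc IDs, skipping the first offset hits. */
    static int[] page(BitSet hits, int offset, int pageSize) {
        // Size the result to what is actually left after the offset.
        int remaining = hits.cardinality() - offset;
        int[] out = new int[Math.max(0, Math.min(pageSize, remaining))];

        // Skip the hits that belong to earlier pages.
        int doc = hits.nextSetBit(0);
        for (int skipped = 0; skipped < offset && doc >= 0; skipped++) {
            doc = hits.nextSetBit(doc + 1);
        }

        // Copy this page's doc IDs in ascending order.
        for (int i = 0; i < out.length; i++) {
            out[i] = doc;
            doc = hits.nextSetBit(doc + 1);
        }
        return out;
    }
}
```

The total hit count is then hits.cardinality(), and the first screen of
results would be HitPager.page(hits, 0, 100). For an index of about 36000
documents the BitSet needs roughly 4.5 KB.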

-- DM   




Re: Searcher javadoc problem

Posted by Mark Miller <ma...@gmail.com>.

I think it could be reworded as well - it's kind of uhh ... but I'll
leave that to someone else if they care. For now I just pointed it to
the correct method.

Mark Miller wrote:
> You used Hits to get all the hits? Nasty, man - that's why we deprecated
> that class - even though the JavaDoc warns you that's a major speed trap,
> everyone still did it ... use a Collector.
>
> You're right though - it shouldn't point to IndexSearcher.search(Query)
> after that - it should point to IndexSearcher.search(Query, int)
>
> Going to fix that.


-- 
- Mark

http://www.lucidimagination.com





Re: Searcher javadoc problem

Posted by Mark Miller <ma...@gmail.com>.

You used Hits to get all the hits? Nasty, man - that's why we deprecated
that class - even though the JavaDoc warns you that's a major speed trap,
everyone still did it ... use a Collector.

You're right though - it shouldn't point to IndexSearcher.search(Query)
after that - it should point to IndexSearcher.search(Query, int)

Going to fix that.



-- 
- Mark

http://www.lucidimagination.com





RE: Searcher javadoc problem

Posted by Uwe Schindler <uw...@thetaphi.de>.

Writing a Collector is the correct and fastest way to do this. The Javadoc
pointing to the deprecated API is incorrect.
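
As a minimal sketch of such a collector: the DocCollector interface below is
a hypothetical stand-in that mirrors only two of the callbacks on Lucene
2.9's abstract Collector class (the real class also declares
setScorer(Scorer) and acceptsDocsOutOfOrder(), and its setNextReader also
receives the IndexReader). It is plain Java so that it runs without Lucene on
the classpath; adapting it to the real API is left as an assumption.

```java
import java.util.BitSet;

/** Hypothetical stand-in for the Collector callbacks that matter here. */
interface DocCollector {
    /** Called once per index segment; docBase is that segment's first global doc ID. */
    void setNextReader(int docBase);

    /** Called for every matching document; doc is relative to the current segment. */
    void collect(int doc);
}

/** Records every matching document in a BitSet; no score is computed or stored. */
final class AllDocsCollector implements DocCollector {
    private final BitSet hits = new BitSet();
    private int docBase;

    public void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    public void collect(int doc) {
        hits.set(docBase + doc); // rebase the segment-relative ID to a global one
    }

    /** Number of matching documents seen so far. */
    int totalHits() {
        return hits.cardinality();
    }

    /** Matching doc IDs; iterating the set bits yields them in index order. */
    BitSet hits() {
        return hits;
    }
}
```

Since the set bits of a BitSet are read back in ascending order, the hits
come out in document order no matter what order collect was called in.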

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
