
Posted to java-user@lucene.apache.org by Rasik Pandey <ra...@ajlsm.com> on 2004/03/31 14:39:46 UTC

RE : Performance of hit highlighting and finding term positions for a specific document

Hello,

> I've been meaning to look into good ways to store token offset
> information to allow for very
> efficient highlighting and I believe Mark may also be looking
> into improving the highlighter via
> other means such as temporary ram indexes. Search the archives
> to get a background on some of the
> ideas we've tossed around ('Dmitry's Term Vector stuff, plus
> some' and 'Demoting results' come to
> mind as threads that touch this topic).

It would be nice if CachingRewrittenQueryWrapper.java, which I sent to lucene-dev last week (see below), became part of these highlighting efforts, if appropriate. We use it to collect the terms of a query that searches across multiple indices.

Regards,
RBP





> -----Original Message-----
> From: Rasik Pandey [mailto:rasik.pandey@ajlsm.com]
> Sent: Wednesday, 17 March 2004 13:36
> To: 'Lucene Developers List'; korfut@lycos.com
> Subject: RE : Query Term Collector (was: Re: New highlighter
> package available)
> 
> Hello All,
> 
> I don't know how this Thread/issue was resolved, but if you are
> still interested I have a simple way of doing this term
> collection ONLY at query time. I've tested it and it works with
> highlighting, etc. without the extra rewrite() call on the
> index.
> 
> Comments are welcome!
> 
> 
> package org.apache.lucene.search;
> 
> import org.apache.lucene.search.Weight;
> import org.apache.lucene.search.Searcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.Similarity;
> import org.apache.lucene.index.IndexReader;
> 
> import java.io.IOException;
> 
> /*Rasik Pandey rasik.pandey@ajlsm.com*/
> /**Simple wrapper for a Lucene query that
>  * collects all queries generated by calling
>  * rewrite on the original Lucene query and stores
>  * them in a BooleanQuery.
>  *
>  * A Searcher will call the rewrite() method
>  * for each index and hence generate a query
>  * containing terms for the respective index. This
>  * class collects these queries so that they may be
>  * used for highlighting, query expansion, etc. by
>  * retrieving the underlying terms.
>  *
>  * @see #rewrite
>  * @see #getRewrittenQueries
>  * @see #resetRewrittenQueries
>  * @see #getOriginalQuery
>  */
> public class CachingRewrittenQueryWrapper extends Query{
>     protected org.apache.lucene.search.Query originalQuery = null;
>     protected BooleanQuery rewrittenQueries = new BooleanQuery();
> 
>     public CachingRewrittenQueryWrapper(Query originalQuery) {
>         this.originalQuery = originalQuery;
>     }
> 
>     public BooleanQuery getRewrittenQueries() {
>         return this.rewrittenQueries;
>     }
> 
>     public void resetRewrittenQueries() {
>         BooleanQuery newCachedQuery = new BooleanQuery();
> 
>         newCachedQuery.setMaxClauseCount(this.rewrittenQueries.getMaxClauseCount());
>         this.rewrittenQueries = newCachedQuery;
>     }
> 
>     public Query getOriginalQuery() {
>         return this.originalQuery;
>     }
> 
>     public void setBoost(float b) {
>         this.originalQuery.setBoost(b);
>     }
> 
>     public float getBoost() {
>         return this.originalQuery.getBoost();
>     }
> 
> 
>     protected Weight createWeight(Searcher searcher) {
>         return this.originalQuery.createWeight(searcher);
>     }
> 
>     public Query rewrite(IndexReader reader) throws IOException {
>         Query rewrittenQuery = this.originalQuery.rewrite(reader);
>         this.rewrittenQueries.add(rewrittenQuery, false, false);
>         return rewrittenQuery;
>     }
> 
>     public Query combine(Query[] queries) {
>         return this.originalQuery.combine(queries);
>     }
> 
>     public Similarity getSimilarity(Searcher searcher) {
>         return this.originalQuery.getSimilarity(searcher);
>     }
> 
>     protected void finalize() throws Throwable {
>         super.finalize();
>         // TODO: maybe something here to ensure that all resources
>         // held by rewrittenQueries are cleaned up properly
>     }
> 
>     public String toString() {
>         return this.originalQuery.toString();
>     }
> 
>     public String toString(String field) {
>        return this.originalQuery.toString(field);
>     }
> }
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
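
For readers following the thread, the idea behind the wrapper quoted above can be modelled without Lucene. The sketch below is only an illustration under stated assumptions: TermIndex, SketchQuery, PrefixSketchQuery and CapturingQuery are hypothetical names invented here, not Lucene classes, and a sorted set of strings stands in for an index. It shows why a prefix or wildcard query has no concrete terms until it is rewritten against a particular index, and how a wrapper that delegates rewrite() can remember the expansion for each index so the terms are available later, for example to a highlighter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Standalone model of the rewrite-capturing idea; none of these types are Lucene classes.
class RewriteCaptureSketch {

    /** Hypothetical stand-in for an index: just a sorted set of terms. */
    static final class TermIndex {
        final TreeSet<String> terms;
        TermIndex(String... terms) {
            this.terms = new TreeSet<>(Arrays.asList(terms));
        }
    }

    /** Hypothetical stand-in for a query that can be rewritten against one index. */
    interface SketchQuery {
        /** Returns the concrete terms this query expands to in the given index. */
        List<String> rewrite(TermIndex index);
    }

    /** Prefix query: it has no concrete terms until it is rewritten against an index. */
    static final class PrefixSketchQuery implements SketchQuery {
        private final String prefix;
        PrefixSketchQuery(String prefix) {
            this.prefix = prefix;
        }
        public List<String> rewrite(TermIndex index) {
            List<String> expanded = new ArrayList<>();
            // tailSet starts at the first term >= prefix; stop at the first non-match.
            for (String term : index.terms.tailSet(prefix)) {
                if (!term.startsWith(prefix)) {
                    break;
                }
                expanded.add(term);
            }
            return expanded;
        }
    }

    /** Wrapper that delegates rewrite() and remembers every expansion it sees. */
    static final class CapturingQuery implements SketchQuery {
        private final SketchQuery original;
        private final Set<String> collected = new LinkedHashSet<>();
        CapturingQuery(SketchQuery original) {
            this.original = original;
        }
        public List<String> rewrite(TermIndex index) {
            List<String> expanded = original.rewrite(index);
            collected.addAll(expanded); // the only extra work done by the wrapper
            return expanded;
        }
        Set<String> collectedTerms() {
            return collected;
        }
        void reset() {
            collected.clear();
        }
    }

    public static void main(String[] args) {
        TermIndex first = new TermIndex("lucene", "lucid", "search");
        TermIndex second = new TermIndex("luck", "query");
        CapturingQuery query = new CapturingQuery(new PrefixSketchQuery("luc"));
        // A multi-index searcher would rewrite the query once per index.
        query.rewrite(first);
        query.rewrite(second);
        System.out.println(query.collectedTerms()); // prints [lucene, lucid, luck]
    }
}
```

As in the original class, the wrapper accumulates state across searches, so the caller is expected to call reset() (resetRewrittenQueries() in the original) before reusing the same wrapper for another search.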

Re: RE : Performance of hit highlighting and finding term positions for a specific document

Posted by "Kevin A. Burton" <bu...@newsmonster.org>.

Rasik Pandey wrote:

>Hello,
>
>  
>
>>I've been meaning to look into good ways to store token offset
>>information to allow for very
>>efficient highlighting and I believe Mark may also be looking
>>into improving the highlighter via
>>other means such as temporary ram indexes. Search the archives
>>to get a background on some of the
>>ideas we've tossed around ('Dmitry's Term Vector stuff, plus
>>some' and 'Demoting results' come to
>>mind as threads that touch this topic).
>>    
>>
>
>It would be nice if CachingRewrittenQueryWrapper.java, which I sent to lucene-dev last week (see below), became part of these highlighting efforts, if appropriate. We use it to collect the terms of a query that searches across multiple indices.
>  
>
Actually I had to write one for my tests with the highlighter. I'm using 
a MultiSearcher and a WildcardQuery which the highlighter didn't have 
support for. 

My implementation was fairly basic, so I wouldn't suggest it as a 
contribution... I'm sure yours is better.  Combined with the suggested 
changes to the highlighter for providing tokens, the two would work 
well together.

Kevin

-- 

Please reply using PGP.

    http://peerfear.org/pubkey.asc    
    
    NewsMonster - http://www.newsmonster.org/
    
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster