Posted to java-user@lucene.apache.org by Rasik Pandey <ra...@ajlsm.com> on 2004/03/31 14:39:46 UTC
RE : Performance of hit highlighting and finding term positions for a specific document
Hello,
> I've been meaning to look into good ways to store token offset
> information to allow for very
> efficient highlighting and I believe Mark may also be looking
> into improving the highlighter via
> other means such as temporary RAM indexes. Search the archives
> to get a background on some of the
> ideas we've tossed around ('Dmitry's Term Vector stuff, plus
> some' and 'Demoting results' come to
> mind as threads that touch this topic).
It would be nice if CachingRewrittenQueryWrapper.java, which I sent to lucene-dev last week (see below), became part of these highlighting efforts, if appropriate. We use it to collect the terms for a query that searches multiple indices.
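To illustrate the idea without depending on Lucene, here is a minimal sketch of the same pattern: a wrapper that delegates rewrite() to the original query and remembers what each index expanded it to. ToyQuery, ToyPrefixQuery, RecordingQuery and RecordingQueryDemo are hypothetical stand-ins invented for this example, and an "index" is assumed to be nothing more than a set of terms. It is not the class quoted below, only a simplified model of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a query: rewriting it against one index
// (modelled here as a plain set of terms) yields the concrete terms it matches.
interface ToyQuery {
    Set<String> rewrite(Set<String> indexTerms);
}

// Hypothetical prefix query: expands to every index term starting with the prefix.
class ToyPrefixQuery implements ToyQuery {
    private final String prefix;

    ToyPrefixQuery(String prefix) {
        this.prefix = prefix;
    }

    public Set<String> rewrite(Set<String> indexTerms) {
        Set<String> matches = new TreeSet<String>();
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                matches.add(term);
            }
        }
        return matches;
    }
}

// Wrapper that delegates rewrite() and records each per-index result.
class RecordingQuery implements ToyQuery {
    private final ToyQuery original;
    private final List<Set<String>> rewritten = new ArrayList<Set<String>>();

    RecordingQuery(ToyQuery original) {
        this.original = original;
    }

    public Set<String> rewrite(Set<String> indexTerms) {
        Set<String> result = original.rewrite(indexTerms);
        rewritten.add(result); // remember what this index expanded to
        return result;
    }

    // Union of the terms produced by every rewrite() call so far.
    Set<String> collectedTerms() {
        Set<String> all = new TreeSet<String>();
        for (Set<String> terms : rewritten) {
            all.addAll(terms);
        }
        return all;
    }

    // Forget earlier results, e.g. before reusing the wrapper for a new search.
    void reset() {
        rewritten.clear();
    }
}

class RecordingQueryDemo {
    // Rewrites one prefix query against two toy indices and returns the merged terms.
    static Set<String> run() {
        Set<String> indexA = new TreeSet<String>(Arrays.asList("highlight", "highlighter", "hit"));
        Set<String> indexB = new TreeSet<String>(Arrays.asList("highlight", "highlighting", "hits"));
        RecordingQuery query = new RecordingQuery(new ToyPrefixQuery("high"));
        query.rewrite(indexA); // a multi-index searcher would make one call per index
        query.rewrite(indexB);
        return query.collectedTerms();
    }
}
```

The collected terms can then be handed to a highlighter after the search, with no second rewrite pass over the indices.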
Regards,
RBP
> -----Original Message-----
> From: Rasik Pandey [mailto:rasik.pandey@ajlsm.com]
> Sent: Wednesday, 17 March 2004 13:36
> To: 'Lucene Developers List'; korfut@lycos.com
> Subject: RE: Query Term Collector (was: Re: New highlighter
> package available)
>
> Hello All,
>
> I don't know how this Thread/issue was resolved, but if you are
> still interested I have a simple way of doing this term
> collection ONLY at query time. I've tested it and it works with
> highlighting, etc. without the extra rewrite() call on the
> index.
>
> Comments are welcome!
>
>
> package org.apache.lucene.search;
>
> import org.apache.lucene.search.Weight;
> import org.apache.lucene.search.Searcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.Similarity;
> import org.apache.lucene.index.IndexReader;
>
> import java.io.IOException;
>
> /*Rasik Pandey rasik.pandey@ajlsm.com*/
> /**Simple wrapper for a Lucene query that
> * collects all queries generated by calling
> * rewrite on the original Lucene query and stores
> * them in a BooleanQuery.
> *
> * A Searcher will call the rewrite() method
> * for each index and hence generate a query
> * containing terms for the respective index. This
> * class collects these queries so that they may be
> * used for highlighting, query expansion, etc. by
> * retrieving the underlying terms.
> *
> * @see #rewrite
> * @see #getRewrittenQueries
> * @see #resetRewrittenQueries
> * @see #getOriginalQuery
> */
> public class CachingRewrittenQueryWrapper extends Query {
>     protected org.apache.lucene.search.Query originalQuery = null;
>     protected BooleanQuery rewrittenQueries = new BooleanQuery();
>
>     public CachingRewrittenQueryWrapper(Query originalQuery) {
>         this.originalQuery = originalQuery;
>     }
>
>     public BooleanQuery getRewrittenQueries() {
>         return this.rewrittenQueries;
>     }
>
>     public void resetRewrittenQueries() {
>         BooleanQuery newCachedQuery = new BooleanQuery();
>         newCachedQuery.setMaxClauseCount(this.rewrittenQueries.getMaxClauseCount());
>         this.rewrittenQueries = newCachedQuery;
>     }
>
>     public Query getOriginalQuery() {
>         return this.originalQuery;
>     }
>
>     public void setBoost(float b) {
>         this.originalQuery.setBoost(b);
>     }
>
>     public float getBoost() {
>         return this.originalQuery.getBoost();
>     }
>
>     protected Weight createWeight(Searcher searcher) {
>         return this.originalQuery.createWeight(searcher);
>     }
>
>     public Query rewrite(IndexReader reader) throws IOException {
>         Query rewrittenQuery = this.originalQuery.rewrite(reader);
>         this.rewrittenQueries.add(rewrittenQuery, false, false);
>         return rewrittenQuery;
>     }
>
>     public Query combine(Query[] queries) {
>         return this.originalQuery.combine(queries);
>     }
>
>     public Similarity getSimilarity(Searcher searcher) {
>         return this.originalQuery.getSimilarity(searcher);
>     }
>
>     protected void finalize() throws Throwable {
>         super.finalize();
>         // TODO maybe something here to ensure that all resources
>         // held by rewrittenQueries are cleaned up properly
>     }
>
>     public String toString() {
>         return this.originalQuery.toString();
>     }
>
>     public String toString(String field) {
>         return this.originalQuery.toString(field);
>     }
> }
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: RE : Performance of hit highlighting and finding term positions
for a specific document
Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
Rasik Pandey wrote:
>Hello,
>
>
>
>>I've been meaning to look into good ways to store token offset
>>information to allow for very
>>efficient highlighting and I believe Mark may also be looking
>>into improving the highlighter via
>>other means such as temporary RAM indexes. Search the archives
>>to get a background on some of the
>>ideas we've tossed around ('Dmitry's Term Vector stuff, plus
>>some' and 'Demoting results' come to
>>mind as threads that touch this topic).
>>
>>
>
>It would be nice if CachingRewrittenQueryWrapper.java, which I sent to lucene-dev last week (see below), became part of these highlighting efforts, if appropriate. We use it to collect the terms for a query that searches multiple indices.
>
>
Actually I had to write one for my tests with the highlighter. I'm using
a MultiSearcher and a WildcardQuery which the highlighter didn't have
support for.
My impl was fairly basic so I wouldn't suggest a contribution... I'm
sure yours is better. The suggested changes to the highlighter for
providing tokens would make this work well together.
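For readers unfamiliar with the problem, a rough sketch of why a wildcard over several indices needs this step: the pattern has to be expanded against each index's term list, and the highlighter needs the merged set of concrete terms. The class WildcardHighlightSketch and its methods are hypothetical and use no Lucene API. Each index is assumed to be a plain list of lower-case terms, and tokenization is assumed to be a simple split on single spaces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

class WildcardHighlightSketch {
    // Converts a wildcard pattern ('*' = any run of characters, '?' = one character)
    // into an equivalent regular expression; all other characters are taken literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Expands the wildcard against every index's term list and merges the matches.
    static Set<String> expand(String wildcard, List<List<String>> indexes) {
        Pattern pattern = toRegex(wildcard);
        Set<String> terms = new TreeSet<String>();
        for (List<String> indexTerms : indexes) {
            for (String term : indexTerms) {
                if (pattern.matcher(term).matches()) {
                    terms.add(term);
                }
            }
        }
        return terms;
    }

    // Wraps every space-separated token whose lower-cased form is an expanded term in <B> tags.
    static String highlight(String text, Set<String> terms) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (terms.contains(token.toLowerCase())) {
                out.append("<B>").append(token).append("</B>");
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }

    // Expands "luc*" over two toy indices and highlights a sample sentence.
    static String demo() {
        List<List<String>> indexes = Arrays.asList(
                Arrays.asList("lucene", "lucid"),
                Arrays.asList("luck", "search"));
        Set<String> terms = expand("luc*", indexes);
        return highlight("Lucene brings luck to search", terms);
    }
}
```

Searching only one index's terms would miss matches that exist in the other, which is the gap a per-index term collector is meant to close.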
Kevin