Posted to java-user@lucene.apache.org by James O'Rourke <ja...@bittorrent.com> on 2006/10/13 20:39:31 UTC

highlighting with WildcardQuery

Is there any way to do highlighting with a WildcardQuery when no
IndexReader is available? I simply want to highlight a chunk of
text, but it fails because the WildcardQuery needs to call rewrite,
which requires an IndexReader that I don't have.

Code (using PyLucene-2.0.0; I can translate it to Java if you like):

def gethighlightedfragments(text, searchString,
        fragmentLength=50, numFragments=3,
        opening='<span class="highlight">', closing='</span>'):
    """Returns a list of text fragments with returns included for
    80 char max width.

    Defaults to OR operator which is good for formatting.
    """
    analyzer = StandardAnalyzer()
    #print text
    strs = searchString.split()
    bq = BooleanQuery()
    for s in strs:
        print s
        # Substring match: wrap each search word in wildcards
        q = WildcardQuery(Term('f', '*' + s + '*'))
        #print q.toString()
        bq.add(q, BooleanClause.Occur.SHOULD)
    #print bq.toString()
    scorer = QueryScorer(bq)
    formatter = SimpleHTMLFormatter(opening, closing)
    highlighter = Highlighter(formatter, scorer)
    fragmenter = SimpleFragmenter(fragmentLength)
    highlighter.setTextFragmenter(fragmenter)

    tokenStream = analyzer.tokenStream('f', StringReader(text))
    return highlighter.getBestFragments(tokenStream, text, numFragments)


Basically, I want to show partial word matches also.

James


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: highlighting with WildcardQuery

Posted by Doron Cohen <DO...@il.ibm.com>.
The IndexReader is needed to find all wildcard matches by enumerating the
index lexicon. It seems you want to expand the wildcard query not against
the index lexicon but against the terms of the highlighted text (which may
not be indexed at all). I think you have at least two ways to do that:

(1) create a (highlight) QueryScorer with:
   new QueryScorer(WeightedTerm weightedTerms[])
which means that you provide all the "lexicon" knowledge usually taken from
the index (reader) yourself, i.e. which words are valid matches for the
wildcard expression.

(2) extend QueryScorer, implementing
   float getTokenScore(Token token)
so that tokens matching the wildcard expression get a nonzero score.
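
As a rough illustration of the matching logic behind both options, here is
a minimal sketch in plain Python. It does not call Lucene or PyLucene: the
regex tokenizer is only a crude stand-in for an analyzer, the standard
library's fnmatch stands in for wildcard matching, and the names tokenize,
wildcard_patterns, expand_terms and token_score are hypothetical. For (1),
the words returned by expand_terms would presumably be wrapped in
WeightedTerm objects and handed to QueryScorer; for (2), token_score shows
the kind of test a getTokenScore override might apply to each token's text.

```python
import fnmatch
import re


def tokenize(text):
    # Crude stand-in for an analyzer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def wildcard_patterns(search_string):
    # Mirrors the '*' + s + '*' construction from the question, lowercased
    # to agree with tokenize(). Note that fnmatch also treats '[...]' as a
    # character class, which WildcardQuery does not.
    return ["*" + s.lower() + "*" for s in search_string.split()]


def expand_terms(text, search_string):
    """Option (1): distinct words of the text that match any pattern."""
    patterns = wildcard_patterns(search_string)
    return sorted(set(
        t for t in tokenize(text)
        if any(fnmatch.fnmatchcase(t, p) for p in patterns)))


def token_score(token, patterns):
    """Option (2): nonzero score for a matching token, 0.0 otherwise."""
    token = token.lower()
    matched = any(fnmatch.fnmatchcase(token, p) for p in patterns)
    return 1.0 if matched else 0.0
```

Calling expand_terms(text, searchString) on the chunk of text to be
highlighted yields the "lexicon" for option (1) without any index, while
token_score(word, wildcard_patterns(searchString)) can be evaluated token
by token for option (2).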

- Doron