Posted to dev@nutch.apache.org by "King Kong (JIRA)" <ji...@apache.org> on 2006/08/21 18:41:14 UTC

[jira] Commented: (NUTCH-355) The title of query result could like the summary have the highlight??

    [ http://issues.apache.org/jira/browse/NUTCH-355?page=comments#action_12429450 ] 
            
King Kong commented on NUTCH-355:
---------------------------------



I added a class named Titler:

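package org.apache.nutch.searcher;

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.nutch.analysis.NutchDocumentAnalyzer;
import org.apache.nutch.searcher.Summary.Ellipsis;
import org.apache.nutch.searcher.Summary.Fragment;
import org.apache.nutch.searcher.Summary.Highlight;

/** Builds a length-limited title with the query terms highlighted. */
public class Titler implements Configurable {

  /** Maximum number of tokens inspected in a title. */
  private static final int MAX_TOKENS = 1000;

  private int maxLength = 40;                     // same default as setConf()
  private Analyzer analyzer = null;
  private Configuration conf = null;

  public Titler() { }

  public Titler(Configuration conf) {
    setConf(conf);
  }

  /* ----------------------------- *
   * <implementation:Configurable> *
   * ----------------------------- */

  public Configuration getConf() {
    return conf;
  }

  public void setConf(Configuration conf) {
    this.conf = conf;
    this.analyzer = new NutchDocumentAnalyzer(conf);
    // Number of title characters to display.
    this.maxLength = conf.getInt("searcher.title.maxlength", 40);
  }

  public Summary getSummary(String text, Query query) {
    Summary s = new Summary();
    if (text == null || text.length() == 0)       // hit has no title
      return s;

    Token[] tokens = getTokens(text);             // parse text to token array

    String[] terms = query.getTerms();
    HashSet highlight = new HashSet();            // put query terms in table
    for (int i = 0; i < terms.length; i++)
      highlight.add(terms[i]);

    int cut = Math.min(text.length(), this.maxLength);
    int offset = 0;                               // end of the last emitted piece
    // Only look at tokens that start before the cut-off point.
    for (int i = 0; i < tokens.length && tokens[i].startOffset() < cut; i++) {
      Token token = tokens[i];
      // If we find a term that's in the query, emit the text before it
      // as a plain fragment and the term itself as a highlight.
      if (highlight.contains(token.termText())) {
        if (token.startOffset() > offset)
          s.add(new Fragment(text.substring(offset, token.startOffset())));
        s.add(new Highlight(text.substring(token.startOffset(), token.endOffset())));
        offset = token.endOffset();               // may run past 'cut'
      }
    }

    // A highlighted term that straddles the cut leaves offset > cut;
    // guard the substring so it cannot throw StringIndexOutOfBoundsException.
    if (offset < cut)
      s.add(new Fragment(text.substring(offset, cut)));

    // Only add an ellipsis if some text was actually cut off.
    if (text.length() > Math.max(cut, offset))
      s.add(new Ellipsis());

    return s;
  }

  private Token[] getTokens(String text) {
    ArrayList result = new ArrayList();
    TokenStream ts = analyzer.tokenStream("title", new StringReader(text));
    Token token = null;
    while (result.size() < MAX_TOKENS) {
      try {
        token = ts.next();
      } catch (IOException e) {
        token = null;
      }
      if (token == null) { break; }
      result.add(token);
    }
    try {
      ts.close();
    } catch (IOException e) {
      // ignore
    }
    return (Token[]) result.toArray(new Token[result.size()]);
  }
}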
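To try the highlight-and-truncate logic without a Nutch build, here is a minimal, dependency-free sketch of the same idea. It is not Nutch code: the tokenizer (maximal runs of letters and digits, compared in lower case) is an assumption standing in for NutchDocumentAnalyzer, and TitleSegment, TitleHighlightSketch and debug() are hypothetical names. Highlights are rendered as [term] and the ellipsis as "..." only to make the output easy to read.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** One piece of a highlighted title (hypothetical type for this sketch). */
final class TitleSegment {
  enum Kind { TEXT, HIGHLIGHT, ELLIPSIS }

  final Kind kind;
  final String text;

  TitleSegment(Kind kind, String text) {
    this.kind = kind;
    this.text = text;
  }
}

/** Dependency-free sketch of a Titler-style highlighter; not Nutch's API. */
final class TitleHighlightSketch {

  /** Assumed tokenizer: maximal runs of letters/digits, as {start, end} offsets. */
  static List<int[]> tokenize(String text) {
    List<int[]> spans = new ArrayList<>();
    int i = 0;
    while (i < text.length()) {
      if (Character.isLetterOrDigit(text.charAt(i))) {
        int start = i;
        while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
          i++;
        }
        spans.add(new int[] {start, i});
      } else {
        i++;
      }
    }
    return spans;
  }

  static List<TitleSegment> highlight(String text, int maxLength, String... queryTerms) {
    List<TitleSegment> out = new ArrayList<>();
    if (text == null || text.isEmpty()) {
      return out;                                   // nothing to show
    }
    Set<String> terms = new HashSet<>();
    for (String t : queryTerms) {
      terms.add(t.toLowerCase(Locale.ROOT));        // compare case-insensitively
    }
    int cut = Math.min(text.length(), maxLength);   // length of the visible prefix
    int offset = 0;                                 // end of the last emitted piece
    for (int[] span : tokenize(text)) {
      if (span[0] >= cut) {
        break;                                      // token starts past the cut
      }
      String term = text.substring(span[0], span[1]).toLowerCase(Locale.ROOT);
      if (terms.contains(term)) {
        if (span[0] > offset) {
          out.add(new TitleSegment(TitleSegment.Kind.TEXT, text.substring(offset, span[0])));
        }
        out.add(new TitleSegment(TitleSegment.Kind.HIGHLIGHT, text.substring(span[0], span[1])));
        offset = span[1];                           // may pass 'cut' if the hit straddles it
      }
    }
    if (offset < cut) {                             // guard: never substring(begin > end)
      out.add(new TitleSegment(TitleSegment.Kind.TEXT, text.substring(offset, cut)));
    }
    if (text.length() > Math.max(cut, offset)) {
      out.add(new TitleSegment(TitleSegment.Kind.ELLIPSIS, ""));  // text was truncated
    }
    return out;
  }

  /** Renders segments as plain text: highlights as [term], ellipsis as "...". */
  static String debug(List<TitleSegment> segments) {
    StringBuilder sb = new StringBuilder();
    for (TitleSegment s : segments) {
      switch (s.kind) {
        case HIGHLIGHT: sb.append('[').append(s.text).append(']'); break;
        case ELLIPSIS: sb.append("..."); break;
        default: sb.append(s.text);
      }
    }
    return sb.toString();
  }
}
```

With maxLength 40, "Welcome to Nutch!" and the query "nutch" give "Welcome to [Nutch]!". With maxLength 10, "Apache Nutch project" gives "Apache [Nutch]...": a highlighted term that starts before the cut but ends after it is kept whole, and the trailing-fragment substring is skipped so it is never called with a start index past its end.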


Then I added a titler field to NutchBean:

public class NutchBean ... {
  ...
  private Titler titler;
  ...
  public NutchBean(Configuration conf, Path dir) throws IOException {
    ...
    this.titler = new Titler(conf);
  }
  ...
  // getTitle(): the hit's title with the query terms highlighted
  public Summary getTitle(HitDetails hit, Query query) throws IOException {
    return titler.getSummary(hit.getValue("title"), query);
  }
}

Finally, in search.jsp I changed

String title = detail.getValue("title");

to

String title = bean.getTitle(detail, query).toHtml(true);

and

<a target="_blank" href="<%=url%>"><%=Entities.encode(title)%></a>

to

<a target="_blank" href="<%=url%>"><%=title%></a>
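Dropping Entities.encode from the anchor matters because, assuming toHtml(true) HTML-encodes each fragment's text before adding the highlight markup (which is what removing the outer encode implies), encoding the assembled string a second time would turn the markup itself into visible text. The sketch below only illustrates that ordering: escape() and toHtml() are hypothetical helpers, not Nutch's Entities or Summary API, and <b> is used because it is the markup in the issue's example; Nutch's own highlight markup may differ.

```java
/** Hypothetical helpers illustrating escape-then-markup; not Nutch's Entities/Summary API. */
final class TitleHtmlSketch {

  /** Minimal HTML escaping for text placed in element content or a quoted attribute. */
  static String escape(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '&': sb.append("&amp;"); break;
        case '<': sb.append("&lt;"); break;
        case '>': sb.append("&gt;"); break;
        case '"': sb.append("&quot;"); break;
        case '\'': sb.append("&#39;"); break;
        default: sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Escapes every piece first, then wraps the highlighted ones in <b>...</b>. */
  static String toHtml(String[] pieces, boolean[] highlighted) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < pieces.length; i++) {
      String safe = escape(pieces[i]);              // user text is always escaped
      sb.append(highlighted[i] ? "<b>" + safe + "</b>" : safe);  // markup added afterwards
    }
    return sb.toString();
  }
}
```

For the pieces "Q&A: ", "Nutch" and " <beta>", with only the middle one highlighted, toHtml returns Q&amp;A: <b>Nutch</b> &lt;beta&gt;. Passing that result through escape() again yields &lt;b&gt;Nutch&lt;/b&gt;, so the browser would display the tags literally instead of bolding the term.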


I recompiled and it works well, but I'm not sure this is the right way to do it.
Could you give me any suggestions?



  
    

> The title of query result  could like the summary have the highlight??
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-355
>                 URL: http://issues.apache.org/jira/browse/NUTCH-355
>             Project: Nutch
>          Issue Type: Wish
>          Components: searcher
>    Affects Versions: 0.8
>         Environment: all
>            Reporter: King Kong
>
> I'd like the title to be highlighted too, but I can't find how to do it.
> When I query "Nutch", the result should look like this:
> <a href="http://lucene.apache.org/nutch/">Welcome to <b>Nutch</b>!</a>
> This is the first <b>Nutch</b> release as an Apache Lucene sub-project. See CHANGES.txt for details. The release is available here. ... <b>Nutch</b> has now graduated from the Apache incubator, and is now a Subproject of Lucene. ...
>  
> ....

-- 
This message is automatically generated by JIRA.