Posted to dev@nutch.apache.org by "King Kong (JIRA)" <ji...@apache.org> on 2006/08/21 18:41:14 UTC
[jira] Commented: (NUTCH-355) The title of query result could like
the summary have the highlight??
[ http://issues.apache.org/jira/browse/NUTCH-355?page=comments#action_12429450 ]
King Kong commented on NUTCH-355:
---------------------------------
I added a class named Titler:
package org.apache.nutch.searcher;
// imports omitted ...
public class Titler implements Configurable {
  private int maxLength = 20;
  private Analyzer analyzer = null;
  private Configuration conf = null;
  public Titler() { }
  public Titler(Configuration conf) {
    setConf(conf);
  }
  /* ----------------------------- *
   * <implementation:Configurable> *
   * ----------------------------- */
  public Configuration getConf() {
    return conf;
  }
  public void setConf(Configuration conf) {
    this.conf = conf;
    this.analyzer = new NutchDocumentAnalyzer(conf);
    this.maxLength = conf.getInt("searcher.title.maxlength", 40);
  }
  public Summary getSummary(String text, Query query) {
    Token[] tokens = getTokens(text); // parse text into a token array
    if (tokens.length == 0)
      return new Summary();
    String[] terms = query.getTerms();
    HashSet highlight = new HashSet(); // put query terms in a set
    for (int i = 0; i < terms.length; i++)
      highlight.add(terms[i]);
    Summary s = new Summary();
    int offset = 0;
    for (int i = 0; i < tokens.length && tokens[i].startOffset() < this.maxLength; i++) {
      Token token = tokens[i];
      //
      // If we find a term that's in the query...
      //
      if (highlight.contains(token.termText())) {
        s.add(new Fragment(text.substring(offset, token.startOffset())));
        s.add(new Highlight(text.substring(token.startOffset(), token.endOffset())));
        offset = token.endOffset();
      }
    }
    // A highlighted token may end past maxLength, so only add the tail
    // when the range is valid; otherwise substring() would throw.
    int end = Math.min(text.length(), this.maxLength);
    if (offset < end)
      s.add(new Fragment(text.substring(offset, end)));
    if (text.length() > this.maxLength) {
      s.add(new Ellipsis());
    }
    return s;
  }
  /** Maximum number of tokens to inspect in a title. */
  private static final int token_deep = 1000;
  private Token[] getTokens(String text) {
    ArrayList result = new ArrayList();
    TokenStream ts = analyzer.tokenStream("title", new StringReader(text));
    Token token = null;
    while (result.size() < token_deep) {
      try {
        token = ts.next();
      } catch (IOException e) {
        token = null;
      }
      if (token == null) { break; }
      result.add(token);
    }
    try {
      ts.close();
    } catch (IOException e) {
      // ignore
    }
    return (Token[]) result.toArray(new Token[result.size()]);
  }
}
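The offset bookkeeping in getSummary() can be tried out without Nutch or Lucene. The following is only a rough sketch under stated assumptions: the class name TitleHighlightSketch, the regex word splitter (standing in for NutchDocumentAnalyzer) and the plain strings (standing in for Fragment, Highlight and Ellipsis) are all made up for illustration. It also guards the case where a highlighted word starts before the length limit but ends past it, where an unclamped substring(offset, limit) would throw.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-alone sketch, not Nutch code: a regex replaces the analyzer and
// plain strings replace Fragment, Highlight and Ellipsis (all assumptions).
class TitleHighlightSketch {
    private static final Pattern WORD = Pattern.compile("\\w+");

    /** Splits the title into pieces; highlighted pieces are wrapped in [ ]. */
    static List<String> split(String title, Set<String> terms, int maxLength) {
        List<String> pieces = new ArrayList<>();
        int limit = Math.min(title.length(), maxLength);
        int offset = 0; // end of the text emitted so far
        Matcher m = WORD.matcher(title);
        while (m.find() && m.start() < limit) {
            if (terms.contains(m.group().toLowerCase(Locale.ROOT))) {
                if (m.start() > offset) {
                    pieces.add(title.substring(offset, m.start()));
                }
                pieces.add("[" + m.group() + "]");
                offset = m.end();
            }
        }
        // A highlighted word may end past the limit, so emit the tail only
        // while offset is still below it.
        if (offset < limit) {
            pieces.add(title.substring(offset, limit));
            offset = limit;
        }
        // Anything left over is represented by an ellipsis.
        if (offset < title.length()) {
            pieces.add("...");
        }
        return pieces;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("nutch"));
        System.out.println(split("Welcome to Nutch!", terms, 40));
        System.out.println(split("Welcome to Nutch!", terms, 13));
    }
}
```

With a limit of 13, "Nutch" starts at offset 11 but ends at 16, so the tail fragment is skipped and only the ellipsis follows the highlight.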
Then I added a titler field to NutchBean:
public class NutchBean...
{
  ...
  private Titler titler;
  ...
  public NutchBean(Configuration conf, Path dir) throws IOException {
    ....
    this.titler = new Titler(conf);
  }
  ...
  // add getTitle(), which returns the title with highlighting
  public Summary getTitle(HitDetails hit, Query query) throws IOException {
    return titler.getSummary(hit.getValue("title"), query);
  }
}
Finally, in search.jsp, I changed
  String title = detail.getValue("title");
to
  String title = bean.getTitle(detail, query).toHtml(true);
and changed
  <a target="_blank" href="<%=url%>"><%=Entities.encode(title)%></a>
to
  <a target="_blank" href="<%=url%>"><%=title%></a>
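The reason for dropping Entities.encode() in the JSP is presumably that the string returned by toHtml(true) already contains the highlight markup, so encoding it again would turn the tags into visible text. A small sketch of that ordering problem follows; it assumes nothing about Nutch internals, and the escape() helper and the class name EscapeOrderSketch are hypothetical stand-ins rather than the real Entities or Summary code.

```java
// Stand-alone sketch of "escape the text pieces first, add markup second".
// escape() is a minimal stand-in, not Nutch's Entities.encode (assumption).
class EscapeOrderSketch {
    /** Escapes the characters that are significant in HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Escapes each piece, then wraps only the highlighted piece in <b>. */
    static String render(String before, String hit, String after) {
        return escape(before) + "<b>" + escape(hit) + "</b>" + escape(after);
    }

    public static void main(String[] args) {
        String html = render("Q&A about ", "Nutch", "");
        System.out.println(html);         // text escaped, <b> markup intact
        System.out.println(escape(html)); // escaping again breaks the markup
    }
}
```

If the pieces are escaped individually before the tags are added, the page can print the result as-is, which matches the change to <%=title%> above.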
I recompiled, and it works well,
but I don't know whether this is the right way to do it.
Do you have any suggestions?
> The title of query result could like the summary have the highlight??
> ----------------------------------------------------------------------
>
> Key: NUTCH-355
> URL: http://issues.apache.org/jira/browse/NUTCH-355
> Project: Nutch
> Issue Type: Wish
> Components: searcher
> Affects Versions: 0.8
> Environment: all
> Reporter: King Kong
>
> I'd like to highlight the title, but I can't find how to do it.
> When I query "Nutch", the result should look like this:
> <a href="http://lucene.apache.org/nutch/" >Welcome to <b>Nutch</b>! </a>
> This is the first <b>Nutch</b> release as an Apache Lucene sub-project. See CHANGES.txt for details. The release is available here. ... <b>Nutch</b> has now graduated from the Apache incubator, and is now a Subproject of Lucene. ...
>
> ....