Posted to dev@nutch.apache.org by Doug Cutting <cu...@nutch.org> on 2006/01/25 20:41:14 UTC

Re: Ideas for enhancements

Howie Wang wrote:
> 1. A String[] HitDetails.getValues(String field) method that
> returns an array of the values. The current one returns only a
> single string, and Lucene indexes can have multiple values
> per field.

That sounds useful.  Please submit a patch against the trunk attached to 
a bug report.
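
For readers following the thread, here is a minimal sketch of what such an accessor could look like. It assumes the details object stores parallel arrays of field names and values; the class name MultiValueDetails is hypothetical and is not the actual Nutch HitDetails implementation. Returning an empty array for a missing field is also just one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a hit-details holder; not the actual Nutch class.
final class MultiValueDetails {
    private final String[] fields;
    private final String[] values;

    MultiValueDetails(String[] fields, String[] values) {
        if (fields.length != values.length) {
            throw new IllegalArgumentException("fields and values must have equal length");
        }
        this.fields = fields.clone();
        this.values = values.clone();
    }

    /** Returns the first value stored for the field, or null if the field is absent. */
    String getValue(String field) {
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].equals(field)) {
                return values[i];
            }
        }
        return null;
    }

    /** Returns every value stored for the field, in stored order; empty if absent. */
    String[] getValues(String field) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].equals(field)) {
                matches.add(values[i]);
            }
        }
        return matches.toArray(new String[0]);
    }
}
```

Keeping getValue as "first match" means existing single-value callers would see no change in behavior under this assumption.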

> 2. In Link.java, put in a field (parentURL) for the URL of the page that
> contains the link. Right now it seems we just have the links themselves
> and we can't backtrack where they come from. Being able to backtrack
> through the links is handy for doing something like categorization. For
> example, you see that all the links are coming from a page about poodles,
> so you might categorize the linked page as a poodle page. It might also
> come in handy for doing something like a Google TrustRank scoring, where
> you penalize certain sites if they're a known link farm, or boost them
> if they're from some place respected like DMOZ.

This would certainly be useful functionality.  The link db has changed 
substantially in the current trunk and there is no longer a class named 
Link.  This has been replaced with Inlink and Outlink.  Have a look at 
the trunk and see if what you need isn't already there.
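
The backtracking idea amounts to inverting the link graph: given (source, target) pairs, group the sources by target. The sketch below illustrates only that idea in plain Java. LinkInverter and its method are hypothetical names, and nothing here reflects how the trunk's Inlink and Outlink classes are actually stored or processed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: inverts (from, to) link pairs.
final class LinkInverter {
    /**
     * Each edge is a two-element array {fromUrl, toUrl}.
     * Returns a map from each target URL to the URLs of the pages linking to it.
     */
    static Map<String, List<String>> invert(String[][] edges) {
        Map<String, List<String>> inlinks = new LinkedHashMap<>();
        for (String[] edge : edges) {
            String from = edge[0];
            String to = edge[1];
            inlinks.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
        }
        return inlinks;
    }
}
```

With a map like this, a categorizer could look up the pages that link to a given URL and inspect them, which is the "poodle page" scenario described in the quoted text.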

> 3. Get sorting to work on multiple fields. Lucene already works on
> multiple fields so it shouldn't be difficult to get this working. Just
> change the places where it passes down the String field so that it
> accepts an array. The sort fields could be read from the query
> string in order:
> 
>   search.jsp?sort=score&reverse=true&sort=date&reverse=false

This would also be useful.  Please submit a patch against the trunk.
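
As a rough illustration of the query-string scheme quoted above, the following sketch pairs repeated sort and reverse parameters by position. SortSpec is a hypothetical class, not part of Nutch or Lucene. In a servlet the two arrays would presumably come from getParameterValues("sort") and getParameterValues("reverse"), which return null when the parameter is absent; that the values arrive in query-string order is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value class describing one sort key; not part of Nutch or Lucene.
final class SortSpec {
    final String field;
    final boolean reverse;

    SortSpec(String field, boolean reverse) {
        this.field = field;
        this.reverse = reverse;
    }

    /**
     * Pairs the i-th sort field with the i-th reverse flag.
     * A missing or unparsable flag is treated as false (not reversed).
     */
    static List<SortSpec> parse(String[] sorts, String[] reverses) {
        List<SortSpec> specs = new ArrayList<>();
        if (sorts == null) {
            return specs; // no sort requested
        }
        for (int i = 0; i < sorts.length; i++) {
            boolean reverse = reverses != null
                && i < reverses.length
                && Boolean.parseBoolean(reverses[i]);
            specs.add(new SortSpec(sorts[i], reverse));
        }
        return specs;
    }
}
```

For the example URL, this would yield two keys in order: score (reversed) and then date (not reversed). Each key could then be mapped onto whatever multi-field sort object the search backend accepts.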

Thanks!

Doug

Re: Ideas for enhancements

Posted by Stefan Groschupf <sg...@media-style.com>.
Hi Howie,
> Howie Wang wrote:
>> 1. A String[] HitDetails.getValues(String field) method that
>> returns an array of the values. The current one returns only a
>> single string, and Lucene indexes can have multiple values
>> per field.
>
> That sounds useful.  Please submit a patch against the trunk  
> attached to a bug report.

Has any work already been done on this? I would love to have multiple
values, and if nothing has been done yet I would be happy to create such a patch.

Thanks.
Stefan