Posted to dev@nutch.apache.org by Jack Tang <hi...@gmail.com> on 2005/09/29 09:09:10 UTC
Nutch Suggestion? (Google like "did you mean")
Hi
I really like Google's "Did you mean" feature, and I notice that Nutch
does not currently provide this function.
In this article, http://today.java.net/lpt/a/211, the author Tim White
implemented suggestions by using n-grams to generate a suggestion index.
Do you think this would work well for Nutch? My concern is that the index
in Nutch will be really huge. Or should we just provide some dictionaries,
as jazzy (LGPL) does?
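A minimal sketch of the n-gram idea, in Java since Nutch is written in Java. The class `NGramSuggester` and its scoring (number of shared character bigrams) are hypothetical illustrations, not Tim White's code or any Nutch patch. The article builds its suggestion index in Lucene and may score candidates differently. Each dictionary word is filed under its character n-grams, and a misspelled word is matched to the dictionary word that shares the most n-grams with it.

```java
import java.util.*;

/**
 * Hypothetical n-gram "did you mean" sketch: dictionary words are indexed
 * by their character n-grams, and a misspelled word is matched to the
 * dictionary word sharing the most n-grams with it. Not the article's code.
 */
public class NGramSuggester {
    private final int n;
    // n-gram -> lower-cased dictionary words containing that n-gram
    private final Map<String, Set<String>> index = new HashMap<>();

    public NGramSuggester(int n, Collection<String> dictionary) {
        this.n = n;
        for (String word : dictionary) {
            String w = word.toLowerCase(Locale.ROOT);
            for (String gram : grams(w)) {
                index.computeIfAbsent(gram, k -> new HashSet<>()).add(w);
            }
        }
    }

    // Overlapping character n-grams; a word shorter than n becomes a single gram.
    private List<String> grams(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        if (w.length() < n) {
            out.add(w);
            return out;
        }
        for (int i = 0; i + n <= w.length(); i++) {
            out.add(w.substring(i, i + n));
        }
        return out;
    }

    /** Returns the dictionary word sharing the most n-grams, or null if none share any. */
    public String suggest(String misspelled) {
        Map<String, Integer> shared = new HashMap<>();
        for (String gram : grams(misspelled)) {
            for (String word : index.getOrDefault(gram, Collections.emptySet())) {
                shared.merge(word, 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : shared.entrySet()) {
            int count = e.getValue();
            // More shared n-grams wins; ties go to the alphabetically first word.
            if (count > bestCount || (best != null && count == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NGramSuggester s = new NGramSuggester(2,
                Arrays.asList("search", "engine", "nutch", "lucene", "index"));
        System.out.println(s.suggest("serach")); // search
    }
}
```

With bigrams, "serach" shares "se" and "ch" with "search" but only "ch" with "nutch", so "search" wins. In a real deployment the n-gram map would live in its own index, which is where the size concern above comes in.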
Thanks
/Jack
--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
refetching interval
Posted by Michael Ji <fj...@yahoo.com>.
Hi,
I am using Nutch 0.7 and found the following code in
FetchListTool.java:
private static final long FETCH_GENERATION_DELAY_MS =
7 * 24 * 60 * 60 * 1000;
That means the next refetch time is always 7 days
later, no matter what fetch interval is set in
nutch-site.xml.
I am puzzled. Could anyone give me a hint?
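A quick check of the arithmetic in the quoted constant; the class name `FetchDelayArithmetic` is illustrative only. It confirms the value is exactly 7 days in milliseconds and that the int-only expression does not overflow. It says nothing about how FetchListTool actually uses the value relative to the fetch interval configured in nutch-site.xml.

```java
/** Illustrative arithmetic check of the constant quoted from FetchListTool.java. */
public class FetchDelayArithmetic {
    // Same expression as in FetchListTool.java: every operand is an int,
    // so the product is computed in 32-bit int arithmetic, then widened to long.
    static final long FETCH_GENERATION_DELAY_MS = 7 * 24 * 60 * 60 * 1000;

    static final long MS_PER_DAY = 24L * 60 * 60 * 1000;

    public static void main(String[] args) {
        // 604,800,000 ms still fits in an int (Integer.MAX_VALUE is 2,147,483,647).
        System.out.println(FETCH_GENERATION_DELAY_MS);              // 604800000
        System.out.println(FETCH_GENERATION_DELAY_MS / MS_PER_DAY); // 7

        // Pitfall if such a delay were ever raised: 30 days overflows int
        // unless one operand is a long literal.
        long overflowed = 30 * 24 * 60 * 60 * 1000;  // int overflow, then widened
        long correct = 30L * 24 * 60 * 60 * 1000;    // long arithmetic throughout
        System.out.println(overflowed); // -1702967296
        System.out.println(correct);    // 2592000000
    }
}
```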
Thanks,
Michael
Re: Nutch Suggestion? (Google like "did you mean")
Posted by Jack Tang <hi...@gmail.com>.
Hi Fredrik
Thanks for your post. I appreciate your approach, it's really very nice ;)
Regards
/Jack
Re: Nutch Suggestion? (Google like "did you mean")
Posted by Fredrik Andersson <fi...@gmail.com>.
Hi Jack!
I like these things to be driven by statistics rather than by the content of the
index. If you run a search engine and want any kind of feedback, you will
at least save all the queries entered. You can store these in an index or
database and run a Levenshtein metric on the (potentially misspelled)
query. If my memory serves me right, a Lucene FuzzyQuery uses this metric,
so a good approach would be to keep a Lucene index of (query, frequency)
tuples (updated nightly, weekly, or whatever), simply search this index
with a FuzzyQuery at some defined similarity, and pick the most frequent
matching query as the suggestion.
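A minimal in-memory sketch of this query-log approach, assuming the log has already been reduced to (query, frequency) counts. It scans every logged query with a plain Levenshtein distance instead of running a Lucene FuzzyQuery against an index, so it only illustrates the ranking logic. `QueryLogSuggester`, `record`, `suggest`, and the `maxDistance` threshold are hypothetical names. One added assumption: a candidate is suggested only if it was searched more often than the user's own query, so a correctly spelled, popular query yields no suggestion.

```java
import java.util.*;

/** Hypothetical query-log "did you mean": most frequent logged query within an edit-distance bound. */
public class QueryLogSuggester {
    // query (lower-cased) -> number of times it was searched
    private final Map<String, Integer> frequencies = new HashMap<>();

    /** Count one occurrence of a query, e.g. while replaying a search log. */
    public void record(String query) {
        frequencies.merge(query.toLowerCase(Locale.ROOT), 1, Integer::sum);
    }

    /** Levenshtein distance: minimum insertions, deletions, substitutions (each cost 1). */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just filled becomes the "previous" row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Most frequent logged query within maxDistance edits of the input that was
     * searched more often than the input itself; null if there is none.
     */
    public String suggest(String query, int maxDistance) {
        String q = query.toLowerCase(Locale.ROOT);
        int ownFreq = frequencies.getOrDefault(q, 0);
        String best = null;
        int bestFreq = ownFreq;
        for (Map.Entry<String, Integer> e : frequencies.entrySet()) {
            String candidate = e.getKey();
            int freq = e.getValue();
            if (candidate.equals(q) || levenshtein(q, candidate) > maxDistance) {
                continue;
            }
            // Higher frequency wins; ties go to the alphabetically first query.
            if (freq > bestFreq || (best != null && freq == bestFreq && candidate.compareTo(best) < 0)) {
                best = candidate;
                bestFreq = freq;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        QueryLogSuggester s = new QueryLogSuggester();
        for (String q : new String[] {"nutch", "nutch", "nutch", "nuts", "lucene"}) {
            s.record(q);
        }
        System.out.println(s.suggest("nutsh", 2)); // nutch
    }
}
```

Both "nutch" and "nuts" are one edit away from "nutsh", and "nutch" is picked because it was searched more often. A Lucene-backed version would replace the linear scan with a FuzzyQuery over the (query, frequency) index and then sort the hits by frequency.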
Fredrik
Re: Nutch Suggestion? (Google like "did you mean")
Posted by Piotr Kosiorowski <pk...@gmail.com>.
Have a look at http://issues.apache.org/jira/browse/NUTCH-48. I think an n-gram-based
approach is appropriate here. I have used it in our search engine.
Regards
Piotr