Posted to dev@nutch.apache.org by Jack Tang <hi...@gmail.com> on 2005/09/29 09:09:10 UTC

Nutch Suggestion? (Google like "did you mean")

Hi

I really like Google's "Did you mean" feature, and I notice that Nutch
does not currently provide it.

In this article, http://today.java.net/lpt/a/211, author Tim White
implemented suggestions using n-grams to generate a suggestion index. Do
you think that is a good fit for Nutch? I ask because the index in Nutch
will be really huge. Or should we just provide some dictionaries, like
Jazzy (LGPL) does?

Thanks
/Jack
--
Keep Discovering ... ...
http://www.jroller.com/page/jmars

refetching interval

Posted by Michael Ji <fj...@yahoo.com>.
Hi,

I am using Nutch 0.7 and found the following code in
FetchListTool.java:

private static final long FETCH_GENERATION_DELAY_MS =
7 * 24 * 60 * 60 * 1000;

That means the next refetch time is always 7 days
later, no matter what fetch interval is set in
nutch-site.xml.

I am puzzled. Could anyone give me a hint?

thanks,

Michael,
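
As a quick check of the arithmetic in the constant quoted above, the sketch below evaluates the same expression and compares it with the JDK's own conversion of 7 days to milliseconds. The class and method names are hypothetical and are not part of Nutch; the sketch says nothing about how FetchListTool uses the value.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, not part of Nutch.
class FetchDelayArithmetic {
    // Same expression as the constant quoted from FetchListTool.java.
    // The product is computed in int arithmetic and then widened to long;
    // 604,800,000 still fits in an int, so nothing overflows here.
    static final long FETCH_GENERATION_DELAY_MS = 7 * 24 * 60 * 60 * 1000;

    // The same duration obtained from the JDK for comparison.
    static long sevenDaysInMillis() {
        return TimeUnit.DAYS.toMillis(7);
    }

    // Converts the constant back to whole days.
    static long delayInDays() {
        return TimeUnit.MILLISECONDS.toDays(FETCH_GENERATION_DELAY_MS);
    }
}
```

So the expression does evaluate to exactly 7 days, as the message assumes.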



Re: Nutch Suggestion? (Google like "did you mean")

Posted by Jack Tang <hi...@gmail.com>.
Hi Fredrik

Thanks for your post. I appreciate your approach; really very nice ;)

Regards
/Jack


Re: Nutch Suggestion? (Google like "did you mean")

Posted by Fredrik Andersson <fi...@gmail.com>.
Hi Jack!

I like these things to be driven by statistics rather than by the content
of the index. If you run a search engine and want any kind of feedback, you
will at least save all queries entered. You can store these in an index or
database and compare the (potentially misspelled) query against them with
the Levenshtein metric. If my memory serves me right, a Lucene FuzzyQuery
uses this metric, so a good approach would be to keep a Lucene index of
|query,frequency| tuples (updated nightly, weekly, or whatever), search this
index with a FuzzyQuery at some defined minimum similarity, and pick the
most frequent matching query as the suggestion.

Fredrik
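
The query-log idea above can be sketched roughly as follows. This is a minimal illustration only: it uses a plain in-memory map as a stand-in for the Lucene index of |query,frequency| tuples and a hand-written Levenshtein distance instead of FuzzyQuery. The class name, method names, and the maxDistance parameter are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: in-memory stand-in for the |query,frequency| index.
class QueryLogSuggester {
    private final Map<String, Integer> queryFrequency = new LinkedHashMap<>();

    // Adds `count` occurrences of a logged query.
    void record(String query, int count) {
        queryFrequency.merge(query, count, Integer::sum);
    }

    // Dynamic-programming Levenshtein distance, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the most frequent logged query within maxDistance edits of the
    // input (ignoring an exact match), or null if there is no such query.
    String suggest(String query, int maxDistance) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : queryFrequency.entrySet()) {
            if (e.getKey().equals(query)) {
                continue;
            }
            if (levenshtein(query, e.getKey()) <= maxDistance
                    && e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

A linear scan over all logged queries like this would not scale to a real query log; it only shows the "closest and most frequent" selection rule.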

Re: Nutch Suggestion? (Google like "did you mean")

Posted by Piotr Kosiorowski <pk...@gmail.com>.
Have a look at http://issues.apache.org/jira/browse/NUTCH-48. I think an
n-gram-based approach is appropriate here. I have used it in our search engine.
Regards
Piotr
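
For readers unfamiliar with the n-gram approach mentioned above, here is a minimal sketch of the general idea. It is not the code from NUTCH-48 or from the java.net article: dictionary words are indexed by their character trigrams, and a misspelled word is matched to the dictionary word that shares the most trigrams with it. The class name, the trigram size, and the "^"/"$" boundary markers are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an n-gram suggestion index (trigrams assumed).
class NGramSuggester {
    private static final int N = 3;
    private final Map<String, Set<String>> gramToWords = new HashMap<>();

    // Splits a word into character n-grams, with boundary markers added.
    static List<String> ngrams(String word) {
        String padded = "^" + word + "$";
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + N <= padded.length(); i++) {
            grams.add(padded.substring(i, i + N));
        }
        return grams;
    }

    // Indexes a correctly spelled word under each of its n-grams.
    void add(String word) {
        for (String gram : ngrams(word)) {
            gramToWords.computeIfAbsent(gram, k -> new HashSet<>()).add(word);
        }
    }

    // Returns the indexed word sharing the most distinct n-grams with the
    // input (ties broken alphabetically), or null if no n-gram matches.
    String suggest(String misspelled) {
        Map<String, Integer> shared = new HashMap<>();
        for (String gram : new HashSet<>(ngrams(misspelled))) {
            Set<String> words =
                    gramToWords.getOrDefault(gram, Collections.<String>emptySet());
            for (String word : words) {
                shared.merge(word, 1, Integer::sum);
            }
        }
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Integer> e : shared.entrySet()) {
            int score = e.getValue();
            if (score > bestScore
                    || (score == bestScore && best != null
                        && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestScore = score;
            }
        }
        return best;
    }
}
```

The index grows with the number of distinct words rather than with the number of documents, which bears on the index-size concern raised in the original question.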
