Posted to user@nutch.apache.org by Marco Vanossi <ma...@gmail.com> on 2006/09/05 13:53:35 UTC

Caching the search results

Hi,

 Does anybody know how I can set Nutch to cache search results?
 I've heard about this feature, but I can't find any information on it.

Thanks,
Marco

RE: Caching the search results

Posted by Chirag Chaman <de...@filangy.com>.
Marco,

We use a search caching system at Filangy. It uses Lucene to save the search
string, count, date and top 20 IDs of the pages, so on a cache hit all you
have to do is search for those IDs.

Yes, it still involves a search, but we have a distributed system that uses
the ID as the hash key to determine which server holds the details of a page,
which makes the parallel search more efficient. This search is about 60-75%
faster than a regular search.
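
A minimal Java sketch of the idea described above, under stated assumptions: the class names (CachedSearch, PageShardRouter) are hypothetical, the cache entry is held in memory rather than in a Lucene index, and a plain modulo stands in for whatever hash function the real system uses. It is not Filangy's code, only an illustration of storing the top 20 IDs per query and routing each ID to a server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One cached search: query string, count, date and top page IDs (hypothetical class). */
final class CachedSearch {
    static final int MAX_IDS = 20;
    final String query;
    final long count;
    final Date cachedAt;
    final List<Long> topIds;

    CachedSearch(String query, long count, Date cachedAt, List<Long> rankedIds) {
        this.query = query;
        this.count = count;
        this.cachedAt = cachedAt;
        // Keep only the first MAX_IDS ranked IDs.
        int n = Math.min(MAX_IDS, rankedIds.size());
        this.topIds = Collections.unmodifiableList(new ArrayList<Long>(rankedIds.subList(0, n)));
    }
}

/** Maps a page ID to the server assumed to hold its details (hypothetical class). */
final class PageShardRouter {
    private final int serverCount;

    PageShardRouter(int serverCount) {
        if (serverCount <= 0) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        this.serverCount = serverCount;
    }

    /** Assumption: simple modulo hashing; floorMod keeps the result non-negative. */
    int serverFor(long pageId) {
        return (int) Math.floorMod(pageId, (long) serverCount);
    }

    /** Groups IDs by server so each server can be asked only for its own pages. */
    Map<Integer, List<Long>> groupByServer(List<Long> pageIds) {
        Map<Integer, List<Long>> byServer = new TreeMap<Integer, List<Long>>();
        for (Long id : pageIds) {
            int server = serverFor(id);
            List<Long> bucket = byServer.get(server);
            if (bucket == null) {
                bucket = new ArrayList<Long>();
                byServer.put(server, bucket);
            }
            bucket.add(id);
        }
        return byServer;
    }

    public static void main(String[] args) {
        List<Long> ranked = new ArrayList<Long>();
        for (long id = 100; id < 125; id++) {
            ranked.add(id);
        }
        CachedSearch hit = new CachedSearch("apache nutch", 1234L, new Date(), ranked);
        PageShardRouter router = new PageShardRouter(4);
        // Each map entry is one per-server lookup that could run in parallel.
        System.out.println(router.groupByServer(hit.topIds));
    }
}
```

On a cache hit, each server would then receive only the IDs in its own bucket, which is presumably where the parallel speedup comes from.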

You should be able to put a similar implementation together. I'm willing to
release this code as open source, PROVIDED that you, or anyone else who's
interested, change it to make it generic and release it to others in the
Nutch community.

CC-
--------------------------------------------
Chirag Chaman | Filangy, Inc.


Re: Caching the search results

Posted by Andrzej Bialecki <ab...@getopt.org>.
Marco Vanossi wrote:
> Hi,
>
> Anybody knows how can I set Nutch to cache the results of the searches?
> I've heard about this feature but I am not finding the information....

Trivial web-level caching is easy to implement - just download osCache 
and modify your web application settings according to its documentation.
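
For readers who want to see what caching at the web level amounts to, here is a minimal in-process sketch in Java. It does not use the OSCache API; SearchResultCache is a hypothetical class, and the size limit and time-to-live are illustrative assumptions. It keeps the most recently used results keyed by query string and drops entries once they are too old. The caller passes the current time explicitly so the behaviour is easy to test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Small LRU cache with a time-to-live, keyed by query string (hypothetical class). */
final class SearchResultCache<V> {

    /** A cached value together with its expiry time. */
    private static final class Timed<V> {
        final V value;
        final long expiresAtMillis;

        Timed(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<String, Timed<V>> map;

    SearchResultCache(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder=true makes iteration order least-recently-used first.
        this.map = new LinkedHashMap<String, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Timed<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, or null if it is absent or expired. */
    synchronized V get(String query, long nowMillis) {
        Timed<V> entry = map.get(query);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            map.remove(query);
            return null;
        }
        return entry.value;
    }

    /** Stores a value that stays valid for ttlMillis from nowMillis. */
    synchronized void put(String query, V value, long nowMillis) {
        map.put(query, new Timed<V>(value, nowMillis + ttlMillis));
    }

    synchronized int size() {
        return map.size();
    }

    public static void main(String[] args) {
        SearchResultCache<String> cache = new SearchResultCache<String>(1000, 60000L);
        long now = System.currentTimeMillis();
        cache.put("apache nutch", "<rendered results page>", now);
        System.out.println(cache.get("apache nutch", now + 1000L) != null);
    }
}
```

In a servlet or JSP front end, the search handler would call get first and run the real search only on a null result, then put the rendered output back into the cache.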

Smart caching on the level of indexes is more difficult to implement, 
and Nutch doesn't include anything like that. You may find this paper of 
interest:

    http://www2005.org/cdrom/docs/p257.pdf

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: [Nutch-general] Caching the search results

Posted by Ken Krugler <kk...@transpac.com>.
>You may want to consider using memcached - 
>http://www.danga.com/memcached/ - it's super simple and super 
>stable.  I use it over at Simpy.com and the memcached daemon there 
>has been up for months without showing any signs of trouble.

We've had good luck with ehcache  (http://ehcache.sourceforge.net).

-- Ken



-- 
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"

Re: [Nutch-general] Caching the search results

Posted by og...@yahoo.com.
You may want to consider using memcached - http://www.danga.com/memcached/ - it's super simple and super stable.  I use it over at Simpy.com and the memcached daemon there has been up for months without showing any signs of trouble.
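
One practical detail when putting search results into memcached: keys are limited to 250 bytes and may not contain whitespace or control characters, so a raw query string cannot be used as the key. A minimal Java sketch of one way to derive a safe key follows. QueryCacheKey, the "nutch:q:" prefix and the page parameter are hypothetical, and lowercasing assumes that queries are treated case-insensitively; no memcached client library is used here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Builds a memcached-safe key from a user query (hypothetical helper). */
final class QueryCacheKey {

    private QueryCacheKey() {
    }

    static String forQuery(String query, int page) {
        // Assumption: case and repeated whitespace do not change the results.
        String normalized = query.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        byte[] digest = sha256(normalized.getBytes(StandardCharsets.UTF_8));
        StringBuilder key = new StringBuilder("nutch:q:");
        for (byte b : digest) {
            key.append(String.format("%02x", b & 0xff));
        }
        // Fixed-length hex digest keeps the key short and free of whitespace.
        return key.append(":p").append(page).toString();
    }

    private static byte[] sha256(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forQuery("  Apache   Nutch ", 1));
    }
}
```

The same key works equally well with an in-process cache such as ehcache, where the length limit does not apply but normalizing the query still raises the hit rate.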

Otis
