Posted to user@nutch.apache.org by og...@yahoo.com on 2005/10/15 17:34:11 UTC

Re: [Nutch-general] RE: New Nutch User

I believe that's incorrect.  As a matter of fact, there were patches
for fixing character-encoding problems with this feature coming in
through JIRA just the other day.

Otis


--- Fuad Efendi <fu...@efendi.ca> wrote:

> Nutch does support A9's OpenSearch extensions to RSS.
> 
> I think it would be easier to start with pure Nutch, then learn some
> JSP/Servlet... if you need your own crawler.
> 
> 
> -----Original Message-----
> From: Webmaster@ExoticSportbike.com
> [mailto:Webmaster@ExoticSportbike.com] 
> Sent: Thursday, October 13, 2005 1:33 PM
> To: nutch-user@lucene.apache.org
> Subject: RE: New Nutch User
> 
> 
> Thanks so much for your help.  One more question: I've never written a
> wrapper before.  I did some searching online and found SWIG
> (http://www.swig.org), which seems like it can help me write a wrapper.
> 
> Does anyone have some examples of a wrapper I can use, or will SWIG be
> my best bet?  Ultimately my goal for Nutch is to create a site similar
> to Indeed.com.  Any suggestions would be greatly appreciated.  Thanks!
> 
> -----Original Message-----
> From: Ngoc Giang Nguyen [mailto:giangnn@gmail.com] 
> Sent: Wednesday, October 12, 2005 1:31 PM
> To: nutch-user@lucene.apache.org
> Subject: Re: New Nutch User
> 
> I think Nutch 0.7 supports the OpenSearch protocol, so you don't need
> to dig much into the Java code. Just treat Nutch as a web service, and
> you can write a wrapper in any scripting language you like to handle
> the HTTP requests/responses.
> 
> 
> On 10/13/05, Webmaster@exoticsportbike.com <We...@exoticsportbike.com> wrote:
> >
> > I am new to Nutch. I love it, but am not sure if I can handle putting
> > this together by myself. I run Red Hat Linux boxes with Apache. I have
> > knowledge of HTML, some Java, MySQL, PHP and Linux.
> >
> > Will I be able to get Nutch up and running to crawl multiple sites on
> > the internet the way a basic search engine does? What skills am I
> > missing or need to learn?
> >
> > My main problem is that I am pretty confident that I can get Nutch
> > installed on my machines, but I'm not too sure how to integrate it
> > into the front end of my site. Is it just a simple POST or GET form,
> > or is it very Java intensive?
> >
> > Any suggestions would be greatly appreciated.
> >
> > Thanks
>
> 
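
As an illustration of the "treat Nutch as a web service" suggestion quoted
above, here is a minimal Python sketch of such a wrapper. It is not taken
from the Nutch distribution. The /opensearch path, the query, start and
hitsPerPage parameter names, and the localhost:8080 address are assumptions
about how the Nutch web application is deployed, so check them against your
own install. The function names are made up for this example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_search_url(base_url, query, start=0, hits_per_page=10):
    """Build the search URL.

    ASSUMPTION: the webapp answers at <base_url>/opensearch and accepts
    the parameters query, start and hitsPerPage.
    """
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "hitsPerPage": hits_per_page}
    )
    return f"{base_url.rstrip('/')}/opensearch?{params}"


def parse_rss_items(rss_text):
    """Extract title, link and description from each RSS <item>."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        results.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "description": item.findtext("description", default=""),
            }
        )
    return results


def search(base_url, query, **kwargs):
    """Send one HTTP GET to the (assumed) endpoint and parse the reply."""
    url = build_search_url(base_url, query, **kwargs)
    with urllib.request.urlopen(url) as response:
        return parse_rss_items(response.read())


if __name__ == "__main__":
    # Hypothetical local deployment; adjust host and port as needed.
    for hit in search("http://localhost:8080", "sport bike"):
        print(hit["title"], hit["link"])
```

The front end then only has to render the returned list, so the same
approach carries over to PHP or any other language that can issue an HTTP
GET and parse XML. No SWIG-style native binding is involved.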


Re: how to build a SE based on nutch

Posted by Miguel A Paraz <mp...@gmail.com>.
On 10/18/05, Heart <be...@gmail.com> wrote:
> Must I update (re-crawl) all the sites in one crawl procedure, or may I
> crawl one site per day and then run a program to index them together?
> If the crawl procedure lasts too long, how can I keep providing my
> service? Is there a good existing system for me to study?
> Any advice would be greatly appreciated.

I'll add:
Sorry to ask this, but I could not find it in the docs. How could I
request Nutch to refetch sites that are already in the db? I tried
injecting them again, but they are not refreshed.

how to build a SE based on nutch

Posted by Heart <be...@gmail.com>.
I'm new to Nutch. Several days ago I finished building a simple intranet
search engine based on Nutch 0.6, and I've spent two weeks reading the
Nutch 0.6 source code.

Now I want to build a bigger one. I want to crawl the pages of several
websites that I specify. My server is a modest machine with 1 CPU, 1 GB of
memory and a 320 GB hard disk; the bandwidth is 10 Mbps. I want to provide
a search service for a specific domain, so I have chosen some big websites
to crawl.

So my questions are:
Must I update (re-crawl) all the sites in one crawl procedure, or may I
crawl one site per day and then run a program to index them together?
If the crawl procedure lasts too long, how can I keep providing my
service? Is there a good existing system for me to study?
Any advice would be greatly appreciated.



-- 
Best regards,
 Heart                            mailto:betogether@gmail.com