Posted to user@nutch.apache.org by og...@yahoo.com on 2005/10/15 17:34:11 UTC
Re: [Nutch-general] RE: New Nutch User
I believe that's incorrect. As a matter of fact, there were patches
for fixing character-encoding problems with this feature coming in
through JIRA just the other day.
Otis
--- Fuad Efendi <fu...@efendi.ca> wrote:
> Nutch does support A9's OpenSearch extensions to RSS.
>
> I think it would be easier to start with plain Nutch, then learn some
> JSP/Servlet... if you need your own crawler.
>
>
> -----Original Message-----
> From: Webmaster@ExoticSportbike.com
> [mailto:Webmaster@ExoticSportbike.com]
> Sent: Thursday, October 13, 2005 1:33 PM
> To: nutch-user@lucene.apache.org
> Subject: RE: New Nutch User
>
>
> Thanks so much for your help. One more question: I've never written a
> wrapper before. I did some searching online and found SWIG
> (http://www.swig.org), which seems like it could help me write one.
>
> Does anyone have examples of a wrapper I can use, or will SWIG be my
> best bet? Ultimately, my goal for Nutch is to create a site similar to
> Indeed.com. Any suggestions would be greatly appreciated. Thanks!
>
> -----Original Message-----
> From: Ngoc Giang Nguyen [mailto:giangnn@gmail.com]
> Sent: Wednesday, October 12, 2005 1:31 PM
> To: nutch-user@lucene.apache.org
> Subject: Re: New Nutch User
>
> I think Nutch 0.7 supports the OpenSearch protocol, so you don't need to
> dig much into the Java code. Just treat Nutch as a web service, and you can
> write a wrapper in any scripting language you like to handle the HTTP
> requests/responses.
>
>
> On 10/13/05, Webmaster@exoticsportbike.com <We...@exoticsportbike.com> wrote:
> >
> > I am new to Nutch. I love it, but I am not sure I can handle putting
> > this together by myself. I run Red Hat Linux boxes with Apache. I have
> > knowledge of HTML, some Java, MySQL, PHP and Linux.
> >
> >
> >
> > Will I be able to get Nutch up and running to crawl multiple sites on
> > the internet the way a basic search engine does? What skills am I
> > missing or need to learn?
> >
> >
> >
> > My main problem is that, while I am pretty confident I can get Nutch
> > installed on my machines, I'm not sure how to integrate it
> > into the front end of my site. Is it just a simple POST or GET form,
> > or is it very Java-intensive?
> >
> >
> >
> > Any suggestions would be greatly appreciated.
> >
> >
> >
> > Thanks
> >
> >
> >
>
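As a rough illustration of the wrapper approach suggested in the thread above
(treating Nutch as a web service and handling the HTTP requests/responses from
a scripting language), here is a minimal Python sketch. It assumes the Nutch
web application is deployed at http://localhost:8080, that its OpenSearch
endpoint is reachable at /opensearch, and that it accepts query, start and
hitsPerPage parameters and returns RSS 2.0. None of these details are stated
in the thread, so check them against your own installation. The function
names are made up for this example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: base URL of the deployed Nutch web application.
NUTCH_BASE_URL = "http://localhost:8080"


def build_search_url(query, start=0, hits_per_page=10, base_url=NUTCH_BASE_URL):
    """Build the GET URL for a search request.

    The /opensearch path and the parameter names are assumptions.
    """
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "hitsPerPage": hits_per_page}
    )
    return f"{base_url}/opensearch?{params}"


def parse_results(rss_xml):
    """Extract title, link and description from each RSS <item> element."""
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.iter("item"):
        results.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "description": item.findtext("description", default=""),
            }
        )
    return results


def search(query, **kwargs):
    """Send the request to the server and return the parsed hits."""
    with urllib.request.urlopen(build_search_url(query, **kwargs)) as response:
        return parse_results(response.read())
```

A front end would then call something like search("sportbike", start=10) and
render the returned list of hits however it likes. Because the exchange is
plain HTTP plus RSS, the same idea carries over to PHP or any other scripting
language, and no Java code is needed on the wrapper side.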
Re: how to build a SE based on nutch
Posted by Miguel A Paraz <mp...@gmail.com>.
On 10/18/05, Heart <be...@gmail.com> wrote:
> Must I update all the sites (crawl them) in a single crawl run, or may I crawl one site per day
> and then run a program to index them together? If the crawl takes too long, how can I keep
> providing my service? Is there a good existing system I could study?
> Any advice would be greatly appreciated.
I'll add:
Sorry to ask this, but I could not find it in the docs. How could I
request Nutch to refetch sites that are already in the db? I tried
injecting them again, but they are not refreshed.
how to build a SE based on nutch
Posted by Heart <be...@gmail.com>.
I'm new to Nutch. Several days ago, I finished building a simple intranet search engine (SE) based on Nutch 0.6,
and I've spent two weeks reading the Nutch 0.6 source code.
Now I want to build a bigger one. I want to crawl pages from several websites that I specify.
My server is a modest machine with 1 CPU, 1 GB of memory and a 320 GB hard disk; the bandwidth is 10 Mbps.
I want to provide a search service for a specific domain, so I will choose some
big websites and crawl them.
My questions are:
Must I update all the sites (crawl them) in a single crawl run, or may I crawl one site per day
and then run a program to index them together? If the crawl takes too long, how can I keep
providing my service? Is there a good existing system I could study?
Any advice would be greatly appreciated.
--
Best regards,
Heart mailto:betogether@gmail.com