Posted to java-user@lucene.apache.org by jy...@yahoo.com on 2010/01/13 04:51:59 UTC
how to follow intranet: configuration in nutch website
Hi,
I am trying to follow the instructions from http://lucene.apache.org/nutch/tutorial8.html
.....
Intranet: Configuration
To configure things for intranet crawling you must:
1. Create a directory with a flat file of root urls. For example, to
crawl the nutch site you might start with a file named
urls/nutch containing the url of just the Nutch home
page. All other Nutch pages should be reachable from this page. The
urls/nutch file would thus contain:
http://lucene.apache.org/nutch/
....
I do not understand this step. Can anyone help me out?
Thanks.
zhou
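
For readers stuck on the same step, the quoted instruction boils down to creating a directory named urls and putting a plain text file named nutch inside it, with one root URL per line. The following is only a minimal sketch of that step in Python, not part of the Nutch tutorial. The function name write_seed_file and its parameters are hypothetical, chosen for illustration. The directory name, file name, and URL are the ones quoted above.

```python
from pathlib import Path


def write_seed_file(directory="urls", filename="nutch",
                    seeds=("http://lucene.apache.org/nutch/",)):
    """Create the seed directory and write one root URL per line.

    Hypothetical helper: the names and defaults mirror the quoted
    tutorial text (urls/nutch); they are not a Nutch API.
    """
    seed_dir = Path(directory)
    # Create the directory if it does not exist yet.
    seed_dir.mkdir(parents=True, exist_ok=True)
    seed_file = seed_dir / filename
    # A "flat file of root urls": plain text, one URL per line.
    seed_file.write_text("\n".join(seeds) + "\n", encoding="utf-8")
    return seed_file


if __name__ == "__main__":
    path = write_seed_file()
    print(path.read_text(encoding="utf-8"), end="")
```

Creating the directory and file by hand in a text editor gives the same result. The only thing that matters for this step is that urls/nutch ends up containing the single line http://lucene.apache.org/nutch/.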
Re: how to follow intranet: configuration in nutch website
Posted by jy...@yahoo.com.
Thanks.
--- On Wed, 13/1/10, Otis Gospodnetic <ot...@yahoo.com> wrote:
From: Otis Gospodnetic <ot...@yahoo.com>
Subject: Re: how to follow intranet: configuration in nutch website
To: java-user@lucene.apache.org
Date: Wednesday, 13 January, 2010, 12:07 PM
Zhou,
Your question will get more attention if you send it to the nutch-user@lucene.apache.org list instead. This list is for Lucene Java.
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: how to follow intranet: configuration in nutch website
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Zhou,
Your question will get more attention if you send it to the nutch-user@lucene.apache.org list instead. This list is for Lucene Java.
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch