Posted to java-user@lucene.apache.org by jy...@yahoo.com on 2010/01/13 04:51:59 UTC

how to follow intranet: configuration in nutch website

Hi,

I am trying to follow the instructions at http://lucene.apache.org/nutch/tutorial8.html
.....
Intranet: Configuration
To configure things for intranet crawling you must:
1. Create a directory with a flat file of root urls.  For example, to
crawl the nutch site you might start with a file named
urls/nutch containing the url of just the Nutch home
page.  All other Nutch pages should be reachable from this page.  The
urls/nutch file would thus contain:
http://lucene.apache.org/nutch/

....
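
The step quoted above can be sketched roughly as follows. This is only an illustration, not part of the Nutch tutorial: the directory name urls, the file name nutch, and the URL come from the excerpt, while the helper write_seed_file and the choice of the current directory as the base are assumptions made for the example.

```python
from pathlib import Path


def write_seed_file(base_dir, seed_urls):
    """Create <base_dir>/urls/nutch holding one root URL per line.

    write_seed_file is a hypothetical helper, not a Nutch API.
    """
    urls_dir = Path(base_dir) / "urls"
    # Create the directory that holds the flat file of root URLs.
    urls_dir.mkdir(parents=True, exist_ok=True)
    seed_file = urls_dir / "nutch"
    # Write each root URL on its own line.
    seed_file.write_text("\n".join(seed_urls) + "\n", encoding="utf-8")
    return seed_file


if __name__ == "__main__":
    # Assumption: the script is run from the directory where the crawl
    # will later be started.
    path = write_seed_file(".", ["http://lucene.apache.org/nutch/"])
    print(path.read_text(encoding="utf-8"), end="")
```

In other words, urls is an ordinary directory and nutch is an ordinary text file inside it whose only content is the home page URL; creating both by hand in a text editor would work just as well.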

I do not understand this step. Can anyone help me out?

Thanks.
zhou



Re: how to follow intranet: configuration in nutch website

Posted by jy...@yahoo.com.
Thanks.

--- On Wed, 13/1/10, Otis Gospodnetic <ot...@yahoo.com> wrote:

From: Otis Gospodnetic <ot...@yahoo.com>
Subject: Re: how to follow intranet: configuration in nutch website
To: java-user@lucene.apache.org
Date: Wednesday, 13 January, 2010, 12:07 PM

Zhou,

Your question will get more attention if you send it to nutch-user@lucene.apache.org list instead.  This list is for Lucene Java.

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




      

Re: how to follow intranet: configuration in nutch website

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Zhou,

Your question will get more attention if you send it to nutch-user@lucene.apache.org list instead.  This list is for Lucene Java.

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch




