Posted to user@nutch.apache.org by "Ratnesh,V2Solutions India" <ra...@in.v2solutions.com> on 2007/03/17 10:49:04 UTC

Crawling successful without fetching

Hi,
I am trying to run the nutch-0.8.1 source code in Eclipse. In Eclipse I select
New -> Project -> Java Project from Existing Ant Buildfile, and it compiles all
the source and adds it to my workspace, e.g. E:/workspace.
I have added the conf directory as a source folder, and copied the plugins and
lib directories and attached them to my project.

In parallel I configured nutch-default.xml for the agent name and robots
settings, nutch-site.xml for searcher.dir, and crawl-urlfilter.txt. I also made
a urltest directory in my workspace containing a seed file with the URL

http://localhost:8080/nutch-0.8.1/RATNESH/index.html.
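
(For reference, and assuming the standard 0.8.x property and file names, the
relevant entries look roughly like the sketch below; the agent name value is
only a placeholder. Two things matter here: http.agent.name must not be empty,
or the HTTP plugin generally refuses to fetch, and crawl-urlfilter.txt must
contain a rule that accepts the localhost:8080 seed URL, otherwise every URL is
filtered out and the crawl still reports success while fetching nothing.)

In conf/nutch-site.xml (overriding nutch-default.xml):

  <property>
    <name>http.agent.name</name>
    <!-- placeholder value; any non-empty name will do -->
    <value>MyTestCrawler</value>
  </property>

In conf/crawl-urlfilter.txt, before the final "-." catch-all rule:

  # accept everything under the local Tomcat instance
  +^http://localhost:8080/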

On the Tomcat side, I copied nutch-0.8.1.war and the nutch-0.8.1 folder into
the webapps folder, and inside WEB-INF/classes I changed searcher.dir in
nutch-site.xml.

Then I started the Tomcat server.
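
(A sketch of that searcher.dir entry in
webapps/nutch-0.8.1/WEB-INF/classes/nutch-site.xml, assuming the crawl output
is written to a crawl-result directory under the workspace; the path below is
only an example and should match whatever -dir points to:)

  <property>
    <name>searcher.dir</name>
    <!-- example path; must point at the directory produced by the crawl -->
    <value>E:/workspace/nutch-0.8.1/crawl-result</value>
  </property>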

Then I tried to crawl by running the following command from Eclipse:
crawl -d urltest -dir crawl-result -depth 1 -topN 3
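
(For comparison, the usage string printed by the 0.8.x Crawl tool is, as far as
I recall, "Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]", i.e. the
seed directory is a positional argument rather than a -d option. An equivalent
run would look something like:)

  # from the command line, or in Eclipse run org.apache.nutch.crawl.Crawl
  # with the same strings as program arguments
  bin/nutch crawl urltest -dir crawl-result -depth 1 -topN 3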

It does not show any error, and in the console window I see that the crawl
finished successfully, but it does not fetch my HTML pages stored inside
Tomcat's webapps/nutch-0.8.1/RATNESH folder.

So please help me figure out where I am going wrong.

Looking forward to your valuable inputs.

Thanks
Ratnesh V2Solutions, India


Re: Crawling successful without fetching

Posted by Rajneesh Makhija <ra...@in.v2solutions.com>.
Hi,
I don't think the information you have provided is enough to identify the
issue. Does your log file show any errors? Look in hadoop.log, which can be
found in the logs directory.
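
(A quick way to check, assuming the standard layout where Nutch writes its log
to logs/hadoop.log under the Nutch home directory; the path may differ when
running inside Eclipse:)

  # show the tail of the log and any fetch- or error-related lines
  tail -n 100 logs/hadoop.log
  grep -iE "fetching|denied|error|exception" logs/hadoop.log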



