Posted to user@nutch.apache.org by ili chimad <in...@yahoo.fr> on 2008/05/01 11:09:18 UTC

nutch 0.9 "no results" ??

Hi, I'm using Nutch 0.9 with Tomcat 6 on Windows Vista + Cygwin, and I've only been at it for 2 days.

Before sending this mail I read many posts here, but I didn't find this problem.
After finishing the "crawl" step and deploying the Nutch web app, I get "no results" (0-0 results). What does this mean?
I ran bin/nutch crawl -dir crawl -depth 3 -topN 30, and the resulting
crawl directory is 1.60 MB.
I copied/pasted the config files from the Nutch 0.9 tutorial.
Any suggestions, please? :(

THANKS !!

__________________________________________________
Do You Yahoo!?
Put an end to spam? Yahoo! Mail offers you the best possible protection against unsolicited messages
http://mail.yahoo.fr Yahoo! Mail 

RE: nutch 0.9 "no results" ??

Posted by Bill Meltzer <BM...@taunton.com>.
I run under Windows and use:
    <name>searcher.dir</name>
    <value>C:\\nutch-0.9\\crawl</value>

Notice the double backslashes, which provide the escaping needed for
Windows-style paths.
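To see what value the searcher actually receives, here is a minimal Java sketch (SearcherDirCheck is a hypothetical helper, not part of Nutch) that reads the searcher.dir property from a nutch-site.xml with the JDK's DOM parser and reports whether it names an existing directory. XML itself does not treat the backslash as an escape character, so the parser returns C:\\nutch-0.9\\crawl with every backslash intact; what Nutch then does with that string is not modelled here.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: print searcher.dir exactly as an XML parser delivers it. */
public class SearcherDirCheck {

    /** Returns the trimmed value of the searcher.dir property, or null if absent. */
    public static String readSearcherDir(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if ("searcher.dir".equals(childText(prop, "name"))) {
                return childText(prop, "value");
            }
        }
        return null;
    }

    /** Trimmed text of the first element with this tag inside parent, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws IOException {
        Path site = Paths.get(args.length > 0 ? args[0] : "nutch-site.xml");
        if (!Files.exists(site)) {
            System.out.println("No file at " + site.toAbsolutePath());
            return;
        }
        String value = readSearcherDir(new String(Files.readAllBytes(site), StandardCharsets.UTF_8));
        if (value == null) {
            System.out.println("searcher.dir is not set; the default from nutch-default.xml applies");
            return;
        }
        // XML has no backslash escapes, so doubled backslashes arrive doubled.
        System.out.println("searcher.dir = [" + value + "]");
        System.out.println("is an existing directory: " + new File(value).isDirectory());
    }
}
```

Usage: java SearcherDirCheck <path to nutch-site.xml>.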

Bill.

-----Original Message-----
From: Susam Pal [mailto:susam.pal@gmail.com] 
Sent: Thursday, May 01, 2008 11:10 AM
To: nutch-user@lucene.apache.org; inf_lmd1@yahoo.fr
Subject: Re: nutch 0.9 "no results" ??

Re: nutch 0.9 "no results" ??

Posted by Susam Pal <su...@gmail.com>.
On Thu, May 1, 2008 at 3:17 PM, ili chimad <in...@yahoo.fr> wrote:
> What do you think about Tomcat 6.0\webapps\nutch-0.9\WEB-INF\classes\nutch-site.xml:
> <configuration>
>   <property>
>     <name>searcher.dir</name>
>     <value>C:\nutch-0.9\crawl\</value>
>   </property>
> </configuration>
>
> At first I thought it was a "\" problem, or a problem with the path in general?
>
> Thanks for any suggestions.

That could be the reason. I haven't used Nutch on Windows, so I don't
know what kinds of issues one might face there. The default
value for this property is 'crawl'. You can try removing this property
from nutch-site.xml so that the default value from nutch-default.xml
is used. Then change your current directory to the directory that
contains the 'crawl' directory and restart Nutch. If that works, then
the absolute path you have given is almost certainly causing the
problem. You could then try something like C:/nutch-0.9/crawl/ and see
if it works. By the way, did you try searching from the command prompt,
for example with bin/nutch org.apache.nutch.searcher.NutchBean <query>?
That will confirm that your index is correct and returns results.
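A small Java sketch of why the current directory matters when searcher.dir is relative. It assumes the value is resolved against the JVM's working directory, as java.io.File and java.nio.file.Path do, so the default 'crawl' points somewhere different depending on where Tomcat was started. RelativeCrawlDir is a hypothetical helper, not Nutch code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: where a searcher.dir value would point, assuming
 *  relative values resolve against the JVM working directory. */
public class RelativeCrawlDir {

    /** Absolute values are kept as-is; relative ones are resolved against workingDir. */
    public static Path resolve(String searcherDir, Path workingDir) {
        Path p = Paths.get(searcherDir);
        return (p.isAbsolute() ? p : workingDir.resolve(p)).normalize();
    }

    public static void main(String[] args) {
        Path cwd = Paths.get(System.getProperty("user.dir"));
        // "crawl" is the default value of searcher.dir in nutch-default.xml
        String value = args.length > 0 ? args[0] : "crawl";
        Path target = resolve(value, cwd);
        System.out.println("working directory: " + cwd);
        System.out.println("searcher.dir '" + value + "' -> " + target
                + (target.toFile().isDirectory() ? " (directory exists)" : " (not found)"));
    }
}
```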

Regards,
Susam Pal

Re: nutch 0.9 "no results" ??

Posted by ili chimad <in...@yahoo.fr>.
Thanks S.P. for your quick response.

> 1. Check logs/hadoop.log file. Do you see any lines containing the
> string "fetching". Such lines should clearly show what URLs have been
> fetched.

There are many "fetching" lines there, so I don't think that's the reason.

> 2. One reason may be that all URLs are blocked in
> conf/crawl-urlfilter.txt. Did you edit this file as per the tutorial?
> If not, this is most certainly the problem. An easy way to allow all
> URLs would be to replace the -. at the end with +.
> 

Yes, like this:
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*hustoo.net/
# skip everything else
+.
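For reference, a simplified Java model of how a rule list like this is evaluated. It assumes the behaviour of Nutch's regex URL filter: rules are tried from top to bottom, the first pattern found anywhere in the URL decides (+ accepts, - rejects), and a URL that matches no rule is rejected. UrlFilterSketch is a hypothetical class, not Nutch's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Simplified, hypothetical model of a crawl-urlfilter.txt rule list. */
public class UrlFilterSketch {

    /** One "+regex" or "-regex" line. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    /** Parses rule lines; blank lines, '#' comments and any other lines are skipped. */
    public static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || (line.charAt(0) != '+' && line.charAt(0) != '-')) {
                continue;
            }
            rules.add(new Rule(line.charAt(0) == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** The first rule whose pattern is found in the URL decides; no match means reject. */
    public static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String domainRule = "+^http://([a-z0-9]*\\.)*hustoo.net/";
        List<Rule> tutorialEnding = parse(List.of(
                "# accept hosts in MY.DOMAIN.NAME", domainRule, "# skip everything else", "-."));
        List<Rule> acceptAllEnding = parse(List.of(domainRule, "+."));
        for (String url : new String[] {"http://www.hustoo.net/index.html", "http://example.com/"}) {
            System.out.println(url + "  with '-.': " + accepts(tutorialEnding, url)
                    + "  with '+.': " + accepts(acceptAllEnding, url));
        }
    }
}
```

Under this model, with the tutorial's final "-." only hustoo.net URLs pass, while with "+." as the last line every URL passes and the domain rule no longer restricts anything.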

What do you think about Tomcat 6.0\webapps\nutch-0.9\WEB-INF\classes\nutch-site.xml:
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>C:\nutch-0.9\crawl\</value>
  </property>
</configuration>

At first I thought it was a "\" problem, or a problem with the path in general?

Thanks for any suggestions.

Re: nutch 0.9 "no results" ??

Posted by Susam Pal <su...@gmail.com>.
You can check a couple of things to troubleshoot this.

1. Check the logs/hadoop.log file. Do you see any lines containing the
string "fetching"? Such lines should clearly show which URLs have been
fetched. If no such lines are present, your crawl did not fetch
anything for some reason. Also, read this log file carefully; you
might find clues about the problem.
2. One reason may be that all URLs are blocked in
conf/crawl-urlfilter.txt. Did you edit this file as per the tutorial?
If not, this is almost certainly the problem. An easy way to allow all
URLs would be to replace the -. at the end with +.
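Check 1 can be automated with a few lines of Java. This sketch (FetchLogCheck is a hypothetical name) prints the lines of logs/hadoop.log that contain "fetching" and counts them, which is the same test described above; it assumes it is run from the Nutch directory, or is given the log path as an argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for check 1: show the "fetching" lines in hadoop.log. */
public class FetchLogCheck {

    /** Keeps only the lines that contain the string "fetching". */
    public static List<String> fetchingLines(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("fetching"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get(args.length > 0 ? args[0] : "logs/hadoop.log");
        if (!Files.exists(log)) {
            System.out.println("No log file at " + log.toAbsolutePath());
            return;
        }
        // ISO-8859-1 maps every byte to a char, so unusual bytes cannot abort the read.
        List<String> hits = fetchingLines(Files.readAllLines(log, StandardCharsets.ISO_8859_1));
        hits.forEach(System.out::println);
        System.out.println(hits.size() + " line(s) contain \"fetching\"");
        if (hits.isEmpty()) {
            System.out.println("Nothing seems to have been fetched; read the rest of the log for clues.");
        }
    }
}
```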

Regards,
Susam Pal

Re: nutch 0.9 "no results" ??

Posted by ili chimad <in...@yahoo.fr>.
Hi, that's OK, I have my results now :)
I got them after reinstalling Tomcat under C:\Tomcat6 (Vista doesn't save XML config files under /Program Files/* if you are not an administrator)
and using "\\" in nutch-site.xml for the crawl directory path.

Thanks to all, especially to:
Xue Yong Zhi, Bill Meltzer, Susam Pal :)

------------
Ili Chimad

Re: UI nutch 0.9?

Posted by ili chimad <in...@yahoo.fr>.
Thanks Lukas,
but what's the difference between indexing the local file system with Nutch and with Solr?
Does Solr perform better (speed and results) than Nutch?
As for the bubble popup, I'm reading more about JSON and how I can integrate it!

---------------
C.Ili


--- On Sat 3.5.08, lukas schweizer <li...@lukas-schweizer.de> wrote:

> From: lukas schweizer <li...@lukas-schweizer.de>
> Subject: Re: UI nutch 0.9?
> To: nutch-user@lucene.apache.org
> Date: Saturday 3 May 2008, 13:50

Re: UI nutch 0.9?

Posted by lukas schweizer <li...@lukas-schweizer.de>.
ili chimad wrote:
> Hi,
>
> 1) I would like to integrate Nutch into my website to index different documents (PDF, DOC, TXT, ...) on the intranet,
>    and add a bubble popup to some keywords in the results page (with JavaScript).
>    Is it possible to do this by modifying search.jsp?
>
> 2) The main goal of my project is to provide a customized search engine where only some privileged users can
>    add/delete/update documents in the database from another interface.
>    What language/technology would you suggest to automate the different commands: bin/nutch crawl ...., update ..., delete?
>
>    Any ideas or suggestions are welcome 8o)
>
> Ili CHIMAD
>
Hi Ili,

To handle documents of different formats easily, and additionally offer
search functionality on your website, I think Solr would be the
better choice. Moreover, Solr offers sophisticated replication techniques
for your inserts/updates/deletes.
Search results can be retrieved as a JSON string, which is useful for your
bubble popup in JavaScript.
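A minimal Java sketch of building such a request, assuming a Solr instance at the hypothetical address http://localhost:8983/solr/select. The q, rows and wt parameters are standard Solr request parameters, and wt=json asks for a JSON response instead of the default XML; the JavaScript behind the bubble popup could then request this URL and read the JSON. SolrJsonQuery is a hypothetical name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: build a Solr query URL that requests a JSON response. */
public class SolrJsonQuery {

    /** base is the Solr select handler URL; the query text is URL-encoded. */
    public static String selectUrl(String base, String query, int rows) {
        return base
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&rows=" + rows
                + "&wt=json"; // ask Solr for JSON instead of the default XML
    }

    public static void main(String[] args) {
        // Hypothetical address; adjust it to your own Solr installation.
        System.out.println(selectUrl("http://localhost:8983/solr/select", "nutch windows", 10));
    }
}
```

URLEncoder.encode(String, Charset) needs Java 10 or later; it encodes the space as "+", so the example prints http://localhost:8983/solr/select?q=nutch+windows&rows=10&wt=json.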

Think about it ;-)

Lukas

UI nutch 0.9?

Posted by ili chimad <in...@yahoo.fr>.
Hi,

1) I would like to integrate Nutch into my website to index different documents (PDF, DOC, TXT, ...) on the intranet,
   and add a bubble popup to some keywords in the results page (with JavaScript).
   Is it possible to do this by modifying search.jsp?

2) The main goal of my project is to provide a customized search engine where only some privileged users can
   add/delete/update documents in the database from another interface.
   What language/technology would you suggest to automate the different commands: bin/nutch crawl ...., update ..., delete?

   Any ideas or suggestions are welcome 8o)

Ili CHIMAD


Re: nutch 0.9 "no results" ??

Posted by Xue Yong Zhi <xu...@gmail.com>.
The problem may be caused by Vista:

As Tomcat is usually installed under 'Program Files', when editing
'WEB-INF\classes\nutch-site.xml', the user may end up editing a file in
the VirtualStore.

Be sure to edit files in the 'Program Files' folder as Administrator.
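To check whether edits went to the VirtualStore, one can look for a redirected copy. Under Vista's UAC file virtualization, a write to C:\Program Files\... by a legacy, non-elevated program can be redirected to %LOCALAPPDATA%\VirtualStore\Program Files\.... The sketch below (VirtualStoreCheck is a hypothetical helper) computes that path with plain string handling and reports whether such a copy exists; the paths in it are illustrative examples only.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical helper: where UAC file virtualization would keep a
 *  redirected copy of a file that lives under Program Files. */
public class VirtualStoreCheck {

    /** Maps "X:\Program Files...\rest" to "<localAppData>\VirtualStore\Program Files...\rest".
     *  Plain string handling, so it gives the same answer on any OS; returns null
     *  for paths that are not under a Program Files folder. */
    public static String virtualStorePath(String localAppData, String windowsPath) {
        if (windowsPath.length() < 3 || windowsPath.charAt(1) != ':' || windowsPath.charAt(2) != '\\') {
            return null; // not a drive-qualified Windows path
        }
        String rest = windowsPath.substring(3); // drop the drive prefix
        if (!rest.toLowerCase(Locale.ROOT).startsWith("program files")) {
            return null;
        }
        return localAppData + "\\VirtualStore\\" + rest;
    }

    public static void main(String[] args) {
        String localAppData = System.getenv("LOCALAPPDATA");
        if (args.length == 0 || localAppData == null) {
            System.out.println("Usage (on Windows): java VirtualStoreCheck \"C:\\Program Files\\...\\nutch-site.xml\"");
            return;
        }
        String shadow = virtualStorePath(localAppData, args[0]);
        if (shadow == null) {
            System.out.println(args[0] + " is not under Program Files; no VirtualStore copy expected.");
        } else if (new File(shadow).exists()) {
            System.out.println("Redirected copy found: " + shadow + " (your edits may be here)");
        } else {
            System.out.println("No redirected copy at " + shadow);
        }
    }
}
```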

Yong
http://seclib.blogspot.com
