Posted to user@nutch.apache.org by jqq <re...@gmail.com> on 2009/04/27 14:58:42 UTC

Searching multiple indexes with Nutch-2 servers,0 segments

Hi,
I have two computers, PC1 and PC2:
        PC1: Windows XP, Cygwin, Tomcat, IP 10.0.0.2
        PC2: Windows XP, Cygwin, IP 10.0.0.3
PC1 and PC2 have crawled different data.
To search multiple indexes, I configured things as follows:
1. Configured the conf/slaves file on both computers. The file contains the
following:
                   10.0.0.2
                   10.0.0.3
2. Created a file called search-servers.txt:
           PC1: c:\nutch\servers\search-servers.txt
           PC2: c:\nutch\servers\search-servers.txt
  This file contains the following (host, port):
          10.0.0.2 9988
          10.0.0.3 9988
3. Opened c:\tomcat\webapps\nutch\WEB-INF\classes\nutch-site.xml and set the
searcher.dir property, so the property is:
              <name>searcher.dir</name>
              <value>c:\nutch\servers</value>
4. Started the search servers by typing:
             PC1: ./bin/nutch server 9988 /cygdrive/e/crawl
             PC2: ./bin/nutch server 9988 /cygdrive/e/crawl

I then started Tomcat and went to http://10.0.0.2/nutch/, but my search
returns 0 hits. Tomcat's log shows:
 DistributedSearch- Querying segments from search servers...
 DistributedSearch- STATS:2 servers,0 segments.

Why are there 0 segments?
Thanks.

Re: Searching multiple indexes with Nutch-2 servers,0 segments

Posted by jqq <re...@gmail.com>.
Thank you very much! Following your suggestion resolved the problem.


Re: Searching multiple indexes with Nutch-2 servers,0 segments

Posted by Andrzej Bialecki <ab...@getopt.org>.
jqq wrote:
> Thanks. But I did not make any configuration in hadoop-site.xml. In
> addition, I set the fs.default.name property:
>          <name>fs.default.name</name>
>           <value>local</value>
>
> After starting the search servers and Tomcat, I still didn't get any
> search results. Tomcat's log is:
>  DistributedSearch- Querying segments from search servers...
>  DistributedSearch- STATS:*2 servers,0 segments*.

This indicates that the problem is on the server side, i.e. servers 
can't find any usable segments. Ah, I think I know what's going on - 
only now I spotted the /cygdrive path in your example. Please use the 
Windows names for these paths, i.e. use "e:\crawl" (in quotes, or use 
double backslashes to escape single backslashes).
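
To illustrate why the quoting matters: in a bash shell such as the one Cygwin
provides, an unquoted backslash is consumed by the shell before the program
sees the argument. The following is only a sketch; show_arg is a hypothetical
helper that stands in for bin/nutch and prints the argument it receives.

```shell
# show_arg (hypothetical helper, not part of Nutch): prints its first
# argument exactly as a program would receive it from the shell.
show_arg() { printf '%s\n' "$1"; }

show_arg e:\crawl      # unquoted: the shell removes the backslash, gives e:crawl
show_arg "e:\crawl"    # double quotes keep the backslash, gives e:\crawl
show_arg e:\\crawl     # doubled backslash also gives e:\crawl

# Following the advice above, the start command from the original post
# would then be written as:
#   ./bin/nutch server 9988 "e:\crawl"
```

Either of the last two forms passes the Windows path through unchanged; the
first one silently drops the backslash.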


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Searching multiple indexes with Nutch-2 servers,0 segments

Posted by jqq <re...@gmail.com>.
Thanks. But I did not make any configuration in hadoop-site.xml. In
addition, I set the fs.default.name property:
         <name>fs.default.name</name>
          <value>local</value>

After starting the search servers and Tomcat, I still didn't get any search
results. Tomcat's log is:
 DistributedSearch- Querying segments from search servers...
 DistributedSearch- STATS:*2 servers,0 segments*.

Re: Searching multiple indexes with Nutch-2 servers,0 segments

Posted by Andrzej Bialecki <ab...@getopt.org>.
jqq wrote:
> Hi,
> I have two computers, PC1 and PC2:
>         PC1: Windows XP, Cygwin, Tomcat, IP 10.0.0.2
>         PC2: Windows XP, Cygwin, IP 10.0.0.3
> PC1 and PC2 have crawled different data.
> To search multiple indexes, I configured things as follows:
> 1. Configured the conf/slaves file on both computers. The file contains
> the following:
>                    10.0.0.2
>                    10.0.0.3

conf/slaves doesn't configure the searching - it's only needed when 
starting / stopping a map-reduce cluster.

> 2. Created a file called search-servers.txt:
>            PC1: c:\nutch\servers\search-servers.txt
>            PC2: c:\nutch\servers\search-servers.txt
>   This file contains the following (host, port):
>           10.0.0.2 9988
>           10.0.0.3 9988
> 3. Opened c:\tomcat\webapps\nutch\WEB-INF\classes\nutch-site.xml and set
> the searcher.dir property, so the property is:
>               <name>searcher.dir</name>
>               <value>c:\nutch\servers</value>

In this directory you should put a file search-servers.txt that contains:

10.0.0.2 9988
10.0.0.3 9988
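
As a rough sketch of this step, the file can be written and sanity-checked
from the shell. The scratch directory below only stands in for whatever
searcher.dir points at (c:\nutch\servers in this thread), and
check_servers_file is a hypothetical helper, not something Nutch provides.

```shell
# Scratch directory standing in for the directory named by searcher.dir.
SEARCHER_DIR=$(mktemp -d)

# One search server per line: host, whitespace, port.
printf '%s\n' '10.0.0.2 9988' '10.0.0.3 9988' > "$SEARCHER_DIR/search-servers.txt"

# check_servers_file (hypothetical helper): succeeds only if every non-empty
# line has exactly two fields and the second field is all digits.
check_servers_file() {
  awk 'NF == 0 { next }
       NF != 2 || $2 !~ /^[0-9]+$/ { bad = 1 }
       END { exit bad + 0 }' "$1"
}

check_servers_file "$SEARCHER_DIR/search-servers.txt" && echo "format looks ok"
```

This only checks the layout of the file; it does not confirm that the
servers are reachable or that they can see any segments.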

> 4. Started the search servers by typing:
>              PC1: ./bin/nutch server 9988 /cygdrive/e/crawl
>              PC2: ./bin/nutch server 9988 /cygdrive/e/crawl
> 
> I then started Tomcat and went to http://10.0.0.2/nutch/, but my search
> returns 0 hits. Tomcat's log shows:
>  DistributedSearch- Querying segments from search servers...
>  DistributedSearch- STATS:2 servers,0 segments.
> 
> Why are there 0 segments?

Another common mistake is to use a hadoop-site.xml that configures the
Hadoop FS layer to use the distributed filesystem (DFS) while the data is
located on the local filesystem.
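
One way to check for this is to look at what fs.default.name is set to in the
configuration the search servers load. The sketch below assumes the file has
one XML element per line, as in the snippets quoted in this thread;
fs_default_name is a hypothetical helper, not a Nutch or Hadoop command.

```shell
# fs_default_name (hypothetical helper): print the <value> of the
# fs.default.name property found in the given config file.
# Assumes <name> and <value> sit on separate lines inside one <property>.
fs_default_name() {
  sed -n '/<name>fs\.default\.name<\/name>/,/<\/property>/s:.*<value>\(.*\)</value>.*:\1:p' "$1"
}

# Demonstration against a throwaway file; in practice the argument would be
# the hadoop-site.xml (or nutch-site.xml) that the server actually reads.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
</configuration>
EOF

fs_default_name "$cfg"   # prints: local
```

A value beginning with hdfs:// would mean the servers look for the crawl data
in DFS rather than on the local disk.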




