Posted to user@nutch.apache.org by jqq <re...@gmail.com> on 2009/04/27 14:58:42 UTC
Searching multiple indexes with Nutch-2 servers,0 segments
Hi,
I have two computers, PC1 and PC2.
PC1: Windows XP, Cygwin, Tomcat, IP 10.0.0.2
PC2: Windows XP, Cygwin, IP 10.0.0.3
PC1 and PC2 have crawled different data.
To search multiple indexes, I configured things as follows:
1. I configured the /conf/slaves file on both computers; it contains the
following:
10.0.0.2
10.0.0.3
2. I created a file called search-servers.txt:
PC1: c:\nutch\servers\search-servers.txt
PC2: c:\nutch\servers\search-servers.txt
This file contains the following (host, port):
10.0.0.2 9988
10.0.0.3 9988
3. I opened c:\tomcat\webapps\nutch\WEB-INF\classes\nutch-site.xml and set the
searcher.dir property as follows:
<name>searcher.dir</name>
<value>c:\nutch\servers</value>
4. I started the search servers by typing:
PC1: ./bin/nutch server 9988 /cygdrive/e/crawl
PC2: ./bin/nutch server 9988 /cygdrive/e/crawl
Then I started Tomcat and went to http://10.0.0.2/nutch/, but my search returns 0
hits. Tomcat's log is:
DistributedSearch- Querying segments from search servers...
DistributedSearch- STATS:2 servers,0 segments.
Why are there 0 segments?
Thanks.
Re: Searching multiple indexes with Nutch-2 servers,0 segments
Posted by jqq <re...@gmail.com>.
Thank you very much! Following your suggestion, the problem has been
resolved.
2009/5/4 Andrzej Bialecki <ab...@getopt.org>
> jqq wrote:
>
>> Thanks. But I did not configure anything in hadoop-site.xml. In
>> addition, I set the fs.default.name property:
>> <name>fs.default.name</name>
>> <value>local</value>
>>
>> I started the search servers and Tomcat, but I still didn't get any search
>> results. Tomcat's log is:
>> DistributedSearch- Querying segments from search servers...
>> DistributedSearch- STATS:*2 servers,0 segments*.
>>
>
> This indicates that the problem is on the server side, i.e. servers can't
> find any usable segments. Ah, I think I know what's going on - only now I
> spotted the /cygdrive path in your example. Please use the Windows names for
> these paths, i.e. use "e:\crawl" (in quotes, or use double backslashes to
> escape single backslashes).
>
>
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
>
Re: Searching multiple indexes with Nutch-2 servers,0 segments
Posted by Andrzej Bialecki <ab...@getopt.org>.
jqq wrote:
> Thanks. But I did not configure anything in hadoop-site.xml. In
> addition, I set the fs.default.name property:
> <name>fs.default.name</name>
> <value>local</value>
>
> I started the search servers and Tomcat, but I still didn't get any search
> results. Tomcat's log is:
> DistributedSearch- Querying segments from search servers...
> DistributedSearch- STATS:*2 servers,0 segments*.
This indicates that the problem is on the server side, i.e. servers
can't find any usable segments. Ah, I think I know what's going on -
only now I spotted the /cygdrive path in your example. Please use the
Windows names for these paths, i.e. use "e:\crawl" (in quotes, or use
double backslashes to escape single backslashes).
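Andrzej's fix amounts to passing the index directory to the search server as a Windows path instead of a Cygwin /cygdrive path. Cygwin's own `cygpath -w` utility performs this conversion; the following is only a minimal Python sketch of the same mapping for the simple `/cygdrive/<drive>/...` case (it does not know about other Cygwin mount points such as /home), plus building the server command line. The helper names `cygdrive_to_windows` and `server_command` are hypothetical, and single-quoting the path for the Cygwin bash shell is just one way to follow the "in quotes" advice.

```python
import re
import shlex

# Matches /cygdrive/<drive letter>, optionally followed by /<rest of path>.
# Assumption: only plain /cygdrive paths are handled, not other Cygwin mounts.
_CYGDRIVE = re.compile(r"^/cygdrive/([A-Za-z])(?:/(.*))?$")


def cygdrive_to_windows(path: str) -> str:
    """Map /cygdrive/e/crawl to e:\\crawl (simple /cygdrive paths only)."""
    match = _CYGDRIVE.match(path)
    if match is None:
        raise ValueError(f"not a /cygdrive path: {path!r}")
    drive, rest = match.group(1), match.group(2) or ""
    # Keep the drive letter as given and turn forward slashes into backslashes.
    return drive + ":\\" + rest.replace("/", "\\")


def server_command(port: int, index_dir: str) -> str:
    """Build a 'bin/nutch server' command line, single-quoting the path for bash."""
    # shlex.quote wraps the path in single quotes, so bash keeps the backslashes.
    return f"./bin/nutch server {port} {shlex.quote(index_dir)}"


if __name__ == "__main__":
    win_path = cygdrive_to_windows("/cygdrive/e/crawl")
    print(win_path)                        # e:\crawl
    print(server_command(9988, win_path))  # ./bin/nutch server 9988 'e:\crawl'
```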
--
Best regards,
Andrzej Bialecki <><
Re: Searching multiple indexes with Nutch-2 servers,0 segments
Posted by jqq <re...@gmail.com>.
Thanks. But I did not configure anything in hadoop-site.xml. In
addition, I set the fs.default.name property:
<name>fs.default.name</name>
<value>local</value>
I started the search servers and Tomcat, but I still didn't get any search
results. Tomcat's log is:
DistributedSearch- Querying segments from search servers...
DistributedSearch- STATS:*2 servers,0 segments*.
2009/5/4 Andrzej Bialecki <ab...@getopt.org>
> jqq wrote:
>
>> Hi,
>> I have two computers, PC1 and PC2.
>> PC1: Windows XP, Cygwin, Tomcat, IP 10.0.0.2
>> PC2: Windows XP, Cygwin, IP 10.0.0.3
>> PC1 and PC2 have crawled different data.
>> To search multiple indexes, I configured things as follows:
>> 1. I configured the /conf/slaves file on both computers; it contains the
>> following:
>> 10.0.0.2
>> 10.0.0.3
>>
>
> conf/slaves doesn't configure searching; it's only needed when
> starting/stopping a map-reduce cluster.
>
>> 2. I created a file called search-servers.txt:
>> PC1: c:\nutch\servers\search-servers.txt
>> PC2: c:\nutch\servers\search-servers.txt
>> This file contains the following (host, port):
>> 10.0.0.2 9988
>> 10.0.0.3 9988
>> 3. I opened c:\tomcat\webapps\nutch\WEB-INF\classes\nutch-site.xml and set the
>> searcher.dir property as follows:
>> <name>searcher.dir</name>
>> <value>c:\nutch\servers</value>
>>
>
> In this directory you should put a file search-servers.txt that contains:
>
> 10.0.0.2 9988
> 10.0.0.3 9988
>
>> 4. I started the search servers by typing:
>> PC1: ./bin/nutch server 9988 /cygdrive/e/crawl
>> PC2: ./bin/nutch server 9988 /cygdrive/e/crawl
>>
>> Then I started Tomcat and went to http://10.0.0.2/nutch/, but my search returns 0
>> hits. Tomcat's log is:
>> DistributedSearch- Querying segments from search servers...
>> DistributedSearch- STATS:2 servers,0 segments.
>>
>> Why are there 0 segments?
>>
>
> Another common mistake is to use a hadoop-site.xml that configures the Hadoop FS
> layer to use the distributed filesystem (DFS) while the data is located on
> the local filesystem.
>
>
>
> --
> Best regards,
> Andrzej Bialecki <><
>
>
Re: Searching multiple indexes with Nutch-2 servers,0 segments
Posted by Andrzej Bialecki <ab...@getopt.org>.
jqq wrote:
> Hi,
> I have two computers, PC1 and PC2.
> PC1: Windows XP, Cygwin, Tomcat, IP 10.0.0.2
> PC2: Windows XP, Cygwin, IP 10.0.0.3
> PC1 and PC2 have crawled different data.
> To search multiple indexes, I configured things as follows:
> 1. I configured the /conf/slaves file on both computers; it contains
> the following:
> 10.0.0.2
> 10.0.0.3
conf/slaves doesn't configure searching; it's only needed when
starting/stopping a map-reduce cluster.
> 2. I created a file called search-servers.txt:
> PC1: c:\nutch\servers\search-servers.txt
> PC2: c:\nutch\servers\search-servers.txt
> This file contains the following (host, port):
> 10.0.0.2 9988
> 10.0.0.3 9988
> 3. I opened c:\tomcat\webapps\nutch\WEB-INF\classes\nutch-site.xml and set
> the searcher.dir property as follows:
> <name>searcher.dir</name>
> <value>c:\nutch\servers</value>
In this directory you should put a file search-servers.txt that contains:
10.0.0.2 9988
10.0.0.3 9988
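To illustrate the file format described here (one search server per line, a host followed by a port), the following is a minimal Python sketch that parses such a search-servers.txt and rejects malformed lines. It assumes whitespace-separated fields; skipping lines that start with `#` is an assumption of this sketch, not something this thread says Nutch supports, and `parse_search_servers` is a hypothetical helper rather than part of Nutch.

```python
from typing import List, Tuple


def parse_search_servers(text: str) -> List[Tuple[str, int]]:
    """Parse 'host port' lines like those in search-servers.txt."""
    servers = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines; treating '#' lines as comments is an assumption.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'host port', got {raw!r}")
        host, port_text = fields
        if not port_text.isdigit() or not 0 < int(port_text) < 65536:
            raise ValueError(f"line {lineno}: invalid port {port_text!r}")
        servers.append((host, int(port_text)))
    return servers


if __name__ == "__main__":
    sample = "10.0.0.2 9988\n10.0.0.3 9988\n"
    print(parse_search_servers(sample))  # [('10.0.0.2', 9988), ('10.0.0.3', 9988)]
```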
> 4. I started the search servers by typing:
> PC1: ./bin/nutch server 9988 /cygdrive/e/crawl
> PC2: ./bin/nutch server 9988 /cygdrive/e/crawl
>
> Then I started Tomcat and went to http://10.0.0.2/nutch/, but my search returns 0
> hits. Tomcat's log is:
> DistributedSearch- Querying segments from search servers...
> DistributedSearch- STATS:2 servers,0 segments.
>
> Why are there 0 segments?
Another common mistake is to use a hadoop-site.xml that configures the Hadoop FS
layer to use the distributed filesystem (DFS) while the data is located
on the local filesystem.
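One way to check for this mistake is to read fs.default.name from the Hadoop configuration file and see whether it points at the local filesystem. The sketch below assumes the usual configuration/property/name/value layout shown in this thread. Treating "local" or a file: URI as local, and treating an unset property as local, are assumptions about Hadoop defaults of that era rather than something confirmed here; `read_property` and `uses_local_fs` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the <property> whose <name> matches, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def uses_local_fs(xml_text: str) -> bool:
    """Heuristic: does fs.default.name point at the local filesystem?"""
    value = read_property(xml_text, "fs.default.name")
    if value is None:
        # Assumption: with no override, Hadoop falls back to a local default.
        return True
    # Assumption: "local" and file: URIs are local; anything else (e.g. hdfs://) is not.
    return value == "local" or value.startswith("file:")


if __name__ == "__main__":
    site_xml = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
</configuration>"""
    print(uses_local_fs(site_xml))  # True
```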
--
Best regards,
Andrzej Bialecki <><