Posted to user@nutch.apache.org by Milan Krendzelak <mk...@mtld.mobi> on 2007/09/12 13:44:37 UTC

Distributed Search

Hi guys,
 
I am trying to set up Nutch to perform distributed search, but have had no luck so far.
I am looking for up-to-date documentation for Nutch 0.8 on how to proceed.
Any help is appreciated. Could you at least point me to some documentation on this topic?
Many thanks.
Cheers,
Milan
 
Milan Krendzelak
Senior Software Developer
 
mTLD Top Level Domain Limited is a private limited company incorporated and registered in the Republic of Ireland with registered number 398040 and registered office at Arthur Cox Building, Earlsfort Terrace, Dublin 2

RE: Distributed Search

Posted by Milan Krendzelak <mk...@mtld.mobi>.
Cool, thanks a lot, John, for pointing me in the right direction.
Cheers,
M.
 
Re: Distributed Search

Posted by searchfresco <se...@bellsouth.net>.
Right!

You wouldn't put identical merged indexes into different search
servers; you want the indexes to be at least deduped before sending them
to the search servers. So you would crawl > index > dedup and then send
the individual indexes to the servers.

Look at mergesegs (0.7.2 version) with the -max flag; that would break
one large index into several smaller indexes for distribution.

John
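
The hand-off described above, where each search server receives its own
deduped index rather than a copy of the same merged one, can be sketched
as a dry run. This is only an illustration: the host names and part-index
names are hypothetical, the round-robin assignment is just one possible
choice, and nothing is actually copied.

```shell
#!/bin/sh
# Dry run: assign each deduped part index to exactly one search server.
set -eu

SERVERS="server-a server-b"                           # hypothetical host names
PARTS="part-00000 part-00001 part-00002 part-00003"   # hypothetical index names

plan() {
  # Count the servers by loading them into the positional parameters.
  set -- $SERVERS
  n=$#
  i=0
  for part in $PARTS; do
    # Pick server number (i mod n) + 1 from the space-separated list.
    idx=$(( i % n + 1 ))
    host=$(echo "$SERVERS" | cut -d' ' -f"$idx")
    echo "$part -> $host"
    i=$(( i + 1 ))
  done
}

plan
```

Every part index appears once in the plan, so no two servers end up
serving the same documents.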

RE: Distributed Search

Posted by Milan Krendzelak <mk...@mtld.mobi>.
John, thanks a lot for the clues! They helped me a lot. I am still not clear on how Nutch merges the results. Is it also possible not to merge the results?
In my case, I have a few indexes with different content and I am displaying the search results at once, so I don't need to merge.
Maybe creating a few instances of DistributedSearch$Client will help me.
 
Cheers,
Milan
 
Re: Distributed Search

Posted by searchfresco <se...@bellsouth.net>.
It's fairly straightforward.

Set up three Nutch installations: two will hold live indexes and one will
hold the file "search-servers.txt" in lieu of indexes/segments. The file
"search-servers.txt" tells the searcher where to find the indexes and
which port the search servers are listening on.

Like this:

an.ip.address 8100
another.ip.address 8100

Start the search servers holding the live indexes with the nutch server
command:

bin/nutch server 8100 .

Now start the search app/Tomcat holding the "search-servers.txt" file
as usual.

You can do this locally on one machine by running the search servers on
different ports:

localhost 8100
localhost 8200

You might want to do that to "see" how it works.

John
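
The single-machine layout described above can be sketched as a small
shell script. This is only an illustration under stated assumptions: the
directory name frontend-crawl is hypothetical, and the script just writes
the file, checks its format, and prints the server commands without
starting anything. It also assumes the search front end is configured to
look in that directory for "search-servers.txt" (in Nutch this is normally
controlled by the searcher.dir property).

```shell
#!/bin/sh
# Sketch only: prepare a one-machine test of distributed search.
set -eu

# Hypothetical directory that the search front end reads.
CONF_DIR="${CONF_DIR:-./frontend-crawl}"
mkdir -p "$CONF_DIR"

# One "host port" pair per line, as in the example above.
cat > "$CONF_DIR/search-servers.txt" <<'EOF'
localhost 8100
localhost 8200
EOF

# Sanity check: every line must be "<host> <numeric port>".
check_servers_file() {
  awk 'NF != 2 || $2 !~ /^[0-9]+$/ { bad = 1 } END { exit bad }' "$1"
}
check_servers_file "$CONF_DIR/search-servers.txt"

# Print (do not run) the command that starts each search server.
# Each command is meant to be run from the directory holding that
# server's index, hence the trailing ".".
print_commands() {
  while read -r host port; do
    echo "on $host: bin/nutch server $port ."
  done < "$1"
}
print_commands "$CONF_DIR/search-servers.txt"
```

Once both servers are running, starting the front end as usual should
send each query to both ports.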




