Posted to user@nutch.apache.org by Milan Krendzelak <mk...@mtld.mobi> on 2007/09/12 13:44:37 UTC
Distributed Search
Hi guys,
I am trying to set up Nutch to perform distributed search, but still no luck.
I am looking for up-to-date documentation for Nutch 0.8 on how to proceed.
Any help is appreciated. Could you at least point me to some documentation on this problem?
Many thanks.
Cheers,
Milan
Milan Krendzelak
Senior Software Developer
mTLD Top Level Domain Limited is a private limited company incorporated and registered in the Republic of Ireland with registered number 398040 and registered office at Arthur Cox Building, Earlsfort Terrace, Dublin 2
RE: Distributed Search
Posted by Milan Krendzelak <mk...@mtld.mobi>.
Cool, thanks a lot, John, for pointing me in the right direction.
Cheers,
M.
Re: Distributed Search
Posted by searchfresco <se...@bellsouth.net>.
Right!
You wouldn't put identical merged indexes onto different search
servers; the indexes should at least be deduped before you send them to
the search servers. So you would crawl > index > dedup and then send the
individual indexes to the servers.
Look at mergesegs (0.7.2 version) with the -max flag; that would break
one large index into several smaller indexes for distribution.
John
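To make the crawl > index > dedup > distribute order concrete, here is a rough Python sketch. It is not Nutch code: the functions dedup_by_url and split_into_shards are hypothetical, documents are plain dictionaries, dedup is done by URL only (a simplification), and max_per_shard is just an illustrative size limit, not the actual behaviour of the -max flag.

```python
from typing import Dict, List

Doc = Dict[str, str]


def dedup_by_url(docs: List[Doc]) -> List[Doc]:
    """Keep the first document seen for each URL (simplified dedup)."""
    seen = set()
    unique = []
    for doc in docs:
        if doc["url"] not in seen:
            seen.add(doc["url"])
            unique.append(doc)
    return unique


def split_into_shards(docs: List[Doc], max_per_shard: int) -> List[List[Doc]]:
    """Cut the list into consecutive chunks of at most max_per_shard documents."""
    if max_per_shard < 1:
        raise ValueError("max_per_shard must be at least 1")
    return [docs[i:i + max_per_shard] for i in range(0, len(docs), max_per_shard)]


# Hypothetical crawl output; the third entry duplicates the first URL.
crawled = [
    {"url": "http://example.com/a", "title": "A"},
    {"url": "http://example.com/b", "title": "B"},
    {"url": "http://example.com/a", "title": "A (fetched again)"},
    {"url": "http://example.com/c", "title": "C"},
]

# Dedup first, then split, so no document ends up on two search servers.
shards = split_into_shards(dedup_by_url(crawled), max_per_shard=2)
for number, shard in enumerate(shards):
    print(number, [doc["url"] for doc in shard])
```

Deduping before the split is the point: if the same page sat in two shards, both search servers could return it for the same query.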
RE: Distributed Search
Posted by Milan Krendzelak <mk...@mtld.mobi>.
John, thanks a lot for the clues! They helped me a lot. I am still not clear about how Nutch merges the results. Is it also possible not to merge the results?
In my case, I have a few indexes with different content and I display their search results at the same time, so I don't need to merge them.
Maybe creating a few instances of DistributedSearch$Client will help me.
Cheers,
Milan
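The difference between merged and unmerged results can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how DistributedSearch$Client works internally: the hit tuples, the server names and the functions merge_hits and keep_separate are all made up, and each server is assumed to return its hits sorted best first.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, url)


def merge_hits(per_server: Dict[str, List[Hit]], top_n: int) -> List[Hit]:
    """Interleave the per-server lists into one list ranked by score."""
    merged = heapq.merge(*per_server.values(), key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, top_n))


def keep_separate(per_server: Dict[str, List[Hit]], top_n: int) -> Dict[str, List[Hit]]:
    """Return the top hits of every server without mixing them."""
    return {server: hits[:top_n] for server, hits in per_server.items()}


# Hypothetical answers from two search servers for the same query.
results = {
    "localhost:8100": [(0.9, "http://example.com/a"), (0.4, "http://example.com/b")],
    "localhost:8200": [(0.7, "http://example.org/x"), (0.2, "http://example.org/y")],
}

print(merge_hits(results, top_n=3))
print(keep_separate(results, top_n=1))
```

With one client per index, as suggested above, the application would end up with something like the keep_separate shape: one result list per index, each shown in its own block.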
Re: Distributed Search
Posted by searchfresco <se...@bellsouth.net>.
It's fairly straightforward.
Set up 3 Nutch installations: two will hold live indexes and one will
hold the file "search-servers.txt" in lieu of indexes/segments. The file
"search-servers.txt" tells the searcher where to find the indexes and
which port the search servers are listening on.
Like this:
an.ip.address 8100
another.ip.address 8100
Start the search servers holding the live indexes with the nutch server
command:
bin/nutch server 8100 .
Now start the search app/tomcat holding the "search-servers.txt" file
as usual.
You can do this locally on one machine by running the search servers on
different ports:
localhost 8100
localhost 8200
You might want to do that to "see" how it works.
John
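For anyone scripting this setup, a small Python sketch of the "host port" file format described above may help. It is only a sketch: the functions parse_search_servers and format_search_servers are hypothetical helpers, not part of Nutch, and skipping blank lines is my own assumption about how lenient the format is.

```python
from typing import List, Tuple

Server = Tuple[str, int]


def parse_search_servers(text: str) -> List[Server]:
    """Read 'host port' pairs, one per line, skipping blank lines."""
    servers = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            raise ValueError(f"line {line_number}: expected 'host port', got {line!r}")
        servers.append((parts[0], int(parts[1])))
    return servers


def format_search_servers(servers: List[Server]) -> str:
    """Write the pairs back out in the same one-per-line layout."""
    return "".join(f"{host} {port}\n" for host, port in servers)


# The single-machine setup from the message: two servers on different ports.
local_setup = [("localhost", 8100), ("localhost", 8200)]
text = format_search_servers(local_setup)
print(text, end="")
print(parse_search_servers(text))
```

Checking the file this way before starting Tomcat catches a missing port or a stray extra column early.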