Posted to java-user@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2006/07/27 23:45:52 UTC

Distributed Search

I know there has been a lot of discussion on distributed search. I am
looking for a cross-platform solution, which seems to rule out Solr's
approach. Everyone seems to have implemented this, but only as
proprietary code. It would seem that just using the RMI searcher would
allow a simple solution. Is this the case? Could you easily provide
clustering and failover by keeping a variety of indexes and searching
them all with the RMI searcher? Is it all really that complicated? I
have read that Lucene tops out at about 10m docs for a single server,
and I want to hit 100m. I have a beautiful app that allows real-time
updating/searching (updates are rare but should be instant), and I just
want it to scale up to 100m docs or so. Is that going to be a really
advanced project no matter how I slice it? I have done a lot of custom
work with the Lucene stuff, so it would seem difficult to adapt it to
Nutch (but what do I know about Nutch). I have seen a lot of talk but
not much on a simple RMI searcher solution. Any ideas?


- Mark
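
For readers weighing the "search several indexes and fail over" idea
raised above, here is a minimal sketch of the scatter-gather pattern in
plain Java. It is not Lucene's RMI searcher API: ShardSearcher, Hit,
searchWithFailover and the in-memory shards are hypothetical names
invented for illustration, and a real deployment would put a remote stub
behind the interface. The sketch assumes document IDs are unique across
shards and that scores from different shards are directly comparable,
which is not automatically true for separately built indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical names throughout; this is not Lucene's RMI searcher API.
class ScatterGatherSketch {

    /** One scored hit; docId is assumed to be unique across all shards. */
    static final class Hit {
        final String docId;
        final float score;
        Hit(String docId, float score) { this.docId = docId; this.score = score; }
        @Override public String toString() { return docId + ":" + score; }
    }

    /** Stand-in for a remote searcher (for example an RMI stub) over one index copy. */
    interface ShardSearcher {
        List<Hit> search(String query, int topK) throws Exception;
    }

    /** Tries each replica of one shard in order and returns the first successful answer. */
    static List<Hit> searchWithFailover(List<ShardSearcher> replicas, String query, int topK)
            throws Exception {
        Exception last = null;
        for (ShardSearcher replica : replicas) {
            try {
                return replica.search(query, topK);
            } catch (Exception e) {
                last = e; // this replica failed: fall through to the next copy
            }
        }
        throw new Exception("all replicas failed for this shard", last);
    }

    /** Queries every shard, then keeps the topK best-scoring hits overall. */
    static List<Hit> search(List<List<ShardSearcher>> shards, String query, int topK)
            throws Exception {
        List<Hit> merged = new ArrayList<>();
        for (List<ShardSearcher> replicas : shards) {
            merged.addAll(searchWithFailover(replicas, query, topK));
        }
        // Highest score first.
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(topK, merged.size())));
    }

    public static void main(String[] args) throws Exception {
        ShardSearcher down = (q, k) -> { throw new Exception("connection refused"); };
        ShardSearcher shardA = (q, k) -> Arrays.asList(new Hit("a1", 0.9f), new Hit("a2", 0.4f));
        ShardSearcher shardB = (q, k) -> Arrays.asList(new Hit("b1", 0.7f));

        List<List<ShardSearcher>> shards = Arrays.asList(
                Arrays.asList(down, shardA), // first replica is down, second answers
                Arrays.asList(shardB));
        System.out.println(search(shards, "lucene", 2));
    }
}
```

Running main queries two shards, skips the dead replica of the first,
and prints the two best hits overall. The sketch leaves out health
checks, timeouts, keeping replicas in sync and real-time updates.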

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Distributed Search

Posted by Jeff Rodenburg <je...@gmail.com>.
Hi Mark -

Having gone down this path for the past year, I echo comments from others
that scalability/availability/failover is a lot of work.  We migrated away
from a custom system based on Lucene running on Windows to Solr running on
Linux.  It took us 6 months to get our system to a solid five nines of
availability.  Having done this before, I would advise you not to
underestimate the effort involved.  We would have taken the simple
route had it been available.

We shifted to Solr because of the operational elements that allow us to
achieve clustering and failover within the Linux/Apache/Tomcat
(our flavor) mix.  It just works better for us than our home-brew.

-- j


Re: Distributed Search

Posted by Yonik Seeley <yo...@apache.org>.
On 7/27/06, Mark Miller <ma...@gmail.com> wrote:
> I thought I read that solr requires an OS that
> supports hard links and thought that Windows only supports soft links.

For the default index distribution method from master to searcher,
yes, hard links are currently needed.

The distribution mechanism is *very* loosely coupled with Solr though,
and one could come up with an alternate method.  Also, Cygwin might
support hard links to files now (I tried it quickly and it seems to
work), so that might be a path forward on Windows.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
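
To illustrate why hard links suit index distribution, here is a minimal
Java sketch, not Solr's actual distribution code. It assumes index files
are written once and never modified afterwards, so a hard-linked copy of
the directory behaves as a cheap point-in-time snapshot.
HardLinkSnapshotSketch, snapshot and the file name segment_1.dat are
made up for the example; Files.createLink is the standard java.nio.file
call (Java 7 and later) and throws UnsupportedOperationException on file
systems without hard-link support.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of Solr or Lucene.
class HardLinkSnapshotSketch {

    /** Hard-links every regular file in indexDir into a newly created snapshotDir. */
    static void snapshot(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectory(snapshotDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // createLink(link, existing): a second name for the same data, no bytes copied
                    Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("index-demo");
        Path index = Files.createDirectory(root.resolve("index"));
        Path segment = index.resolve("segment_1.dat"); // made-up file name
        Files.write(segment, "postings".getBytes(StandardCharsets.UTF_8));

        Path snap = root.resolve("snapshot");
        snapshot(index, snap);

        // The live index drops the file, as it might after a merge...
        Files.delete(segment);
        // ...but the snapshot's link still reaches the data.
        byte[] kept = Files.readAllBytes(snap.resolve("segment_1.dat"));
        System.out.println(new String(kept, StandardCharsets.UTF_8));
    }
}
```

The snapshot directory can then be copied to the searchers at leisure
while the master keeps indexing. Where hard links are unavailable, a
plain file copy is a slower alternative that gives the same result.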


Re: Distributed Search

Posted by Mark Miller <ma...@gmail.com>.
Otis Gospodnetic wrote:
> I think we have an RMI example in Lucene in Action.
> You could also look at how Nutch does it.  I think the code is in org.apache.nutch.ipc package.
> I'm not sure why the cross-platform requirement rules out Solr; I would think it would be exactly the opposite.
> As for 10m limit, it depends.  It depends on the actual size of the index (indexed fields), complexity of queries, required query latency, the hardware you throw at it, etc.  So you can't really say 10m is the limit.  You might have gotten that number from some of the older Nutch docs/presentations, which means they are a few years old now and are Nutch-specific.
>
> Clustering and failover and "easily" don't really go together, in my experience, and this is not limited to Luceneland. :(
> I'd love to be wrong about this, but it seems clustering/failover/HA stuff + Lucene always ends up being a custom and proprietary job.
>
> Otis
>
>   
Thanks for the info, Otis. I thought I read that Solr requires an OS that
supports hard links, and I thought that Windows only supports soft links.
Perhaps I am wrong.

Thanks,

- mark


Re: Distributed Search

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I think we have an RMI example in Lucene in Action.
You could also look at how Nutch does it.  I think the code is in the org.apache.nutch.ipc package.
I'm not sure why the cross-platform requirement rules out Solr; I would think it would be exactly the opposite.
As for the 10m limit, it depends.  It depends on the actual size of the index (indexed fields), the complexity of queries, the required query latency, the hardware you throw at it, etc.  So you can't really say 10m is the limit.  You might have gotten that number from some of the older Nutch docs/presentations, which means the figures are a few years old now and are Nutch-specific.

Clustering and failover and "easily" don't really go together, in my experience, and this is not limited to Luceneland. :(
I'd love to be wrong about this, but it seems clustering/failover/HA stuff + Lucene always ends up being a custom and proprietary job.

Otis
