Posted to solr-user@lucene.apache.org by souravm <SO...@infosys.com> on 2008/11/24 06:24:51 UTC
Query for Distributed search -
Hi,
Looking for some insight on distributed search.
Say I have an index distributed across 3 boxes, and the index contains time and text data (a typical log file). Each box holds the index for a different time range: say Box 1 for Jan to April, Box 2 for May to August, and Box 3 for Sep to Dec.
Now if I search for a text string, will the search happen in parallel on all 3 boxes or sequentially?
Regards,
Sourav
**************** CAUTION - Disclaimer *****************
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely
for the use of the addressee(s). If you are not the intended recipient, please
notify the sender by e-mail and delete the original message. Further, you are not
to copy, disclose, or distribute this e-mail or its contents to any other person and
any such actions are unlawful. This e-mail may contain viruses. Infosys has taken
every reasonable precaution to minimize this risk, but is not liable for any damage
you may sustain as a result of any virus in this e-mail. You should carry out your
own virus checks before opening the e-mail or attachment. Infosys reserves the
right to monitor and review the content of all messages sent to or from this e-mail
address. Messages sent to or from this e-mail address may be stored on the
Infosys e-mail system.
***INFOSYS******** End of Disclaimer ********INFOSYS***
Re: Query for Distributed search -
Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Query for Distributed search -
: In-Reply-To: <c6...@mail.gmail.com>
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention. It makes following discussions in the mailing list archives
particularly difficult.
See Also: http://en.wikipedia.org/wiki/Thread_hijacking
-Hoss
RE: Query for Distributed search -
Posted by souravm <SO...@infosys.com>.
Hi,
I understand your point about how I could do it myself in my Java code.
However, I'm more interested in how the default behaviour of DistributedSearch works when I issue a command like "curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'" as mentioned in the wiki.
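For reference, the request in the curl command above can also be assembled programmatically. The following is a minimal sketch in Java, assuming a hypothetical helper class named DistributedQueryUrl (it is not part of Solr or SolrJ). It only builds the URL string in the same shape as the wiki example and says nothing about how Solr executes the request across the shards.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ: it only assembles the
// request URL in the same form as the wiki example quoted above.
final class DistributedQueryUrl {
    private DistributedQueryUrl() {}

    // coordinator: base URL of the Solr instance that receives the request.
    // shards: host:port/path entries, joined with commas into the shards parameter.
    static String build(String coordinator, List<String> shards, String query) {
        String shardList = String.join(",", shards);
        // URLEncoder turns spaces into '+', matching q=ipod+solr in the example.
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return coordinator + "/select?shards=" + shardList
                + "&indent=true&q=" + encodedQuery;
    }
}
```

Calling build with "http://localhost:8983/solr", the two shard entries from the command, and the query "ipod solr" reproduces the URL passed to curl.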
Regards,
Sourav
-----Original Message-----
From: Aleksander M. Stensby [mailto:aleksander.stensby@integrasco.no]
Sent: Monday, November 24, 2008 12:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Query for Distributed search -
If you use SolrJ and the HttpSolrServer, for instance, you can add logic to
your querying to make your searches more efficient. That is partly the idea
of sharding, right? :) So if the user wants to search for a log file from
June, your application knows that June logs are stored on the second box and
can direct the search to that box. Alternatively, if the user wants to search
for logs spanning two boxes, you just add the shards parameter to your query
and include the paths to the two shards in question. I'm not really sure how
Solr handles the merging of results, or whether the requests are done in
parallel or sequentially, but I do know that you could easily manage this on
your own in Java if you want to: simply set up one HttpSolrServer in your
code for each shard, search them in parallel in separate threads, and then
reduce the results afterwards.
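The routing idea in the message above (send a June query only to the box that holds June) could be sketched roughly as follows. This is an illustration only: the class name MonthShardRouter and the shard addresses box1, box2 and box3 are hypothetical, and the month ranges simply mirror the three-box layout from the original question.

```java
import java.time.Month;
import java.util.ArrayList;
import java.util.List;

// Hypothetical router: the shard addresses and month ranges mirror the
// three-box layout from the question and are assumptions, not Solr settings.
final class MonthShardRouter {
    static final String BOX_1 = "box1:8983/solr"; // Jan to April
    static final String BOX_2 = "box2:8983/solr"; // May to August
    static final String BOX_3 = "box3:8983/solr"; // Sep to Dec

    private MonthShardRouter() {}

    // Returns the single shard that holds logs for the given month.
    static String shardFor(Month month) {
        int m = month.getValue(); // 1 = January ... 12 = December
        if (m <= 4) return BOX_1;
        if (m <= 8) return BOX_2;
        return BOX_3;
    }

    // Returns the shards covering an inclusive month range within one year,
    // in box order and without duplicates.
    static List<String> shardsFor(Month from, Month to) {
        List<String> result = new ArrayList<>();
        for (int m = from.getValue(); m <= to.getValue(); m++) {
            String shard = shardFor(Month.of(m));
            if (!result.contains(shard)) {
                result.add(shard);
            }
        }
        return result;
    }
}
```

Joining the list returned by shardsFor with commas gives a value that could be passed as the shards parameter when a search spans more than one box.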
Have a look at http://wiki.apache.org/solr/DistributedSearch for more info.
You could also take a look at Hadoop. (http://hadoop.apache.org/)
regards,
Aleks
--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no
Re: Query for Distributed search -
Posted by "Aleksander M. Stensby" <al...@integrasco.no>.
If you use SolrJ and the HttpSolrServer, for instance, you can add logic to
your querying to make your searches more efficient. That is partly the idea
of sharding, right? :) So if the user wants to search for a log file from
June, your application knows that June logs are stored on the second box and
can direct the search to that box. Alternatively, if the user wants to search
for logs spanning two boxes, you just add the shards parameter to your query
and include the paths to the two shards in question. I'm not really sure how
Solr handles the merging of results, or whether the requests are done in
parallel or sequentially, but I do know that you could easily manage this on
your own in Java if you want to: simply set up one HttpSolrServer in your
code for each shard, search them in parallel in separate threads, and then
reduce the results afterwards.
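The do-it-yourself approach described above (one client per shard, searched in separate threads, results reduced afterwards) might look something like this in plain Java. It is a sketch under stated assumptions: ParallelShardSearch is a hypothetical class, the searchShard function stands in for whatever client call actually queries one shard, and the "reduce" step here is a plain concatenation in shard order rather than a merge by relevance score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical fan-out helper. The searchShard function stands in for a real
// client call (for example one SolrJ client per shard); it is an assumption.
final class ParallelShardSearch {
    private ParallelShardSearch() {}

    static List<String> searchAll(List<String> shards,
                                  Function<String, List<String>> searchShard) {
        // One thread per shard; at least one so an empty shard list is allowed.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (String shard : shards) {
                tasks.add(() -> searchShard.apply(shard));
            }
            // invokeAll runs the tasks concurrently and blocks until all finish;
            // the futures come back in the same order as the tasks.
            List<Future<List<String>>> futures = pool.invokeAll(tasks);
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                merged.addAll(future.get()); // simple reduce: concatenate per-shard hits
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A real reduce step would typically re-sort the combined hits by score or date and trim them to the requested page size; that part depends on the application and is left out here.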
Have a look at http://wiki.apache.org/solr/DistributedSearch for more info.
You could also take a look at Hadoop. (http://hadoop.apache.org/)
regards,
Aleks
--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no
Re: Query for Distributed search -
Posted by James liu <li...@gmail.com>.
That is up to your Solr client.
--
regards
j.L