Posted to solr-user@lucene.apache.org by Susheel Kumar <su...@thedigitalgroup.net> on 2013/11/05 05:09:58 UTC

Does solr supports Federated search, if not what framework

Hello,

We have a scenario where we present results to users from two sources: one from Solr and the other from a real-time website search. The Solr data is available locally, so we can index it, but we don't host the data behind the other website's search, and its results are real time.

We are wondering whether we can use a federated search framework to unify the results into a single set, ranked by relevancy.

Any thoughts?

Thanks & appreciate your help.
Susheel

-----Original Message-----
From: Patanachai Tangchaisin [mailto:patanachai.tangchaisin@wizecommerce.com] 
Sent: Monday, November 04, 2013 7:38 PM
To: solr-user@lucene.apache.org
Subject: Disjunctive Queries (OR queries) and FilterCache

Hello,

We are running our search system on Apache Solr 4.2.1 with a master/slave setup.
Our index has ~100M documents. The index size is ~20 GB.
The machine has 24 CPUs and 48 GB of RAM.

Our response time is pretty bad: the median is ~4 seconds at 25 queries/second.

We noticed a couple of things:
- Our machine always uses 100% CPU.
- There is a lot of room in the Java heap. We set -Xms12g and -Xmx16g, but the heap size is still only 12g.
- Solr's filterCache hit ratio is only 0.76, and the numbers of insertions and evictions are almost equal.

The weird thing is:
- Most items in Solr's filterCache (only the first 100 are shown) are specific to a single field, which we filter on using an OR query. Note that every request has this field constraint.

For example, if the field name is x:
fq=x:(1 OR 2 OR 3)&fq=y:'a'
fq=x:(3 OR 2 OR 1)&fq=y:'b'
fq=x:(2 OR 1 OR 3)&fq=y:'c'

The order of the items differs because they are input from a different system.

To me, it seems that Solr caches this filter under a separate entry whenever the order of the items differs, e.g. "(1 OR 2)" and "(2 OR 1)" end up as different cache entries.

Question:
Is there another way to build an fq parameter using 'OR' so that Solr caches these as the same entry?
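
One possible client-side workaround, sketched here only as an illustration: put the OR terms into a canonical order before sending the request, so that the same set of values always produces the same fq string. This assumes the cache really is sensitive to term order, as described above, and that the calling code can be changed. The helper name build_or_filter is hypothetical and not part of Solr.

```python
def build_or_filter(field, values):
    """Build an fq clause whose OR terms are always in the same order.

    Hypothetical helper, not a Solr API. Values are de-duplicated and
    sorted as strings, so any permutation of the same set yields an
    identical clause (and therefore an identical filter string).
    """
    canonical = sorted({str(v) for v in values})
    return "{}:({})".format(field, " OR ".join(canonical))


# The three orderings from the example above collapse to one string.
for terms in ([1, 2, 3], [3, 2, 1], [2, 1, 3]):
    print(build_or_filter("x", terms))
```

The sort is lexicographic on strings, which is enough here because only consistency matters, not numeric order.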


Thanks,
Patanachai Tangchaisin


Re: Does solr supports Federated search, if not what framework

Posted by Erick Erickson <er...@gmail.com>.
First, please start a new thread when changing
topics, see "thread hijacking" here
http://people.apache.org/~hossman/#threadhijack

But do be aware that scores are NOT comparable
between different queries on the _same_ corpus.
A score of .75 on one query has no relation to a
score of .75 on another. So "federated search"
is hard; you usually have to figure out a way to
group the results that's meaningful to
a user.

Don't quite know how Carrot2 handles that one...
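
To illustrate one way of combining two result lists without comparing raw scores, here is a minimal sketch that interleaves the lists round-robin by rank position only. This is a swapped-in technique for illustration, not something Solr or Carrot2 is known to do, and the function name and sample document ids are made up.

```python
from itertools import zip_longest


def interleave_by_rank(*ranked_lists):
    """Merge ranked lists round-robin, using positions only, never scores.

    Hypothetical helper: takes the 1st hit of every source, then the 2nd,
    and so on, skipping duplicates that were already emitted.
    """
    merged = []
    seen = set()
    for tier in zip_longest(*ranked_lists):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# Made-up ids standing in for local Solr hits and remote website hits.
solr_hits = ["solr-doc-1", "solr-doc-2", "solr-doc-3"]
web_hits = ["web-page-1", "web-page-2"]
print(interleave_by_rank(solr_hits, web_hits))
```

Each source keeps its own internal ordering; the merge only decides how the sources alternate, which sidesteps the score-comparability problem rather than solving it.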

FWIW,
Erick



Re: Does solr supports Federated search, if not what framework

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
On Tue, Nov 5, 2013 at 6:09 AM, Susheel Kumar <
susheel.kumar@thedigitalgroup.net> wrote:

> Hello,
>
> We have a scenario where we present results to users one from solr and
> other from real time web site search. The solr data we have locally
> available that we are able to index but other website search, we don't host
> data and it is real time.
>

Have you looked at Carrot2? http://project.carrot2.org/

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)