Posted to solr-user@lucene.apache.org by Giovanni De Stefano <gi...@gmail.com> on 2009/03/20 09:58:39 UTC

Search transparently with Solr with multiple cores, different indexes, common response type

Hello all,

here I am with another question... :-)

I figured out that I have to change my approach to implement the requirements I
have :-(

Here is what I have to index:

1) data "A" in an Oracle DB Table "A"
2) data "B" in an Oracle DB Table "B"
3) data "C" in different files

Data "A", "B", and "C" are slightly different, thus they are indexed
differently; obviously the client receives the search results for all data
types in a consistent/common format. The client application shall be able to
search among each or all data types ("A", "B", "C"). The order will be
configurable, like: return the first 5 from data "A", the first 10 from "B",
all "C".
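
A minimal sketch of such a configurable ordering, assuming the results for
each data type have already been fetched into plain lists. The names ORDER
and arrange are hypothetical, and None is used here to mean "all":

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical configuration: (data type, maximum results); None means "all".
ORDER: List[Tuple[str, Optional[int]]] = [("A", 5), ("B", 10), ("C", None)]


def arrange(results_by_type: Dict[str, List[dict]],
            order: List[Tuple[str, Optional[int]]] = ORDER) -> List[dict]:
    """Concatenate per-type result lists in the configured order and size."""
    combined: List[dict] = []
    for data_type, limit in order:
        # A data type with no results simply contributes nothing.
        docs = results_by_type.get(data_type, [])
        combined.extend(docs if limit is None else docs[:limit])
    return combined
```

Changing the order or the limits then only means editing ORDER, not the code
that queries each data type.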

At first I thought of using only one Solr with different datasources, and
one huge index, but I figured that delta imports would be very
hard/expensive/impossible.

Reading some other posts I thought that maybe a better approach would be as
follows:

1) one Solr core for each data type (one for "A", one for "B", one for "C")
2) one index for each data type, thus one document type for "A", one for
"B", and one for "C"
3) client applications shall be able to search on one or all cores
4) the cores shall return search results in a common XML format
5) search results shall be aggregated in a configurable way

Can you please tell me if this architecture is possible with Solr? Obviously
I am not looking for an "out-of-the-box" solution, I just need to
understand what I have to develop myself and what is already available.

1) is a multicore architecture: I know it is possible and I tested that it
works great
2) same as above, no problems here :-)
3) I want to "hide" the different cores from the client application; the
client application should send the requests to one "guy" that parses the
request and forwards it to the cores. Is this a custom RequestHandler? Any
link (to the Wiki?) to understand better? Or is there anything already
available to achieve this?
4) The "guy" that parses the request and forwards it to the cores shall
aggregate and return results in a common XML format: is this a custom
ResponseHandler?
5) I know this is just my business logic :-)

Any thoughts/warnings/advice about this?

Thanks a lot in advance!
Giovanni

Re: Search transparently with Solr with multiple cores, different indexes, common response type

Posted by Giovanni De Stefano <gi...@gmail.com>.
Hello Hoss, Steve,

thank you very much for your feedback, it has been very helpful and makes
me feel more confident about this architecture.

In fact I decided to go for a single shared schema, but keeping multiple
indexes (multicore) because those two indexes are very different: one is
huge and updated not very often (once a day delta, once a week full) and the
other one is not that big and it is updated frequently (once per hour, once
per day, once per week full).

My boss is happy...thus I am happy too :-)

Now I am struggling a bit with Solrj...but that is already in another post
of mine :-)

Cheers,
Giovanni


On 3/26/09, Stephen Weiss <sw...@stylesight.com> wrote:
>
>
> I have a very similar setup and that's precisely what we do - except with
> JSON.
>
> 1) Request comes into PHP
> 2) PHP runs the search against several different cores (in a multicore
> setup) - ours are a little more than "slightly" different
> 3) PHP constructs a new object with the responseHeader and response objects
> joined together (basically add the record counts together in the header and
> then concatenate the arrays of documents)
> 4) PHP encodes the combined data into JSON and returns it
>
> It sounds clunky but it all manages to happen very quickly (< 200 ms round
> trip).  The only problem you might hit is with paging, but from the way you
> describe your situation it doesn't sound like that will be a problem.  It's
> more of an issue if you're trying to make them seamlessly flow into each
> other, but it sounds like you plan on presenting them separately (as we do).
>
> --
> Steve
>
>
>> it could be a custom request handler, but it doesn't have to be -- you
>> could implement it in whatever way is easiest for you (there's no reason
>> why it has to run in the same JVM or on the same physical machine as Solr
>> ... it could be a PHP script on another server if you want)
>>
>>
>>
>>
>> -Hoss
>>
>>
>

Re: Search transparently with Solr with multiple cores, different indexes, common response type

Posted by Stephen Weiss <sw...@stylesight.com>.
I have a very similar setup and that's precisely what we do - except  
with JSON.

1) Request comes into PHP
2) PHP runs the search against several different cores (in a multicore  
setup) - ours are a little more than "slightly" different
3) PHP constructs a new object with the responseHeader and response  
objects joined together (basically add the record counts together in  
the header and then concatenate the arrays of documents)
4) PHP encodes the combined data into JSON and returns it

It sounds clunky but it all manages to happen very quickly (< 200 ms  
round trip).  The only problem you might hit is with paging, but from  
the way you describe your situation it doesn't sound like that will be  
a problem.  It's more of an issue if you're trying to make them  
seamlessly flow into each other, but it sounds like you plan on  
presenting them separately (as we do).
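
The merge in steps 3 and 4 can be sketched roughly as below. It is written in
Python rather than PHP and is only an illustration: it assumes each core
returns Solr's standard JSON layout (responseHeader plus a response object
with numFound and docs), and the function name merge_responses is
hypothetical.

```python
import json
from typing import List


def merge_responses(responses: List[dict]) -> dict:
    """Merge several Solr-style JSON responses into a single one."""
    merged = {
        "responseHeader": {"status": 0, "QTime": 0},
        "response": {"numFound": 0, "start": 0, "docs": []},
    }
    for r in responses:
        header = r.get("responseHeader", {})
        body = r.get("response", {})
        # Surface a non-zero status if any core reported one (an assumption
        # about how errors should be signalled to the client).
        if header.get("status", 0) != 0:
            merged["responseHeader"]["status"] = header["status"]
        merged["responseHeader"]["QTime"] += header.get("QTime", 0)
        # Add the record counts together and concatenate the document arrays.
        merged["response"]["numFound"] += body.get("numFound", 0)
        merged["response"]["docs"].extend(body.get("docs", []))
    return merged


def to_json(responses: List[dict]) -> str:
    """Encode the combined data as JSON for the client."""
    return json.dumps(merge_responses(responses))
```

The merged start is fixed at 0 because an offset into the concatenated list
does not correspond to one offset in each core, which is where the paging
problem comes from.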

--
Steve

>
> it could be a custom request handler, but it doesn't have to be -- you
> could implement it in whatever way is easiest for you (there's no reason
> why it has to run in the same JVM or on the same physical machine as Solr
> ... it could be a PHP script on another server if you want)
>
>
>
>
> -Hoss
>


Re: Search transparently with Solr with multiple cores, different indexes, common response type

Posted by Chris Hostetter <ho...@fucit.org>.
: Data "A", "B", and "C" are slightly different, thus they are indexed
: differently; obviously the client receives the search results for all data
: types in a consistent/common format. The client application shall be able to
: search among each or all data types ("A", "B", "C"). The order will be
: configurable, like: return the first 5 from data "A", the first 10 from "B",
: all "C".

generally speaking, when people say they have data from X different
sources, but they want them all returned in a unified list of results, I
advocate for having a single index -- but if you have a specific need to
be able to pick and choose how many results you get from each "set" then
using separate cores (either on separate machines, or in a multicore
setup) as you outlined below does seem like a more logical choice....

: 1) one Solr core for each data type (one for "A", one for "B", one for "C")
: 2) one index for each data type, thus one document type for "A", one for
: "B", and one for "C"
: 3) client applications shall be able to search on one or all cores
: 4) the cores shall return search results in a common XML format
: 5) search results shall be aggregated in a configurable way
: 
: Can you please tell me if this architecture is possible with Solr? Obviously
: I am not looking for an "out-of-the-box" solution, I just need to
: understand what I have to develop myself and what is already available.

it's certainly possible, but these aggregation pieces (with your custom 
biz rules) would need to be implemented yourself.

: client application should send the requests to one "guy" that parses the
: request and forwards it to the cores. Is this a custom RequestHandler? Any
: link (to the Wiki?) to understand better? Or is there anything already
: available to achieve this?
: 4) The "guy" that parses the request and forwards it to the cores shall
: aggregate and return results in a common XML format: is this a custom
: ResponseHandler?

it could be a custom request handler, but it doesn't have to be -- you 
could implement it in whatever way is easiest for you (there's no reason
why it has to run in the same JVM or on the same physical machine as Solr 
... it could be a PHP script on another server if you want)
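
as a rough sketch of the "forwarding" half of such an external script (in
Python here, purely for illustration): it only builds the per-core select
URLs. The host, port and core names below are assumptions, and
build_core_urls is a hypothetical helper, not something Solr provides.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import urlencode

# Assumed location of the multicore Solr instance.
SOLR_BASE = "http://localhost:8983/solr"
# Hypothetical mapping from data type to core name.
CORES: Dict[str, str] = {"A": "coreA", "B": "coreB", "C": "coreC"}


def build_core_urls(query: str,
                    types: Optional[Iterable[str]] = None,
                    rows: int = 10) -> Dict[str, str]:
    """Return one select URL per requested data type (all types by default)."""
    selected = list(types) if types is not None else list(CORES)
    # q, rows and wt are standard Solr request parameters.
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return {t: f"{SOLR_BASE}/{CORES[t]}/select?{params}" for t in selected}
```

each URL could then be fetched with any HTTP client and the responses
combined according to your own business rules; the client application only
ever talks to this script, never to the individual cores.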




-Hoss