Posted to dev@lucene.apache.org by Jacek Plebanek <pl...@gmail.com> on 2012/05/10 18:31:11 UTC

Federated search in Solr - proposal

Hello,

I'm starting to work on federated search algorithms for my PhD research.
I'll use Solr to implement them, since I have two years of experience
with Solr at work.

I thought that at least part of my work could be useful to the Solr
project and that I could contribute some code: specifically, the
components/modifications needed to add federated search support to Solr.

By "Federated Search" I mean searching across heterogeneous data sources
(something different from the existing Distributed Search implemented in
Solr): allowing Solr to merge results not only from SolrServer
instances, but also from external sources (e.g. search engines using a
different API). The use case would look like this:
- the user sends a request to Solr (e.g. a SearchRequest)
- Solr handles the request internally and/or sends it to other Solr
instances (current Distributed Search) AND sends it to the specified
external data sources using dedicated adapters.
- Solr merges the results from the Solr instances with the results from
the external collections and returns the combined results to the user.
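
A minimal sketch of this flow, under stated assumptions: every type
below (Hit, SourceAdapter, CannedAdapter, FederatedSearcher) is
hypothetical and not part of Solr's API, and the adapters return canned
hits instead of calling a real Solr instance or external engine. It only
illustrates the fan-out and combine steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration only; none of them exist in Solr.

/** One search hit, tagged with the source that returned it. */
final class Hit {
    final String source;
    final String id;
    final double score;

    Hit(String source, String id, double score) {
        this.source = source;
        this.id = id;
        this.score = score;
    }
}

/** Adapter for one data source: a Solr instance or an external engine. */
interface SourceAdapter {
    List<Hit> search(String query, int rows);
}

/** Stand-in adapter that returns canned hits instead of calling a real API. */
final class CannedAdapter implements SourceAdapter {
    private final List<Hit> hits;

    CannedAdapter(List<Hit> hits) {
        this.hits = hits;
    }

    @Override
    public List<Hit> search(String query, int rows) {
        return new ArrayList<>(hits.subList(0, Math.min(rows, hits.size())));
    }
}

/** Sends the query to every adapter and combines the hits into one list. */
final class FederatedSearcher {
    private final List<SourceAdapter> adapters;

    FederatedSearcher(List<SourceAdapter> adapters) {
        this.adapters = adapters;
    }

    List<Hit> search(String query, int rows) {
        List<Hit> all = new ArrayList<>();
        for (SourceAdapter adapter : adapters) {
            all.addAll(adapter.search(query, rows));
        }
        // Naive merge: sort by raw score, highest first. This assumes the
        // scores of different sources are comparable, which is rarely true.
        all.sort((a, b) -> Double.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(rows, all.size())));
    }
}

final class FederatedFlowSketch {
    public static void main(String[] args) {
        SourceAdapter solr = new CannedAdapter(Arrays.asList(
                new Hit("solr", "doc-1", 0.9), new Hit("solr", "doc-2", 0.4)));
        SourceAdapter external = new CannedAdapter(Arrays.asList(
                new Hit("external", "page-7", 0.7)));

        FederatedSearcher searcher =
                new FederatedSearcher(Arrays.asList(solr, external));
        for (Hit hit : searcher.search("federated search", 10)) {
            System.out.println(hit.source + " " + hit.id + " " + hit.score);
        }
    }
}
```

The sort by raw score is deliberately naive. Scores from different
engines are usually not on the same scale, which is why result merging
needs separate treatment.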

To support this scenario, the four common parts of federated search
should be covered:
- collection representation (external collections probably won't provide
the same information as Solr, such as tf-idf)
- collection selection (predict which collections may return relevant
results and send the search request only to them)
- result merging (merge results based on more limited information than
Solr provides)
- external source connection (a common API for writing custom collection
adapters)
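
To make two of these parts concrete, here is a rough sketch of
collection selection and result merging. It is not a proposed design:
the selection rule (keep a collection if its sampled vocabulary contains
any query term) and the merging rule (min-max normalization of each
collection's scores, then a global sort) are simple stand-ins chosen for
illustration, and all names (ScoredDoc, FederationSketch, and the
sampled-term map) are hypothetical, not Solr classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper types for illustration; not part of Solr.

/** A document returned by a collection, with that collection's own score. */
final class ScoredDoc {
    final String collection;
    final String id;
    final double score;

    ScoredDoc(String collection, String id, double score) {
        this.collection = collection;
        this.id = id;
        this.score = score;
    }
}

final class FederationSketch {

    /** Keeps collections whose sampled vocabulary contains a query term. */
    static List<String> selectCollections(Map<String, Set<String>> sampledTerms,
                                          List<String> queryTerms) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : sampledTerms.entrySet()) {
            for (String term : queryTerms) {
                if (entry.getValue().contains(term)) {
                    selected.add(entry.getKey());
                    break;
                }
            }
        }
        return selected;
    }

    /** Rescales one collection's scores to the range [0, 1]. */
    static List<ScoredDoc> normalize(List<ScoredDoc> docs) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (ScoredDoc d : docs) {
            min = Math.min(min, d.score);
            max = Math.max(max, d.score);
        }
        List<ScoredDoc> out = new ArrayList<>();
        for (ScoredDoc d : docs) {
            // When all scores are equal there is no spread, so use 1.0.
            double s = max > min ? (d.score - min) / (max - min) : 1.0;
            out.add(new ScoredDoc(d.collection, d.id, s));
        }
        return out;
    }

    /** Normalizes each result list, then sorts all documents together. */
    static List<ScoredDoc> merge(List<List<ScoredDoc>> resultLists) {
        List<ScoredDoc> merged = new ArrayList<>();
        for (List<ScoredDoc> list : resultLists) {
            merged.addAll(normalize(list));
        }
        merged.sort((a, b) -> Double.compare(b.score, a.score));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> sampled = new LinkedHashMap<>();
        sampled.put("solr", new HashSet<>(Arrays.asList("lucene", "search")));
        sampled.put("web", new HashSet<>(Arrays.asList("news", "search")));
        sampled.put("catalog", new HashSet<>(Arrays.asList("books")));
        System.out.println(selectCollections(sampled, Arrays.asList("search")));

        List<ScoredDoc> solr = Arrays.asList(
                new ScoredDoc("solr", "s1", 12.0), new ScoredDoc("solr", "s2", 3.0));
        List<ScoredDoc> web = Arrays.asList(
                new ScoredDoc("web", "w1", 0.75), new ScoredDoc("web", "w2", 0.5),
                new ScoredDoc("web", "w3", 0.25));
        for (ScoredDoc d : merge(Arrays.asList(solr, web))) {
            System.out.println(d.collection + " " + d.id + " " + d.score);
        }
    }
}
```

Min-max normalization only needs the scores a source returns, so it
works even when an external collection exposes no term statistics. Its
weakness is that the best hit of every collection ends up with the same
score, regardless of how relevant that collection really is.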

I thought I would write some federated search components: a framework
that lets developers implement custom algorithms/plugins for each part
of the federated search scenario.


What do you think about that?


Sorry for my English :)

Jacek Plebanek

Interdisciplinary Centre for Mathematical and Computational Modelling
University of Warsaw, Poland


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Federated search in Solr - proposal

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi Jacek,

Surely an interesting feature for portals and the like. While separate federation frameworks that also work with Solr exist, I think a more lightweight approach through Solr's existing API is attractive to many, since it avoids yet another layer with a new query language etc.

I think a good way to work with the Solr community on this is to create some JIRA issues (one parent issue plus a sub-issue for each component), then communicate frequently in those and here on the list as you get started, to get early feedback, as opposed to dumping a final solution in one go.

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com


