Posted to java-user@lucene.apache.org by Christian Reuschling <re...@dfki.uni-kl.de> on 2014/06/18 17:10:14 UTC

searching multiple remote indices

Hi,
we are currently migrating from Lucene 3.5.0 to Lucene 4. So far so good, but in one project we
need to access multiple indices, some of which may be remote. In the past, we solved this by
using the Searcher interface and implementing a subclass of it that makes remote calls to a
corresponding server instance. With MultiSearcher, it was then easy to mix local and remote
indices behind one transparent Searcher instance that enables distributed search.

In Lucene 4, Searcher and MultiSearcher have been removed. The recommended replacement is to
use MultiReader to aggregate the indices and to build an IndexSearcher on top of it. This is
fine for local indices, but we are wondering how to proceed with remote ones.
Subclassing IndexSearcher is no longer a solution, since there is no MultiSearcher for aggregation.
Subclassing IndexReader might work, but some methods are declared final
(document(int), etc.), and we are not sure whether overriding only the non-final methods is enough.
Further, we are not sure whether remote IndexReader proxy objects would cause performance issues,
because potentially a lot of information would have to be transported over the wire
during a search, far more than when aggregating search result lists of, e.g., length 20.
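
To make the "aggregating search result lists" alternative concrete, here is a minimal sketch in plain
Java, with no Lucene dependency. Each shard (local or remote) returns only its own top n hits, and
the caller merges them. The names Shard, Hit and searchAll are hypothetical and not part of any
Lucene API. Scores are only comparable if all shards score with the same term statistics. Lucene 4's
TopDocs.merge appears to cover the same merge step for real TopDocs objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ShardMergeSketch {

    /** One hit as reported by a shard; the doc id is only meaningful within that shard. */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hypothetical abstraction over a local or a remote index. */
    interface Shard {
        List<Hit> search(String query, int n);
    }

    /** Highest score first; ties are broken by shard and doc id to keep the order stable. */
    static final Comparator<Hit> BY_SCORE_DESC = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        if (c != 0) return c;
        c = Integer.compare(a.shard, b.shard);
        return c != 0 ? c : Integer.compare(a.doc, b.doc);
    };

    /** Asks every shard for its top n hits and returns the global top n. */
    static List<Hit> searchAll(List<Shard> shards, String query, int n) {
        List<Hit> all = new ArrayList<>();
        for (Shard shard : shards) {
            all.addAll(shard.search(query, n)); // at most n hits cross the wire per shard
        }
        all.sort(BY_SCORE_DESC);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    /** Two stub shards with canned hits; a real shard would run the query and honor n. */
    static List<Shard> demoShards() {
        Shard local = (query, n) -> Arrays.asList(new Hit(0, 3, 2.0f), new Hit(0, 7, 0.5f));
        Shard remote = (query, n) -> Arrays.asList(new Hit(1, 1, 1.5f), new Hit(1, 9, 1.0f));
        return Arrays.asList(local, remote);
    }

    public static void main(String[] args) {
        for (Hit hit : searchAll(demoShards(), "lucene", 3)) {
            System.out.println("shard=" + hit.shard + " doc=" + hit.doc + " score=" + hit.score);
        }
    }
}
```

With this shape, only n hits per shard are transferred, instead of the postings and stored fields a
remote IndexReader proxy would have to serve.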

Another idea is to implement a Directory subclass that makes remote calls. But we are not sure
whether this is feasible either. It would solve the final-method problem, but might have similar
performance issues, if these turn out to be critical.

Because we are migrating an existing system rather than implementing something from scratch,
switching to a different technology such as a Solr backend is not an option for us. On the other
hand, maybe there are some Solr classes for distributed search that we could use instead of
MultiSearcher.

Simply aggregating the search result lists in our own simple class with a search(..) method is
also not enough, since we use some more searcher functionality that would then also have to be
aggregated, namely:
- createNormalizedWeight(query) - called by someQuery.weight(searcher)
- rewrite(query) - to get the atomic queries. This was implemented with query.combine(query) in
  MultiSearcher, which is also no longer available.
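
To illustrate why weights have to be aggregated as well: a normalized weight depends on index-wide
statistics such as docFreq and maxDoc, so each shard would otherwise score with its own local
numbers. The following is a minimal sketch in plain Java that sums per-shard statistics and computes
one idf from the totals. ShardStats, sum and idf are hypothetical names, and the formula is the
classic TF-IDF idf as we recall it from DefaultSimilarity. In Lucene 4, IndexSearcher.termStatistics
and IndexSearcher.collectionStatistics look like the places to plug in such global numbers, but that
is an assumption to verify.

```java
import java.util.Arrays;
import java.util.List;

final class GlobalStatsSketch {

    /** Statistics for one term, either for one shard or summed over all shards. */
    static final class ShardStats {
        final long docFreq; // documents containing the term
        final long maxDoc;  // total number of documents

        ShardStats(long docFreq, long maxDoc) {
            this.docFreq = docFreq;
            this.maxDoc = maxDoc;
        }
    }

    /** Sums the per-shard numbers into statistics for the whole distributed collection. */
    static ShardStats sum(List<ShardStats> perShard) {
        long docFreq = 0;
        long maxDoc = 0;
        for (ShardStats stats : perShard) {
            docFreq += stats.docFreq;
            maxDoc += stats.maxDoc;
        }
        return new ShardStats(docFreq, maxDoc);
    }

    /** Classic idf, 1 + ln(maxDoc / (docFreq + 1)); assumed here, not taken from Lucene code. */
    static double idf(ShardStats stats) {
        return 1.0 + Math.log(stats.maxDoc / (double) (stats.docFreq + 1));
    }

    public static void main(String[] args) {
        List<ShardStats> perShard =
                Arrays.asList(new ShardStats(3, 100), new ShardStats(1, 900));
        ShardStats global = sum(perShard);
        // The local idf values differ per shard; the global idf is the same everywhere.
        System.out.println("shard 0 idf: " + idf(perShard.get(0)));
        System.out.println("shard 1 idf: " + idf(perShard.get(1)));
        System.out.println("global idf:  " + idf(global));
    }
}
```

If every shard scores with the global value, the scores of hits from different shards become
comparable, which is what a merged result list needs.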

Does anybody have some best practices? To us, this does not sound like an exotic case. Or
is it?

Thanks from the whole DFKI Lucene crew!

Christian

--
______________________________________________________________________________
Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer

Knowledge Management Department
German Research Center for Artificial Intelligence DFKI GmbH
Trippstadter Straße 122, D-67663 Kaiserslautern, Germany

Phone: +49.631.20575-1250
mailto:reuschling@dfki.de http://www.dfki.uni-kl.de/~reuschling/

------------Legal Company Information Required by German Law------------------
Geschäftsführung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
______________________________________________________________________________