Posted to dev@lucenenet.apache.org by Elena Demidova <de...@l3s.de> on 2006/08/11 12:59:42 UTC
Remote searches with Lucene
Dear All,
The application I am working on is intended to use the distributed
search capabilities of the Lucene library. While working with Lucene's
RemoteSearchable class, I ran into some problems caused by the current
Lucene implementation. Below I describe them, together with the possible
solutions I have identified. The most important question for me is
whether these changes have a chance of being integrated into upcoming
Lucene versions, so that remote searches become truly feasible. I would
appreciate any feedback.
Best wishes,
Elena Demidova
Now to the problems themselves:
1. Architecture issue
The first problem concerns the construction of the RemoteSearchable
object. The .NET Framework supports both server-activated and
client-activated remote objects. Currently, the RemoteSearchable class
has only one constructor, and it requires a reference to a local
Searchable object:
public RemoteSearchable(Lucene.Net.Search.Searchable local)
Since this "local" object lives on the server, creating it requires
knowing the server's index paths. However, in some scenarios only the
server, not the client, knows where the indexes are stored on the server
side. In addition, a server-activated object registered as a well-known
type is instantiated by the remoting infrastructure through its
parameterless constructor, which RemoteSearchable lacks. I think this
problem could be solved by adding a parameterless constructor to the
RemoteSearchable class that reads the names of the indexes to be
published from a configuration file on the server side.
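The sketch below shows the server-side half of that idea. It assumes a plain-text configuration file that lists one index directory per line, with blank lines and lines starting with '#' ignored. The file format and the ReadIndexPaths helper are hypothetical. The closing comment describes, as an assumption only, how a parameterless RemoteSearchable constructor might turn those paths into the Searchable argument the existing constructor expects.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: reads the index directories to publish from a
// server-side text file, one path per line. Surrounding whitespace is
// trimmed; blank lines and lines starting with '#' are skipped.
static List<string> ReadIndexPaths(string configFile)
{
    var paths = new List<string>();
    foreach (string raw in File.ReadAllLines(configFile))
    {
        string line = raw.Trim();
        if (line.Length == 0 || line[0] == '#')
            continue;
        paths.Add(line);
    }
    return paths;
}

// Example: write a small (hypothetical) config file and read it back.
string config = Path.GetTempFileName();
File.WriteAllLines(config, new[] {
    "# indexes published by this server",
    "/data/index/news",
    "",
    "/data/index/archive"
});
List<string> indexPaths = ReadIndexPaths(config);
foreach (string indexPath in indexPaths)
    Console.WriteLine(indexPath);
File.Delete(config);

// Assumed wiring (not compiled here, needs Lucene.Net): a parameterless
// RemoteSearchable constructor could open one IndexSearcher per path,
// combine them in a MultiSearcher, and pass the result on to the existing
// RemoteSearchable(Searchable local) constructor.
```

Keeping the list of paths on the server means the client only has to know the remoting URL, not the server's directory layout.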
2. Bug in Term construction
Another problem occurs when you call methods on a RemoteSearchable
object. The only method that really works correctly is MaxDoc(). If you
ask, for instance, for the document frequency using
DocFreq(new Term("field", "value")), you always get "0". The reason is
that all arguments (and return values) of remote calls must be
serialized correctly. For DocFreq the argument is a Term object, which
cannot be reconstructed correctly on the server side. The Term
constructor performs an additional "intern" operation on the field name,
and this operation is not performed during default deserialization. As a
result, the field name in the reconstructed Term object is a new,
non-interned string and does not compare as equal to the interned field
names in the index.
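The effect can be reproduced without Lucene. The sketch assumes, as the description above implies, that the index side compares interned field names by reference. A string that is rebuilt from its characters, as a deserializer does, has the same value as the interned literal but is a different object until String.Intern is applied to it.

```csharp
using System;

// The literal "field" is interned by the runtime, like a field name
// that has passed through Term's constructor.
string interned = string.Intern("field");

// A deserializer allocates a fresh string with the same characters;
// new string(char[]) simulates that here.
string rebuilt = new string("field".ToCharArray());

bool sameValue = interned == rebuilt;                      // value equality
bool sameObject = (object)interned == (object)rebuilt;     // reference equality
bool sameAfterIntern = (object)interned == (object)string.Intern(rebuilt);

Console.WriteLine(sameValue);        // True
Console.WriteLine(sameObject);       // False
Console.WriteLine(sameAfterIntern);  // True
```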
This problem can be solved by customizing the serialization of Term
objects. To do that, the Term class should implement the ISerializable
interface and its serialization method, GetObjectData. The class itself
needs to store the "intern" value passed to its constructor, because
this information is required to reconstruct the object correctly.
GetObjectData describes how the object is serialized, and an additional
deserialization constructor reconstructs the object correctly. Both are
called automatically when a remote call is executed. The necessary
changes to the Term class are as follows:
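//ISerializable, SerializationInfo and StreamingContext live in
//System.Runtime.Serialization
using System;
using System.Runtime.Serialization;
//implement the ISerializable interface; the [Serializable] attribute is
//still required for the formatter to accept the type
[Serializable]
public sealed class Term : System.IComparable, ISerializable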
…
//store the object’s intern value needed by the constructor
private bool intern;
internal Term(System.String fld, System.String txt, bool intern)
{
…
//store the object’s intern value
this.intern=intern;
}
//Serialization method: called by the formatter to write the object's state
public void GetObjectData(SerializationInfo info, StreamingContext context)
{
info.AddValue("field", field);
info.AddValue("text", text);
info.AddValue("intern",intern);
}
//Deserialization constructor, invoked by the formatter through reflection
//(it can be private because Term is sealed)
private Term(SerializationInfo info, StreamingContext ctxt)
{
String fld=(String)info.GetValue("field", typeof(String));
this.intern=(bool)info.GetValue("intern", typeof(bool));
this.field = intern ? String.Intern(fld) : fld;
this.text = (String)info.GetValue("text", typeof(String));
}
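Here is a self-contained sketch of the round trip. It uses a hypothetical stand-in class, MiniTerm, in place of the real Lucene.Net Term, and it calls GetObjectData and the deserialization constructor by hand instead of going through a remoting channel. One design choice differs from the proposal above, and it is worth checking: the stand-in interns the field name unconditionally during deserialization. A string produced by the deserializer is never interned. A Term created with intern=false, because its field was already interned on the sending side, would therefore arrive with a non-interned field under the stored-flag approach. Java Lucene's Term takes the unconditional approach too: it interns the field in its readObject method.

```csharp
using System;
using System.Runtime.Serialization;

var original = new MiniTerm("title", "lucene");
var context = new StreamingContext(StreamingContextStates.Remoting);

// Sending side: the formatter calls GetObjectData.
var sent = new SerializationInfo(typeof(MiniTerm), new FormatterConverter());
original.GetObjectData(sent, context);

// Receiving side: the deserializer hands the constructor freshly
// allocated strings; new string(char[]) simulates that here.
var received = new SerializationInfo(typeof(MiniTerm), new FormatterConverter());
received.AddValue("field", new string(sent.GetString("field").ToCharArray()));
received.AddValue("text", sent.GetString("text"));
var copy = new MiniTerm(received, context);

Console.WriteLine((object)copy.Field == (object)original.Field); // True
Console.WriteLine(copy.Text);                                    // lucene

// Hypothetical stand-in for Term, reduced to the serialization logic.
[Serializable]
sealed class MiniTerm : ISerializable
{
    public string Field { get; }
    public string Text { get; }

    public MiniTerm(string field, string text)
    {
        Field = string.Intern(field);   // field names are interned, as in Term
        Text = text;
    }

    // Deserialization constructor. It interns unconditionally, because
    // strings produced by deserialization are never interned. (A private
    // constructor also works for the formatter; internal lets the demo call it.)
    internal MiniTerm(SerializationInfo info, StreamingContext context)
    {
        Field = string.Intern(info.GetString("field"));
        Text = info.GetString("text");
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("field", Field);
        info.AddValue("text", Text);
    }
}
```

With this variant, the stored "intern" flag is no longer needed for reconstruction.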