Posted to oak-dev@jackrabbit.apache.org by Vikas Saurabh <vi...@gmail.com> on 2015/10/24 19:49:02 UTC

[DISCUSS] Allow custom extension point to lucene indexing and query

Hi,

In our application, we have a requirement to introduce extra fields into
indexed lucene documents (with customized boost to assist in correct
scoring) and then modify the query that gets sent to lucene to use those
inserted fields.

The requirement doesn't quite fit to be included in Oak in general, but it
seems that we can have extension points during indexing and querying which
can be hooked into to serve a custom application requirement.

Following is a proposal for such an extension (I have a few changes which
implement a basic version... I'll be opening an issue and attaching a patch
to it soon).

The idea of the extension is very similar to the custom scoring extension
point we already have. For my application, we just need to hook into full text
querying, so the proposal is limited to that. It can certainly be extended
later - but, let's start simple :).
We can have an SPI (let's call it IndexAugmentor for now) which has methods
like:
* String getName();
* Collection<Field> getAugmentedFields(String path, NodeState
indexedNodeState, NodeState indexDefnState);
* Query getCustomQuery(String fullTextTerm, Analyzer

The string returned by getName() identifies a particular implementation of
the SPI. An index definition can declare, by this name, which augmentor
implementation is to be used.
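
To illustrate the name-based selection, here is a minimal sketch. It is not
Oak code: NamedAugmentor and AugmentorRegistry are hypothetical names, and how
the name would actually be read from an index definition is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the proposed SPI; only getName() matters here.
interface NamedAugmentor {
    String getName();
}

// Hypothetical registry mapping the name declared in an index definition
// to a registered implementation.
class AugmentorRegistry {
    private final Map<String, NamedAugmentor> byName = new HashMap<>();

    void register(NamedAugmentor augmentor) {
        byName.put(augmentor.getName(), augmentor);
    }

    // Empty when the index definition declares no augmentor, or names one
    // that has not been registered.
    Optional<NamedAugmentor> lookup(String declaredName) {
        if (declaredName == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(byName.get(declaredName));
    }
}
```

An index without a declared augmentor would simply get an empty result and be
indexed as it is today.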

getAugmentedFields(...) is given the nodeState being indexed (along with the
index definition, in case the implementation wants to use it) and the
implementation is supposed to return a collection of lucene field objects
to be added to the document that's inserted into lucene.
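
A rough sketch of how the indexing side could consume that callback. The
types here are simplified stand-ins so the example compiles on its own:
SketchField replaces the lucene field, plain maps replace NodeState, and
FieldAugmentor and DocumentSketch are hypothetical names, not the actual
patch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Stand-in for a lucene field: a name, a value and an index-time boost.
final class SketchField {
    final String name;
    final String value;
    final float boost;

    SketchField(String name, String value, float boost) {
        this.name = name;
        this.value = value;
        this.boost = boost;
    }
}

// Stand-in for the proposed callback; node states are modeled as property maps.
interface FieldAugmentor {
    Collection<SketchField> getAugmentedFields(
            String path, Map<String, String> indexedNode, Map<String, String> indexDefn);
}

class DocumentSketch {
    // Builds the regular fields first, then appends whatever the augmentor returns.
    static List<SketchField> makeDocument(String path, Map<String, String> node,
                                          Map<String, String> indexDefn,
                                          FieldAugmentor augmentor) {
        List<SketchField> doc = new ArrayList<>();
        doc.add(new SketchField("path", path, 1.0f)); // illustrative regular field
        if (augmentor != null) {
            Collection<SketchField> extra = augmentor.getAugmentedFields(path, node, indexDefn);
            if (extra != null) {
                doc.addAll(extra);
            }
        }
        return doc;
    }
}
```

Treating a missing augmentor or a null return as "no extra fields" keeps
indexing unchanged for index definitions that don't use the extension.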

getCustomQuery(...) allows the implementation to supply an extra query per
full text query term. The returned query would be added to the query
generated for the fulltext term as a SHOULD clause
(BooleanClause.Occur.SHOULD), i.e. the custom query can only add results to
those that were already being returned (along with, of course, utilizing
the custom boost inserted at index time).
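
To show why an optional clause can only widen the result set, here is a small
model. It deliberately avoids lucene types: queries are plain predicates over
a map of field values, scoring and boost are ignored, and ShouldClauseSketch
and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class ShouldClauseSketch {
    // Matches documents whose given field contains the term.
    static Predicate<Map<String, String>> fieldContains(String field, String term) {
        return doc -> doc.getOrDefault(field, "").contains(term);
    }

    // Models adding the custom query as an optional clause next to the
    // generated one: a document matches if either clause matches.
    static Predicate<Map<String, String>> withShould(
            Predicate<Map<String, String>> generated,
            Predicate<Map<String, String>> custom) {
        return custom == null ? generated : generated.or(custom);
    }

    static List<Map<String, String>> search(List<Map<String, String>> docs,
                                            Predicate<Map<String, String>> query) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (query.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Every document matched by the generated query alone is still matched by the
combined one; the custom clause only contributes additional hits.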

Currently, I've added a callback to getAugmentedFields in
LuceneIndexEditor.makeDocument - which seems like the most obvious place
to do it.

For querying though, there are a couple of choices. I've added the callback
in LucenePropertyIndex.tokenToQuery when the global full text query is
being prepared (not tied to a field).

As already said, I'll be opening issues/tasks and attaching patches to them.
I'll post the issue numbers to this thread. It'd be great to have some
feedback on the idea and on whether it makes sense to have such an extension.

Thanks,
Vikas