Posted to users@jackrabbit.apache.org by Julio Castillo <jc...@edgenuity.com> on 2008/05/02 00:38:35 UTC

Jackrabbit overlapping w/ Lucene-Solr

Lucene/Solr
Stores and indexes documents and of course provides search functionality.

Isn't there a bit of an overlap?

Julio Castillo
Edgenuity Inc.


Re: Jackrabbit overlapping w/ Lucene-Solr

Posted by Thomas Müller <th...@day.com>.
Hi,

> Lucene/Solr
>  Isn't there a bit of an overlap?

Does Lucene/Solr implement the JCR API?

Regards,
Thomas

RE: Jackrabbit overlapping w/ Lucene-Solr

Posted by Ard Schrijvers <a....@onehippo.com>.
Hello,

> > ... Isn't there a bit of an overlap?...
> 
> Yes, when it comes to indexing and searching.
> 
> Jackrabbit and Solr both base their indexing stuff on Lucene, 
> and add some features on top of it.
> 
> That set of additional features could probably be (at least 
> partially) factored out and moved to Lucene as extensions 
> that would be used by both projects, but there are also 
> significant differences due to the way indexes are used by 
> Solr (as a core functionality) and Jackrabbit (as one module 
> that's more tightly integrated with the storage features). So 
> that's probably not as trivial as it might seem.

Exactly, and to expand on that: first of all, you should not compare
(Lucene/Solr) with Jackrabbit indexing. If you want to compare anything,
compare Solr with Jackrabbit, because both use the Lucene search engine.
There are also important differences between the indexing
*requirements* of Solr and Jackrabbit. Solr can update many documents
in the background, warm up the updated index in the background, and
then, every X seconds (or minutes), replace the currently used
IndexSearcher with the newly pre-warmed IndexSearcher. This is clearly
not really live.
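
The "warm in the background, then swap" behaviour described above can
be sketched as follows. This is only an illustration under stated
assumptions: Snapshot and SwappingIndex are hypothetical names, not
Lucene or Solr classes, and a plain list of strings stands in for the
real index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical immutable view of the index; not a Lucene or Solr class. */
final class Snapshot {
    private final List<String> docs;

    Snapshot(List<String> docs) {
        // Defensive copy, so later updates never change a published snapshot.
        this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
    }

    boolean contains(String doc) {
        return docs.contains(doc);
    }
}

/** Sketch of an index whose searches only see the last published snapshot. */
final class SwappingIndex {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();
    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(Collections.<String>emptyList()));

    /** Buffers a document; searches do not see it yet. */
    synchronized void add(String doc) {
        pending.add(doc);
    }

    /** Builds a new snapshot (warming would happen here), then publishes it atomically. */
    synchronized void commitAndSwap() {
        committed.addAll(pending);
        pending.clear();
        Snapshot warmed = new Snapshot(committed);
        current.set(warmed);
    }

    /** Runs against whatever snapshot was published last. */
    boolean search(String doc) {
        return current.get().contains(doc);
    }
}
```

A document passed to add() stays invisible to search() until the next
commitAndSwap(), which is the "not really live" window mentioned above.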

Jackrabbit, OTOH, needs its index(es) to be up to date at all times!
After a session.save(), all changes have to be accounted for in the
index(es), and every search result needs to reflect these changes
instantly (to be precise, the indexing is queued, but it needs to be
finished when a search request comes in, so the request is blocked
until indexing is done). Also, you need to realize that searches
involving hierarchical queries (i.e., starting with some path) are
resolved within the index! IMHO, Jackrabbit's requirements are way
harder and much more complex than Solr's. Also, IMHO, Jackrabbit's
indexing structure uses a better-chosen technique for fast incremental
updating than Solr's. I talked to some Lucene and Solr committers at
the last ApacheCon and they were quite interested in this Jackrabbit
architecture (in short: Jackrabbit does not use one single Lucene
index, but has many indexes, which behave similarly to the segments of
Lucene within one index, see [1]).
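
A rough sketch of the two ideas in the paragraph above: indexing is
queued on save but must be finished before a search runs, and changes
land in many small indexes rather than one large one. All names here
(MultiIndexSketch, save, searchByPathPrefix) are hypothetical and are
not Jackrabbit or Lucene APIs; sets of path strings stand in for real
indexes, and the way sub-indexes are created is an assumption made for
illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; not the actual Jackrabbit implementation. */
final class MultiIndexSketch {
    // Changes that were saved but are not indexed yet.
    private final Queue<String> pending = new ArrayDeque<>();
    // Several small sub-indexes instead of one large index.
    private final List<Set<String>> subIndexes = new ArrayList<>();

    /** Called on save: the change is queued, not indexed immediately. */
    synchronized void save(String path) {
        pending.add(path);
    }

    /** Drains the queue into a new small sub-index; existing ones stay untouched. */
    private void flushPending() {
        if (pending.isEmpty()) {
            return;
        }
        Set<String> fresh = new HashSet<>();
        while (!pending.isEmpty()) {
            fresh.add(pending.poll());
        }
        subIndexes.add(fresh);
    }

    /** Finishes queued indexing first, then answers the path query from all sub-indexes. */
    synchronized List<String> searchByPathPrefix(String prefix) {
        flushPending();
        Set<String> hits = new TreeSet<>(); // sorted, and free of duplicates
        for (Set<String> sub : subIndexes) {
            for (String path : sub) {
                if (path.startsWith(prefix)) {
                    hits.add(path);
                }
            }
        }
        return new ArrayList<>(hits);
    }

    synchronized int subIndexCount() {
        return subIndexes.size();
    }
}
```

Because searchByPathPrefix() holds the lock while it flushes, a search
issued right after save() always sees the saved path, and each flush
only writes a new small sub-index instead of rewriting an existing one.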

So, to recap: no, there is no overlap at all between Solr and
Jackrabbit indexing. Currently, there is a little overlap between
Jackrabbit and the latest version of Lucene, because Lucene added some
functionality that had already been partly implemented in Jackrabbit's
indexing, and this has led to some incompatibility.

-Ard

[1] http://jackrabbit.apache.org/index-readers.html


Re: Jackrabbit overlapping w/ Lucene-Solr

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Fri, May 2, 2008 at 12:38 AM, Julio Castillo <jc...@edgenuity.com> wrote:
> ...Lucene/Solr
>  Stores and indexes documents and of course provides search functionality....

Solr is meant to store the indexable parts of documents. You can of
course store anything, but there's no versioning, no locks, etc., the
focus is really on indexing, not storage.

>
> ... Isn't there a bit of an overlap?...

Yes, when it comes to indexing and searching.

Jackrabbit and Solr both base their indexing stuff on Lucene, and add
some features on top of it.

That set of additional features could probably be (at least partially)
factored out and moved to Lucene as extensions that would be used by
both projects, but there are also significant differences due to the
way indexes are used by Solr (as a core functionality) and Jackrabbit
(as one module that's more tightly integrated with the storage
features). So that's probably not as trivial as it might seem.

-Bertrand