Posted to users@jackrabbit.apache.org by Atul Kumar Tripathi <At...@virtusa.com> on 2010/11/29 06:06:31 UTC

Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Hi Guys,

Is it possible to store indexes in a database using Jackrabbit 2.1.2?

Are there any plans to provide functionality to store indexes in a
database in upcoming Jackrabbit releases?

Thanks in advance.

Thanks & Regards.

Atul Tripathi

Senior Engineer | Datacert/Technology.

Virtusa India Pvt. Ltd.

Chennai ATC

Phone: +91 44 66127000 Ext: 3742 | Mobile: +91 9940483180


Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Ard Schrijvers <a....@onehippo.com>.

On Mon, Nov 29, 2010 at 6:06 AM, Atul Kumar Tripathi
<At...@virtusa.com> wrote:
>
> Hi Guys,
>
> Is this possible to store indexes in database using jackrabbit 2.1.2?

The short answer: no, this is not possible.

>
> Are there any plans to provide functionality to store indexes in database with upcoming Jackrabbit Releases?

Perhaps in the long run. Note first of all that Jackrabbit developers
have tried the database approach before. However, imo, Lucene cannot
perform when it is stored in a database, unless you keep the index
small and read it completely into memory. I know some Hibernate guys
who combine Lucene with Infinispan for replication and store the Lucene
segments in a database. That would also be promising for jr. This is,
however, something different from 'storing Lucene in a database', as
they only store the segments and keep the index in memory.

Ard

>
> Thanks in advance.


--
Hippo
Europe  •  Amsterdam  Oosteinde 11  •  1017 WT Amsterdam  •  +31 (0)20 522 4466
USA  • San Francisco  185 H Street Suite B  •  Petaluma CA 94952-5100
•  +1 (707) 773 4646
Canada    •   Montréal  5369 Boulevard St-Laurent  •  Montréal QC H2T
1S5  •  +1 (514) 316 8966
www.onehippo.com  •  www.onehippo.org  •  info@onehippo.com

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Ard Schrijvers <a....@onehippo.com>.

On Mon, Nov 29, 2010 at 2:19 PM, Alexander Klimetschek
<ak...@adobe.com> wrote:
> On 29.11.10 13:21, "Ard Schrijvers" <a....@onehippo.com> wrote:
>>And this is a big burden! I think, we could have a single big index
>>for the JCR spec implementation. But, I wouldn't solve this by having
>>more small indexes, as collections. I would like to have an option, in
>>case of XPath, like 'simpleXPath=true' where we limit some of the
>>options: In other words, not all the jcr spec queries are available,
>>but it is efficient and fast (we at Hippo limit ourselves to only
>>efficient xpath queries). If you do not by default store all
>>properties, and do not have to support complex path constraint (only
>>simple ones), then, you wouldn't have to bother that much about one
>>single Lucene index.
>
> As written in my other mail, there are good reasons to allow for separate
> indexes, to resolve conflicts of different indexing needs for different
> applications. Maybe this is only true for the (node-scoped) full text
> index, where you can't exclude certain properties at query time.
>
> And the big advantage of those collections is that you solve the path
> constraint issue, at least for those queries like:
>
> /content/siteA//*[jcr:contains(., 'term') and @myProp='foo']
>
> because you would have a collection for /content/siteA, /content/siteB,
> etc. with just the right full text / property index.

We achieved this in a much easier and more flexible way, as we have the
*demand* for instant path constraints on any path as well. A little
background first: Jackrabbit has a very nice feature, namely that JCR
nodes are not aware of their actual location. Only the parent and the
children are known. This also holds for the index. This means that
moving a tree with thousands of nodes is a single node change, both in
the database and in the index.
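
To make the "single node change" point concrete, here is a minimal
Python sketch. It is not Jackrabbit code: the Node class and its
methods are hypothetical names chosen for illustration. It only assumes
what is said above, that a node stores its parent and children but not
its own path.

```python
class Node:
    """A node that knows only its name, parent, and children (no stored path)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = {}
        if parent is not None:
            parent.add_child(self)

    def add_child(self, child):
        child.parent = self
        self.children[child.name] = child

    def move_to(self, new_parent):
        # Moving touches only this node's parent link and the two child maps;
        # descendants are not visited or rewritten.
        if self.parent is not None:
            del self.parent.children[self.name]
        new_parent.add_child(self)

    def path(self):
        # The path is derived on demand by walking up to the root.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


root = Node("")
content = Node("content", root)
archive = Node("archive", root)
news = Node("news", content)
item = Node("foo", news)

before = item.path()
news.move_to(archive)   # one change, however large the subtree is
after = item.path()
```

The flip side is visible in path(): answering "where am I?" requires a
walk up the tree, which is why path constraints are expensive in this
model.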

However, this comes at a price of slow path constraint queries. This
was unacceptable for us. Hence, for a node, we index all parent
elements in a multivalued Lucene field as well. Suppose my location
is: /content/document/news/2009/12/foo . My Lucene field will have the
terms:

 /content
 /content/document
 /content/document/news
 /content/document/news/2009
 /content/document/news/2009/12

So *any* simple path constraint in our repository just matches a
single Lucene term, which is instant. 'Give me all nodes below
/content/document/news' simply returns all the nodes that have the term
'/content/document/news' in our predefined Lucene field (note that we
actually use node ids for it, but for the picture, paths are easier to
understand).
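
The ancestor-term idea can be sketched in a few lines of Python. This
is a toy inverted index, not Lucene and not the Hippo implementation;
ancestor_terms and PathIndex are hypothetical names, and it uses paths
rather than the node ids mentioned above.

```python
from collections import defaultdict


def ancestor_terms(path):
    """Return every proper ancestor path of `path`, shortest first."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


class PathIndex:
    """Toy inverted index: ancestor path term -> set of node paths."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, path):
        for term in ancestor_terms(path):
            self.postings[term].add(path)

    def below(self, ancestor):
        # A path constraint becomes one term lookup, with no tree traversal.
        return self.postings.get(ancestor, set())


index = PathIndex()
for p in ["/content/document/news/2009/12/foo",
          "/content/document/news/2010/01/bar",
          "/content/document/events/2010/baz"]:
    index.add(p)

terms = ancestor_terms("/content/document/news/2009/12/foo")
news_nodes = index.below("/content/document/news")
```

Here terms holds the same five entries as the list above, and
news_nodes contains the two nodes under /content/document/news. The
cost is paid at write time: each node carries one term per ancestor.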

>
>>Lucene 4.0 will be so blistering fast and efficient...
>
> Cool.
>
>>the figures we
>>need to index with Jackrabbit is peanuts for Lucene. *If* we improve
>>indexing, a couple of hundreds of millions of nodes is a no-brainer!
>
> With the exception of the path constrained, as this is not indexed. Maybe
> it will be easier with Lucene 4.0 to index the path, especially allow for
> fast updates of the path property when something is moved?

Lucene will hardly have improvements for hierarchical structures. Note
that this is exactly what makes JCR indexing so complex: the
hierarchy! For small hierarchies, more on the Document level, a
NestedDocumentQuery might be added to avoid cross matching, see [1].
But this is very simple compared to what Jackrabbit can do with XPath,
and it is still in development.

>>We should not be thinking about problems that are a result of the
>>current implementation and its short comings (they are a result that
>>it needed to work against Lucene 1.4, this is no critics to be sure!).
>
> Ok.
>
>>asynchronous indexing is already part of the jcr 283 afaik and is
>>allowed, certainly for binary content
>
> Sure, but still indexing takes a major part of a save() call, AFAIK.

True... which makes it all the more important that just one node in a
cluster does the actual indexing (or the text extraction, e.g. from
PDF, which matters even more!)

Regards Ard

[1] https://issues.apache.org/jira/browse/LUCENE-2454

> Regards,
> Alex
>
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 29.11.10 13:21, "Ard Schrijvers" <a....@onehippo.com> wrote:
>And this is a big burden! I think, we could have a single big index
>for the JCR spec implementation. But, I wouldn't solve this by having
>more small indexes, as collections. I would like to have an option, in
>case of XPath, like 'simpleXPath=true' where we limit some of the
>options: In other words, not all the jcr spec queries are available,
>but it is efficient and fast (we at Hippo limit ourselves to only
>efficient xpath queries). If you do not by default store all
>properties, and do not have to support complex path constraint (only
>simple ones), then, you wouldn't have to bother that much about one
>single Lucene index.

As written in my other mail, there are good reasons to allow for separate
indexes, to resolve conflicts of different indexing needs for different
applications. Maybe this is only true for the (node-scoped) full text
index, where you can't exclude certain properties at query time.

And the big advantage of those collections is that you solve the path
constraint issue, at least for those queries like:

/content/siteA//*[jcr:contains(., 'term') and @myProp='foo']

because you would have a collection for /content/siteA, /content/siteB,
etc. with just the right full text / property index.

>Lucene 4.0 will be so blistering fast and efficient...

Cool.

>the figures we
>need to index with Jackrabbit is peanuts for Lucene. *If* we improve
>indexing, a couple of hundreds of millions of nodes is a no-brainer!

With the exception of the path constrained, as this is not indexed. Maybe
it will be easier with Lucene 4.0 to index the path, especially allow for
fast updates of the path property when something is moved?

>We should not be thinking about problems that are a result of the
>current implementation and its short comings (they are a result that
>it needed to work against Lucene 1.4, this is no critics to be sure!).

Ok.

>asynchronous indexing is already part of the jcr 283 afaik and is
>allowed, certainly for binary content

Sure, but still indexing takes a major part of a save() call, AFAIK.

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Ard Schrijvers <a....@onehippo.com>.

On Mon, Nov 29, 2010 at 4:48 PM, Jukka Zitting <jz...@adobe.com> wrote:
> Hi,
>
> Great discussion! I've recently started looking at ways to implement the
> "search collection" stuff Alex referred to, and it would be great if this
> discussion could end up helping design such an implementation. We should
> move to dev@ for the details.

I am really not in favour. I anticipate lots of issues, and
home-grown solutions built on top of Lucene to get it all working. We
should strive for simplicity imo, and go 'back' to as close to Lucene
as possible. Regarding 'search collections', what is the next step
after the 'path based' collections? ACL-based collections? I don't
believe it will ever work, and it will add tons of complexity.

>
> PS. Regarding index-in-JCR, nowadays with a file-based data store a Lucene
> search index stored and accessed as JCR nodes and properties should be
> pretty efficient...

No, it can't be efficient. Lucene is highly optimized for storage on
the file system (or in memory). You'll lose this when accessing it
through JCR nodes. I just don't see how it could ever perform. Every
attempt I have seen to store Lucene anywhere other than on the file
system or in memory fails very soon on scalability. Also, storage in
Lucene 4.0 changes completely. This would mean that the 'JCR' mapped
indexes would need to change again. We shouldn't go that route, and it
is not needed imo! I strongly prefer striving for simplicity, staying
close to plain Lucene library usage, with as few of our own classes as
possible.

Regards Ard

>
> BR,
>
> Jukka Zitting
>


Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 29.11.10 16:48, "Jukka Zitting" <jz...@adobe.com> wrote:
>Great discussion! I've recently started looking at ways to implement the
>"search collection" stuff Alex referred to

BTW, this idea comes from David, of course ;-)

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Jukka Zitting <jz...@adobe.com>.

Hi,

Great discussion! I've recently started looking at ways to implement the 
"search collection" stuff Alex referred to, and it would be great if 
this discussion could end up helping design such an implementation. We 
should move to dev@ for the details.

On 29/11/10 14:21, Ard Schrijvers wrote:
> For example, mandated behaviour according to the spec:
>
> //*[jcr:contains(., 'jcr')]
>
> must return *all* nodes in the current workspace that contain jcr.
> This effectively mandates that the entire thing can be searched as a
> 'single' index.

By the spec, the above query could just as well be implemented by a
tree traversal, just as the SQL standard makes no demands on where and
how an RDBMS uses indexes.

What I'd like to see is an equivalent of the CREATE INDEX statement that 
could be used to add JCR indexes that selectively boost the performance 
of the kinds of searches that a specific deployment is relying on.
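
As a rough illustration of that point, the Python sketch below answers
the same query either by scanning every node or through an index that
was created on request. Repo, create_index and find are hypothetical
names, not a proposed JCR API, and the index here is built once and not
kept up to date on writes.

```python
class Repo:
    """Toy repository: a flat dict of path -> properties, plus optional indexes."""

    def __init__(self, nodes):
        self.nodes = nodes          # path -> {property name: value}
        self.indexes = {}           # property name -> {value: set of paths}

    def create_index(self, prop):
        # Hypothetical analogue of CREATE INDEX: built on request, per property.
        idx = {}
        for path, props in self.nodes.items():
            if prop in props:
                idx.setdefault(props[prop], set()).add(path)
        self.indexes[prop] = idx

    def find(self, prop, value):
        """Return (matching paths, strategy used)."""
        if prop in self.indexes:
            return set(self.indexes[prop].get(value, set())), "index"
        # No index: fall back to scanning every node, which is slower but
        # returns the same result.
        hits = {p for p, props in self.nodes.items() if props.get(prop) == value}
        return hits, "scan"


repo = Repo({
    "/content/siteA/a": {"myProp": "foo"},
    "/content/siteA/b": {"myProp": "bar"},
    "/content/siteB/c": {"myProp": "foo"},
})
scan_hits, scan_how = repo.find("myProp", "foo")
repo.create_index("myProp")
index_hits, index_how = repo.find("myProp", "foo")
```

Both calls return the same two paths; only the strategy differs. The
query semantics stay fixed while the deployment decides which indexes
are worth their cost.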

PS. Regarding index-in-JCR, nowadays with a file-based data store a 
Lucene search index stored and accessed as JCR nodes and properties 
should be pretty efficient...

BR,

Jukka Zitting

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Ard Schrijvers <a....@onehippo.com>.

On Mon, Nov 29, 2010 at 2:12 PM, Alexander Klimetschek
<ak...@adobe.com> wrote:
> On 29.11.10 13:25, "Thomas Mueller" <mu...@adobe.com> wrote:
>>>>the single-big index for the
>>>> entire repository that is mandated by the JCR spec.
>>
>>Sorry, where in the spec is this mandated?

Of course it is not mandated as such, but the mandated behaviour
makes the indexing very inefficient (at least with the default, where
you do not configure some custom indexing). As a matter of fact,
Jackrabbit consists of *many* smaller and bigger indexes, but in its
emergent behaviour it could be considered a single one (one
multiIndexReader). For example, mandated behaviour according to the
spec:

//*[jcr:contains(., 'jcr')]

must return *all* nodes in the current workspace that contain jcr.
This effectively mandates that the entire thing can be searched as a
'single' index.

//*[@myproject:title='foo'] order by @myproject:summary

This mandates that title and summary *must* be stored Lucene values:
This makes a Lucene index very big.

>
> Not sure if it is actually mandated, but applications expect it and you
> might get conflicts with multiple applications.

Exactly.

>
> If one application does not want a certain property to be indexed (because
> it does not want to find it in e.g. full text search), there might be
> another one who needs it, so you can't configure the right index if you
> only have a single one.

I would like to opt for an option 'simpleIndex' which just indexes the
things that

1) Are efficient to query
2) Are not blowing up the index in size

In my current implementation I can always avoid the complex queries.

Ard

>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel
>

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 29.11.10 17:07, "Thomas Mueller" <mu...@adobe.com> wrote:
>What do you mean with "you only have one index from the JCR query API
>point of view"?
>
>Relational databases support multiple indexes, and they choose the right
>(well, hopefully) index / indexes depending on the query. I don't see how
>Jackrabbit is not "allowed" to do that because of the query API.

Ok, yes, you could automatically aggregate multiple indexes as needed,
but the example given by Ard

//*[jcr:contains(*, 'term')]

shows that this is difficult to implement and most likely much slower than
a dedicated index.

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>What I wanted to say is that the specification indirectly mandates it.
>AFAIK there is no statement that all properties have to be indexed, but
>since you only have one index from the JCR query API point of view, any
>repository-implementation-specific configuration or choice on how to index
>does not work for multiple applications. Hence indirectly everything must
>be indexed by a (useful) implementation OOTB.

What do you mean by "you only have one index from the JCR query API
point of view"?

Relational databases support multiple indexes, and they choose the right
(well, hopefully) index / indexes depending on the query. I don't see how
Jackrabbit is not "allowed" to do that because of the query API.

Regards,
Thomas

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 29.11.10 14:21, "Stefan Guggisberg" <st...@gmail.com> wrote:

>On Mon, Nov 29, 2010 at 2:12 PM, Alexander Klimetschek
><ak...@adobe.com> wrote:
>> On 29.11.10 13:25, "Thomas Mueller" <mu...@adobe.com> wrote:
>>>>>the single-big index for the
>>>>> entire repository that is mandated by the JCR spec.
>>>
>>>Sorry, where in the spec is this mandated?
>>
>> Not sure if it is actually mandated,
>
>if you're not sure then please refrain from making such
>public statements. they are not very helpful and only
>cause confusion.

What I wanted to say is that the specification indirectly mandates it.
AFAIK there is no statement that all properties have to be indexed, but
since you only have one index from the JCR query API point of view, any
repository-implementation-specific configuration or choice on how to index
does not work for multiple applications. Hence indirectly everything must
be indexed by a (useful) implementation OOTB.

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Stefan Guggisberg <st...@gmail.com>.

On Mon, Nov 29, 2010 at 2:12 PM, Alexander Klimetschek
<ak...@adobe.com> wrote:
> On 29.11.10 13:25, "Thomas Mueller" <mu...@adobe.com> wrote:
>>>>the single-big index for the
>>>> entire repository that is mandated by the JCR spec.
>>
>>Sorry, where in the spec is this mandated?
>
> Not sure if it is actually mandated,

if you're not sure then please refrain from making such
public statements. they are not very helpful and only
cause confusion.

cheers
stefan

> but applications expect it and you
> might get conflicts with multiple applications.
>
> If one application does not want a certain property to be indexed (because
> it does not want to find it in e.g. full text search), there might be
> another one who needs it, so you can't configure the right index if you
> only have a single one.
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel
>

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 29.11.10 13:25, "Thomas Mueller" <mu...@adobe.com> wrote:
>>>the single-big index for the
>>> entire repository that is mandated by the JCR spec.
>
>Sorry, where in the spec is this mandated?

Not sure if it is actually mandated, but applications expect it and you
might get conflicts with multiple applications.

If one application does not want a certain property to be indexed (because
it does not want to find it in e.g. full text search), there might be
another one who needs it, so you can't configure the right index if you
only have a single one.

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>>the single-big index for the
>> entire repository that is mandated by the JCR spec.

Sorry, where in the spec is this mandated?

Regards,
Thomas

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Ard Schrijvers <a....@onehippo.com>.

On Mon, Nov 29, 2010 at 1:06 PM, Alexander Klimetschek
<ak...@adobe.com> wrote:

>>The only drawback is that the current jr lucene impl does not fit the
>>InfinispanDirectory (infinispan lucene dir). It is because of the
>>multi-index and never re-open setup in jr: It was state of the art
>>against lucene 1.4, but now mostly redundant.
>
> Just one node doing the indexing sounds interesting. But I would then
> think we store the index inside the repository (as a randomly-accessible
> binary), so that you can use any persistence manager and the
> implementation is simpler (no need to adapt to the various databases).

You are completely right! Good point.. :-)

>
> We had some plans to do something like this with additional indexes
> (calling them "collections") that are created by the application side, but
> store inside the repository. And implemented by Lucene (especially for the
> full-text part).

Hmmm... personally, I wouldn't go this route. I think your next line
is closer to what I have in mind:

>
> The idea here is to overcome the problem of the single-big index for the
> entire repository that is mandated by the JCR spec. You often want indexes

And this is a big burden! I think, we could have a single big index
for the JCR spec implementation. But, I wouldn't solve this by having
more small indexes, as collections. I would like to have an option, in
case of XPath, like 'simpleXPath=true' where we limit some of the
options: In other words, not all the jcr spec queries are available,
but it is efficient and fast (we at Hippo limit ourselves to only
efficient xpath queries). If you do not by default store all
properties, and do not have to support complex path constraint (only
simple ones), then, you wouldn't have to bother that much about one
single Lucene index.

Lucene 4.0 will be so blisteringly fast and efficient... the figures we
need to index with Jackrabbit are peanuts for Lucene. *If* we improve
indexing, a couple of hundred million nodes is a no-brainer!
We should not be thinking about problems that are a result of the
current implementation and its shortcomings (they are a result of the
fact that it needed to work against Lucene 1.4; this is no criticism,
to be sure!).

> that are only for part of a repository (e.g. /content/siteA) and are
> asynchronous (not blocking other repository writes) and can be more easily
> thrown away, updated etc. without breaking core repository functionality.

Asynchronous indexing is already part of JSR 283 (JCR 2.0) afaik and
is allowed, certainly for binary content.

>
>>Anyway, in due time we need to pick this up at the dev list
>
> Of course.

To be continued :-)

Regards Ard

>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel
>

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Ard Schrijvers <a....@onehippo.com>.

On Mon, Nov 29, 2010 at 1:49 PM, Ian Boston <ie...@tfd.co.uk> wrote:
>
> On 29 Nov 2010, at 12:06, Alexander Klimetschek wrote:
>
>> (as a randomly-accessible
>> binary)
>
> One of the reasons the JDBCDirectory is not fast is that most DBs dont support seek on blobs, and anyway, anything that is shared over a network is just too slow, unless a local cached version of the index is made available. I think thats why the Infinispan Directory does work. BTW, iirc you can configure infinispan to page its cache to disk.

Indeed. Lucene needs so many random seeks that the only (in my view)
efficient way is to have it on local disk. Lucene 4.0 even removes
many internal caches (like FieldCache!!!) and relies completely on
file system caches. This will actually make things like sorting on
tens of millions of titles possible without going OOM.

I haven't looked at the Infinispan code yet, but of course they have a
way to flush the memory to disk or to a database. We might add
flushing to JCR, which would make the Lucene segments be flushed into
the repository (as Alexander pointed out earlier).

>
> I tired a number of impls of remote shared Lucene indexes when I was writing the search engine for Sakai 2, all failed. The only solution that worked was one where lucene was allowed to perform seeks on local disk or in memory. (documents were indexed on one node in the cluster (round robin), the indexing nodes ship segments updates, and all nodes search on local indexes but not real time as Jackrabbit is)

Yes. I recently had some talks with Simon Willnauer, one of the very
few Lucene committers who know how the low-level persistence and reads
work: Lucene cannot perform on anything other than the file system or
memory. The other day I attended a talk about Lucandra at ApacheCon
Atlanta: Lucene in a distributed Cassandra ring... they hit performance
penalties after 100,000 Lucene docs... well, it is just not possible
(or I am too stupid) :-)

Cheers Ard

>
> Ian
>

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Ian Boston <ie...@tfd.co.uk>.

On 29 Nov 2010, at 12:06, Alexander Klimetschek wrote:

> (as a randomly-accessible
> binary)

One of the reasons the JDBCDirectory is not fast is that most DBs don't support seek on blobs, and anyway, anything that is shared over a network is just too slow unless a local cached version of the index is made available. I think that's why the Infinispan Directory does work. BTW, iirc you can configure Infinispan to page its cache to disk.
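
A back-of-the-envelope Python sketch of why missing seek support hurts.
It is only a cost model with hypothetical class names, not a benchmark
and not how JDBCDirectory is written: it assumes the worst case where
every read has to fetch the whole blob, and compares that with a
seekable local file.

```python
import io


class WholeBlobStore:
    """Storage that can only return a complete blob, like a BLOB column without seek."""

    def __init__(self, data):
        self._data = data
        self.bytes_transferred = 0

    def read(self, offset, length):
        blob = self._data                      # the whole value is fetched
        self.bytes_transferred += len(blob)
        return blob[offset:offset + length]


class SeekableStore:
    """Storage with random access, like a local file."""

    def __init__(self, data):
        self._file = io.BytesIO(data)
        self.bytes_transferred = 0

    def read(self, offset, length):
        self._file.seek(offset)
        chunk = self._file.read(length)
        self.bytes_transferred += len(chunk)
        return chunk


segment = bytes(range(256)) * 4096            # 1 MiB stand-in for an index file
offsets = [10, 500_000, 123, 900_000, 42]     # scattered reads, as a search does

blob_store = WholeBlobStore(segment)
file_store = SeekableStore(segment)
for off in offsets:
    assert blob_store.read(off, 16) == file_store.read(off, 16)
```

With five 16-byte reads, the seekable store moves 80 bytes while the
whole-blob store moves five full copies of the segment. Real drivers
may do better than this worst case, but the gap grows with every seek.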

I tried a number of implementations of remote shared Lucene indexes when I was writing the search engine for Sakai 2, and all failed. The only solution that worked was one where Lucene was allowed to perform seeks on local disk or in memory. (Documents were indexed on one node in the cluster (round robin), the indexing nodes shipped segment updates, and all nodes searched on local indexes, though not in real time as Jackrabbit does.)

Ian

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 29.11.10 12:54, "Ard Schrijvers" <a....@onehippo.com> wrote:

>On Mon, Nov 29, 2010 at 11:26 AM, Alexander Klimetschek
><ak...@adobe.com> wrote:
>> But what is the use-case for this? Why store a full-text index
>> implementation that is totally unrelated to the DB inside a database
>>layer
>> that just makes it perform worse, use more disk-space, etc.? It's like
>> implementing a database index by storing it in another database...
>
>Exactly! But you miss one crucial thing: First of all, the Lucene
>index should be tens of times smaller than it currently is. This is
>possible if we make it better configurable. Secondly, performance
>isn't worse, as the entire Lucene indexes are kept in memory. But the
>crucial part is in scalability: In a clustered setup, you can with
>infinispan (formerly jboss cache) have a replicated in memory Lucene
>index. This means, only one node in the cluster needs to do the
>indexing. The other nodes get it replicated. Now, because it is all in
>memory, 2 or 3 cluster nodes can for example be assigned to now and
>then flush their (new) in memory segments to a database: This is just
>a 'backup' for when the entire cluster goes down. It is not used by
>Lucene, only for bootstrapping when starting the cluster. So, this
>scenario does add lots of potential; Bringing in a new node in the
>cluster is instant. Hibernate with very similar needs as jackrabbit
>uses this technique I just described.
>
>As a bonus we might get rid of the database persisted changelog (or
>how it is called): This is meant for nodes in a cluster to
>a) Evict their caches
>b) Index new nodes
>
>(b) is not needed any more as we have index replication.
>(a) could be replaced by jms which seems more natural to me.
>
>The only drawback is that the current jr lucene impl does not fit the
>InfinispanDirectory (infinispan lucene dir). It is because of the
>multi-index and never re-open setup in jr: It was state of the art
>against lucene 1.4, but now mostly redundant.

Just one node doing the indexing sounds interesting. But I would then
think we store the index inside the repository (as a randomly-accessible
binary), so that you can use any persistence manager and the
implementation is simpler (no need to adapt to the various databases).

We had some plans to do something like this with additional indexes
(calling them "collections") that are created by the application side
but stored inside the repository, and implemented with Lucene
(especially for the full-text part).

The idea here is to overcome the problem of the single-big index for the
entire repository that is mandated by the JCR spec. You often want indexes
that are only for part of a repository (e.g. /content/siteA) and are
asynchronous (not blocking other repository writes) and can be more easily
thrown away, updated etc. without breaking core repository functionality.

>Anyway, in due time we need to pick this up at the dev list

Of course.

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Ard Schrijvers <a....@onehippo.com>.

On Mon, Nov 29, 2010 at 11:26 AM, Alexander Klimetschek
<ak...@adobe.com> wrote:
> But what is the use-case for this? Why store a full-text index
> implementation that is totally unrelated to the DB inside a database layer
> that just makes it perform worse, use more disk-space, etc.? It's like
> implementing a database index by storing it in another database...

Exactly! But you miss one crucial thing. First of all, the Lucene
index should be tens of times smaller than it currently is. This is
possible if we make it more configurable. Secondly, performance
isn't worse, as the entire Lucene index is kept in memory. But the
crucial part is scalability: in a clustered setup, Infinispan
(formerly JBoss Cache) lets you have a replicated in-memory Lucene
index. This means only one node in the cluster needs to do the
indexing. The other nodes get it replicated. Now, because it is all in
memory, two or three cluster nodes can, for example, be assigned to
flush their (new) in-memory segments to a database now and then. This
is just a 'backup' for when the entire cluster goes down. It is not
used by Lucene, only for bootstrapping when starting the cluster. So
this scenario has a lot of potential: bringing a new node into the
cluster is instant. Hibernate, which has needs very similar to
Jackrabbit's, uses the technique I just described.
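
A rough sketch of that scenario in plain Java follows. It is not the Infinispan or Jackrabbit API: ClusterNode, IndexCluster and their methods are hypothetical names, a Map stands in for the database, and the sketch assumes that a segment never changes once it has been written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cluster member holding its copy of the index segments in memory.
class ClusterNode {
    final Map<String, byte[]> segments = new LinkedHashMap<>();

    void receive(String name, byte[] data) {
        segments.put(name, data);
    }
}

// Hypothetical coordinator: one indexing node publishes, all nodes receive,
// and a backup map (stand-in for a database table) is used for bootstrapping.
class IndexCluster {
    private final List<ClusterNode> nodes = new ArrayList<>();
    private final Map<String, byte[]> backup = new LinkedHashMap<>();

    // A joining node copies segments from a live peer if one exists,
    // otherwise from the backup. No re-indexing is needed in either case.
    ClusterNode join() {
        ClusterNode node = new ClusterNode();
        Map<String, byte[]> source = nodes.isEmpty() ? backup : nodes.get(0).segments;
        node.segments.putAll(source);
        nodes.add(node);
        return node;
    }

    // Called only by the indexing node; every member gets the new segment.
    void publishSegment(String name, byte[] data) {
        for (ClusterNode n : nodes) {
            n.receive(name, data);
        }
    }

    // Writes segments that are not in the backup yet; returns how many were new.
    int flushToBackup(ClusterNode node) {
        int written = 0;
        for (Map.Entry<String, byte[]> e : node.segments.entrySet()) {
            if (backup.putIfAbsent(e.getKey(), e.getValue()) == null) {
                written++;
            }
        }
        return written;
    }

    // Simulates the entire cluster going down: all in-memory copies are lost.
    void shutdownAll() {
        nodes.clear();
    }
}
```

Searches only ever read the in-memory copies; the backup is touched on flush and on a cold start of the cluster.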

As a bonus we might get rid of the database-persisted changelog (or
whatever it is called). It is meant for nodes in a cluster to
a) Evict their caches
b) Index new nodes

(b) is not needed any more as we have index replication.
(a) could be replaced by JMS, which seems more natural to me.
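
For (a), a minimal sketch of message-driven cache eviction. EvictionTopic is a hypothetical in-process stand-in for a JMS topic, and CachingNode is not a Jackrabbit class; a real setup would publish through a message broker instead.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical stand-in for a JMS topic: delivers each message to all subscribers.
class EvictionTopic {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> listener) {
        subscribers.add(listener);
    }

    // Broadcasts the id of a changed item to every subscribed node.
    void publish(String itemId) {
        for (Consumer<String> s : subscribers) {
            s.accept(itemId);
        }
    }
}

// Hypothetical cluster node with a local item cache.
class CachingNode {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingNode(EvictionTopic topic) {
        // Evict the local copy whenever any node reports a change.
        topic.subscribe(itemId -> cache.remove(itemId));
    }

    void put(String itemId, String value) {
        cache.put(itemId, value);
    }

    boolean isCached(String itemId) {
        return cache.containsKey(itemId);
    }
}
```

The node that makes a change publishes the item id, and every other node drops its stale copy without polling a changelog table.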

The only drawback is that the current Jackrabbit Lucene implementation
does not fit the InfinispanDirectory (the Infinispan Lucene directory).
This is because of the multi-index and never-reopen setup in
Jackrabbit: it was state of the art against Lucene 1.4, but is now
mostly redundant.

Anyway, in due time we need to pick this up on the dev list.

Regards Ard




-- 
Hippo
Europe  •  Amsterdam  Oosteinde 11  •  1017 WT Amsterdam  •  +31 (0)20 522 4466
USA  • San Francisco  185 H Street Suite B  •  Petaluma CA 94952-5100
•  +1 (707) 773 4646
Canada    •   Montréal  5369 Boulevard St-Laurent  •  Montréal QC H2T
1S5  •  +1 (514) 316 8966
www.onehippo.com  •  www.onehippo.org  •  info@onehippo.com

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Ard Schrijvers <a....@onehippo.com>.

On Mon, Nov 29, 2010 at 12:47 PM, Seidel. Robert <Ro...@aeb.de> wrote:
> I think the use case is the same as the one we have:
>
> Multi cluster environment - only one single Lucene index within the database for all nodes would
>        -> take less disk space (if you have more than one node)

Exactly!

>        -> make backups easier (you only have to back up the database, not the db and some folders on three nodes)

Exactly!

Newer Lucene versions have a snapshot policy which we could definitely
use! First we need to upgrade and, better still, replace the multi
Lucene index setup.

>        -> make it easier to add nodes (you don't have to wait until the index is built up)

Exactly!

>        -> make the index more stable (because the database is transactional)

Lucene indexes are by themselves very stable, but for the record I'll just
say: Exactly :-))

Note that I have had all these ideas for a long time already, but I am too
busy to work on them. I still keep hoping I will have time for it
sometime next year.

Regards ard

>
> Regards, Robert
>

Re: AW: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 29.11.10 12:47, "Seidel. Robert" <Ro...@aeb.de> wrote:

>I think the use case is the same as the one we have:
>
>Multi cluster environment - only one single Lucene index within the
>database for all nodes would
>    -> take less disk space (if you have more than one node)
>    -> make backups easier (you only have to back up the database, not the
>db and some folders on three nodes)
>    -> make it easier to add nodes (you don't have to wait until the
>index is built up)
>    -> make the index more stable (because the database is transactional)

Ok, makes sense, but this would be an optimization purely for writes. I
would expect that the benefits are completely offset by the much worse
read performance of this setup.

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel

AW: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by "Seidel. Robert" <Ro...@aeb.de>.

I think the use case is the same as the one we have:

Multi cluster environment - only one single Lucene index within the database for all nodes would
	-> take less disk space (if you have more than one node)
	-> make backups easier (you only have to back up the database, not the db and some folders on three nodes)
	-> make it easier to add nodes (you don't have to wait until the index is built up)
	-> make the index more stable (because the database is transactional)

Regards, Robert

-----Original Message-----
From: Alexander Klimetschek [mailto:aklimets@adobe.com]
Sent: Monday, 29 November 2010 11:26
To: users@jackrabbit.apache.org
Subject: Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

But what is the use-case for this? Why store a full-text index
implementation that is totally unrelated to the DB inside a database layer
that just makes it perform worse, use more disk-space, etc.? It's like
implementing a database index by storing it in another database...

Regards,
Alex


Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Alexander Klimetschek <ak...@adobe.com>.

But what is the use-case for this? Why store a full-text index
implementation that is totally unrelated to the DB inside a database layer
that just makes it perform worse, use more disk-space, etc.? It's like
implementing a database index by storing it in another database...

Regards,
Alex


-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel

Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Ard Schrijvers <a....@onehippo.com>.

On Mon, Nov 29, 2010 at 9:21 AM, Thomas Mueller <mu...@adobe.com> wrote:
> Hi,
>
> Jackrabbit currently uses Lucene. According to the Lucene FAQ it should be possible, but I'm not sure what changes (if any) would be required in Jackrabbit:

Most likely only a database-backed Lucene Directory would have to be
created. Although the FAQ states that it is possible, I don't think it
can ever perform well. Once again, I believe in storing segments as a
whole in a database to have a persisted Lucene index, but you cannot
efficiently access these segments in a database. You then need to keep
the index in memory.
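
A minimal sketch of that split, in plain Java: segments are written to the database as whole blobs, and all reads are served from memory. BlobTable is a hypothetical stand-in for a table with a name column and a BLOB column, and SegmentStore is not a Lucene Directory implementation; a real one would have to implement Lucene's Directory contract on top of something like this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction of a database table: one row per segment file.
interface BlobTable {
    void write(String name, byte[] content);

    Map<String, byte[]> readAll();
}

// In-memory stand-in so the sketch runs without a database.
class InMemoryBlobTable implements BlobTable {
    private final Map<String, byte[]> rows = new HashMap<>();

    public void write(String name, byte[] content) {
        rows.put(name, content.clone());
    }

    public Map<String, byte[]> readAll() {
        return new HashMap<>(rows);
    }
}

// Persists segments as a whole and answers all reads from memory.
class SegmentStore {
    private final BlobTable table;
    private final Map<String, byte[]> inMemory = new HashMap<>();

    SegmentStore(BlobTable table) {
        this.table = table;
        inMemory.putAll(table.readAll()); // one bulk load at startup
    }

    // The segment goes to the database in one piece, then becomes readable.
    void addSegment(String name, byte[] content) {
        table.write(name, content);
        inMemory.put(name, content.clone());
    }

    // Random access inside a segment never touches the database.
    byte readByte(String name, int position) {
        return inMemory.get(name)[position];
    }

    int segmentCount() {
        return inMemory.size();
    }
}
```

The database sees only whole-segment writes and one bulk read at startup, which avoids the inefficient per-read access mentioned above.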

>
> http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_store_the_Lucene_index_in_a_relational_database.3F
>
> I'm not aware of plans to support this feature. Patches are welcome of course :-)

First of all, we would have to do some housekeeping in the existing
implementation. The entire 'multi-index' setup, meant for 'near real
time searches', is no longer needed with newer Lucene versions. Also,
the way properties are indexed should be replaced. Note that these
changes will most likely already affect about 100 of the 200 query
classes (of which quite a lot can be removed as well). So this is a lot
of work. After this, preferably, we can move to the 4.x Lucene
versions. So a patch is welcome, but don't think too lightly of it :)

Regards Ard

>
> Regards,
> Thomas
>




Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

Jackrabbit currently uses Lucene. According to the Lucene FAQ it should be possible, but I'm not sure what changes (if any) would be required in Jackrabbit:

http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_store_the_Lucene_index_in_a_relational_database.3F

I'm not aware of plans to support this feature. Patches are welcome of course :-)

Regards,
Thomas
