Posted to dev@jackrabbit.apache.org by "Dolan, Kelly" <kd...@inmedius.com> on 2011/05/03 22:02:32 UTC

is doc addition / indexing synchronous or asynchronous?

(Re-posting since it seems my original email was not sent out; my
apologies if I'm mistaken.)

I found a thread from April 2006
(http://jackrabbit.510166.n4.nabble.com/Is-doc-addition-indexing-synchronous-or-asynchronous-td528243.html).

I find myself in a similar situation: I'm adding lots of documents to
the repository at once, it's taking a great deal of time, and the
majority of that time is spent indexing. Therefore I need to change my
configuration or extend SearchIndex so that indexing occurs
asynchronously ... I really do not have a choice.

I followed most of the thread conversation, but I'm not sure I totally
understand everything.

(1) The thread mentions that observation events are delivered
synchronously. Is it possible to change this to be asynchronous?

(2) Marcel brought up two issues with (1):

    (a) a search may not "hit" a document just added; there would be a
delay

    (b) if the JVM crashed, documents not yet indexed would never be
indexed, and this could not be recovered

I can live with (a) but not (b). The thread continued on (b), discussing
persisting what needs to be indexed, and that is where I started to get
lost. Although (b) was raised, it seemed like Jackrabbit already handles
it with a redo.log.
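The idea from the 2006 thread for issue (b), persisting what still needs to be indexed, can be sketched as a small write-ahead log: each node id is appended durably before save() returns, marked done once the background indexer has handled it, and anything left unmarked is re-queued on restart. This is a minimal sketch under stated assumptions: `PendingIndexLog`, its one-line-per-entry text format and its method names are hypothetical, node ids are plain strings, and it is not the format of Jackrabbit's own redo.log.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical write-ahead log of node ids awaiting indexing (not a Jackrabbit class).
 * Each line is "ADD <id>" (must be indexed) or "DONE <id>" (has been indexed).
 */
class PendingIndexLog {
    private final Path file;

    PendingIndexLog(Path file) {
        this.file = file;
    }

    /** Called in the saving thread, before save() returns. */
    synchronized void markPending(String nodeId) throws IOException {
        append("ADD " + nodeId);
    }

    /** Called by the background indexer once the node is in the index. */
    synchronized void markIndexed(String nodeId) throws IOException {
        append("DONE " + nodeId);
    }

    /** On startup: ids that were logged as pending but never marked done. */
    synchronized List<String> recover() throws IOException {
        Set<String> pending = new LinkedHashSet<>();
        if (Files.exists(file)) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (line.startsWith("ADD ")) {
                    pending.add(line.substring(4));
                } else if (line.startsWith("DONE ")) {
                    pending.remove(line.substring(5));
                }
            }
        }
        return new ArrayList<>(pending);
    }

    private void append(String line) throws IOException {
        // SYNC asks for the entry to reach the storage device before write() returns.
        Files.write(file, (line + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                StandardOpenOption.SYNC);
    }
}
```

On startup, every id returned by recover() would be re-queued before new work is accepted. A real version would also need to compact the file, since this sketch only ever appends to it.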

In any case, I need to make indexing asynchronous. I had started down
the path of extending SearchIndex and overriding the updateNodes()
method, but now I'm wondering whether there is simply a way to configure
Jackrabbit to make indexing asynchronous, or whether there are still
serious issues I have not considered. Or is extending SearchIndex and
overriding the updateNodes() method what I should do?
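The updateNodes() override can be pictured as a decorator that returns at once and applies the real update on a background thread. In this sketch `IndexUpdater` is a hypothetical stand-in for the synchronous update in `SearchIndex` (the actual Jackrabbit 1.6 signature is not reproduced), and node ids are plain strings. On its own it addresses speed only: it gives up the guarantee discussed under (a), and anything still queued is lost if the JVM dies, so it would need to be paired with a durable record of pending work to cover (b).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the synchronous index update in SearchIndex. */
interface IndexUpdater {
    void updateNodes(List<String> removed, List<String> added);
}

/** Hypothetical decorator: hands each update to a background thread instead of running it in save(). */
class AsyncIndexUpdater implements IndexUpdater {
    private final IndexUpdater delegate;
    // One worker thread, so updates are applied in the order they were submitted.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    AsyncIndexUpdater(IndexUpdater delegate) {
        this.delegate = delegate;
    }

    @Override
    public void updateNodes(List<String> removed, List<String> added) {
        // Copy the inputs: the caller may reuse or clear its lists after we return.
        final List<String> r = new ArrayList<>(removed);
        final List<String> a = new ArrayList<>(added);
        // execute() rather than submit(), so a failing update is not silently swallowed.
        worker.execute(() -> delegate.updateNodes(r, a));
    }

    /** Drains queued updates, e.g. on repository shutdown. */
    void shutdown() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

A single worker is the deliberate choice here: with several threads, the removal of a node could overtake its earlier addition and leave a stale entry in the index.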

I'm currently integrated with Jackrabbit 1.6. I'm not sure I can
upgrade to the latest version at this time, but if a later version buys
me something, please let me know.

kelly

Re: is doc addition / indexing synchronous or asynchronous?

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 04.05.11 22:38, "Dolan, Kelly" <kd...@inmedius.com> wrote:
>If I modify SearchManager such that it implements EventListener as
>opposed to SynchronousEventListener indexing now occurs in a background
>thread.  If I proceed with such a change, will this break anything in
>Jackrabbit?  i.e., is there any operation that modifies the repository,
>immediately does a search and expects to find what was just added and
>if it does not, fails?

I don't know that part of the search index implementation well enough to
judge what this change will do, but I think it would break section 6.5
"Search Scope" of the JCR spec [0]:

"A query must search the persistent workspace associated with the current
session. It may take into account pending changes to the persistent
workspace; that is, changes which are either unsaved or, within a
transaction, saved but uncommitted."


That means that as soon as save() returns (i.e. the changes are
persisted), the index should be up to date. This can matter for
applications that, for example, change something, save it and then run
a search again to update the view. That scenario would break, and the
application would have no way of knowing how long to wait until the
search index is up to date.
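The scenario above can be reproduced with a toy model. `ToyRepository` is hypothetical, not a Jackrabbit class: its save() records the node as persisted immediately but adds it to the index from a background thread. A latch holds the indexer back so the race is deterministic: the node is persisted, yet a query issued right after save() does not find it, and the caller gets no signal for when it will.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical repository: save() persists at once, indexing happens asynchronously. */
class ToyRepository {
    private final Set<String> persisted = ConcurrentHashMap.newKeySet();
    private final Set<String> index = ConcurrentHashMap.newKeySet();
    private final ExecutorService indexer = Executors.newSingleThreadExecutor();
    // Lets the demo hold the indexer back so the race is observable every time.
    final CountDownLatch indexerMayRun = new CountDownLatch(1);

    void save(String nodeName) {
        persisted.add(nodeName);               // persisted once save() returns
        indexer.execute(() -> {
            try {
                indexerMayRun.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            index.add(nodeName);               // visible to queries only from here on
        });
    }

    boolean isPersisted(String nodeName) { return persisted.contains(nodeName); }

    boolean query(String nodeName) { return index.contains(nodeName); }

    void shutdown() throws InterruptedException {
        indexer.shutdown();
        indexer.awaitTermination(1, TimeUnit.MINUTES);
    }
}

class SearchScopeDemo {
    public static void main(String[] args) throws InterruptedException {
        ToyRepository repo = new ToyRepository();
        repo.save("report");
        // save() has returned, yet the query misses the node.
        System.out.println(repo.isPersisted("report") + " " + repo.query("report")); // true false
        repo.indexerMayRun.countDown();
        repo.shutdown();
        System.out.println(repo.query("report")); // true
    }
}
```

In a real system there is no latch to release; the gap simply lasts as long as the indexer's backlog, which is exactly the uncertainty described above.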

[0] http://www.day.com/specs/jcr/2.0/6_Query.html#6.5%20Search%20Scope

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel


RE: is doc addition / indexing synchronous or asynchronous?

Posted by "Dolan, Kelly" <kd...@inmedius.com>.

If I modify SearchManager so that it implements EventListener instead of
SynchronousEventListener, indexing occurs in a background thread. If I
proceed with such a change, will this break anything in Jackrabbit? That
is, is there any operation that modifies the repository, immediately
runs a search, expects to find what was just added, and fails if it does
not?
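What the proposed change alters is the thread that delivers the event. In Jackrabbit, `SynchronousEventListener` is a marker interface: such listeners are notified before save() returns, whereas a plain `EventListener` may be notified afterwards from a dispatcher thread. The sketch below models only that rule; `ToyListener`, `ToySyncListener` and `ToyDispatcher` are hypothetical and do not reproduce Jackrabbit's observation dispatcher.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical listener, modelled loosely on javax.jcr.observation.EventListener. */
interface ToyListener {
    void onEvent(List<String> events);
}

/** Hypothetical marker: listeners implementing it are called inside save(). */
interface ToySyncListener extends ToyListener {
}

/** Hypothetical dispatcher applying the sync/async rule described above. */
class ToyDispatcher {
    private final List<ToyListener> listeners = new CopyOnWriteArrayList<>();
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    void register(ToyListener l) {
        listeners.add(l);
    }

    /** Called at the end of save(): synchronous listeners finish before it returns. */
    void dispatch(List<String> events) {
        for (ToyListener l : listeners) {
            if (l instanceof ToySyncListener) {
                l.onEvent(events);                            // the saving thread waits
            } else {
                background.execute(() -> l.onEvent(events)); // the saving thread moves on
            }
        }
    }

    void shutdown() throws InterruptedException {
        background.shutdown();
        background.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Under this model, turning the search manager into a plain listener moves the indexing cost out of save(), which is the speed-up being asked for and also the source of the visibility gap discussed in the reply.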

Kelly
