Posted to user@couchdb.apache.org by Manokaran K <ma...@gmail.com> on 2010/04/14 11:15:47 UTC

couchdb-lucene reindexes when restarted

Hi,

I'm using couchdb-lucene 0.5 with couchdb 0.10. It looks like whenever I
restart couchdb or couchdb-lucene, lucene re-indexes the documents once
again for that particular index function! Is that expected behavior?

On IRC someone said it's expected because docs might have changed. I thought
couchdb-lucene would use etags or the _changes API to test whether the docs
need reindexing.

Which is right?

thanks,
mano

-- 
Lord, give us the wisdom to utter words that are gentle and tender, for
tomorrow we may have to eat them.
   -Sen. Morris Udall

Re: couchdb-lucene reindexes when restarted

Posted by Robert Newson <ro...@gmail.com>.
Hi,

0.5 does indeed use _changes to incrementally update the Lucene
indexes; it should *not* be starting over unless you delete the index
or change the index functions. 0.5 is under active development so I'm
very keen to hear about this bug. I'm attempting to reproduce it
locally now.
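
The incremental update described above can be sketched in a few lines of Python. This is only an illustration of checkpointing against an ordered change feed, not couchdb-lucene's code: `index_changes`, the in-memory `feed` list and the dict used as an "index" are all hypothetical stand-ins.

```python
def index_changes(changes, checkpoint, index):
    """Apply only changes newer than `checkpoint`; return the new checkpoint.

    `changes` is assumed to be ordered by update sequence, as a change
    feed would deliver it.
    """
    for seq, doc_id in changes:
        if seq > checkpoint:
            index[doc_id] = seq   # stand-in for the real indexing work
            checkpoint = seq
    return checkpoint


# Hypothetical change feed: (update sequence, document id) pairs.
feed = [(1, "a"), (2, "b"), (3, "a"), (4, "c")]
index = {}

checkpoint = index_changes(feed, 0, index)           # first run sees everything
feed.append((5, "d"))
checkpoint = index_changes(feed, checkpoint, index)  # later run only sees seq 5
```

As long as the checkpoint survives a restart, the second call has nothing to redo; the symptom reported in this thread corresponds to the checkpoint being lost or lowered.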

B.

On Wed, Apr 14, 2010 at 11:45 AM, Manokaran K <ma...@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 3:58 PM, Sebastian Cohnen <
> sebastiancohnen@googlemail.com> wrote:
>
>> Hi,
>>
>> this is (of course) not to be expected. c-l uses the _changes API and
>> checkpointing (using the update sequence number) to figure out, if there is
>> something new to index - at least as far as I understood this. But since 0.5
>> is not yet released, maybe you've found a bug :) Could you provide some more
>> information on your setup, used design document, c-l logs?
>>
>>
>>
> My setup consists of about 150K docs to demo a web app. I'll prune it to a
> much smaller number and give you the information. Any specific action you
> would want me to try?
>
> thanks,
> mano
>

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Thu, Apr 15, 2010 at 10:25 PM, Robert Newson <ro...@gmail.com>wrote:

> can you confirm that you altered some of these lines before you sent
> them? The existence of the string "<dbname>" is confusing me.
>

I should've mentioned it! Yes, I changed the db name from the actual string.

It's late here (India). I'll recreate all the docs from scratch, index
them, and update you tomorrow.

Thanks a lot,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Robert Newson <ro...@gmail.com>.
This is now fixed on the master branch.

I force a document addition if there wasn't one since the last commit.
You'll see it in doc_count for index functions that don't index
anything.
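
A rough sketch of that workaround, in Python rather than the project's Java. `ToyIndexWriter` is a toy model written for this illustration and is NOT Lucene's API; `commit_checkpoint` is a hypothetical helper name.

```python
class ToyIndexWriter:
    """Toy model (not Lucene): commit persists only if documents were added."""

    def __init__(self):
        self.doc_count = 0       # total documents in the index
        self.pending = 0         # documents added since the last commit
        self.persisted_seq = 0   # the update_seq that survives a restart

    def add_document(self):
        self.doc_count += 1
        self.pending += 1

    def commit(self, update_seq):
        if self.pending == 0:
            return False         # nothing to write, so update_seq is not stored
        self.persisted_seq = update_seq
        self.pending = 0
        return True


def commit_checkpoint(writer, update_seq):
    """Force a placeholder document if nothing was added, then commit."""
    if writer.pending == 0:
        writer.add_document()    # placeholder; shows up in doc_count
    return writer.commit(update_seq)


writer = ToyIndexWriter()        # index function indexed nothing at all
committed = commit_checkpoint(writer, update_seq=1000)
```

In this toy model the placeholder is the reason the document count is non-zero for an index function that indexes nothing.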

B.

On Fri, Apr 16, 2010 at 12:10 PM, Robert Newson <ro...@gmail.com> wrote:
> Yes, that would be better. 0.4 used to add a dummy document in all
> cases (to track update_seq) so this didn't use to happen. with Lucene
> 2.9/3.0, I'm using commit(userData) instead of a dummy document.
> Unfortunately commit() does nothing if there are no documents added.
>
> I'll have a fix for this today; I'll just add an empty document if
> there was no other change.
>
> B.
>
> On Fri, Apr 16, 2010 at 10:20 AM, Manokaran K <ma...@gmail.com> wrote:
>> On Fri, Apr 16, 2010 at 1:51 PM, Robert Newson <ro...@gmail.com>wrote:
>>
>>> That's more interesting. IIRC, Lucene's commit() method will only
>>> write to disk if there have been document changes. So, if your
>>> function doesn't update anything at all (your function returns null
>>> for all documents, say) then the update_seq won't be updated, and
>>> hence it will start over each time.
>>>
>>
>> I think you have got it!
>>
>> I tested the above by creating another doc (after all the others have been
>> generated by the ruby script) that will ensure that at least one _fti index
>> function will return a value. Now, its working as expected - the update_seq
>> got bumped to the latest and survives restarts!!
>>
>> But am curious: would it not be better if lucene bumped up the seq number
>> every time it indexed whether or not the index functions returned a value!
>>
>> thanks a ton for your efforts.
>>
>> regds,
>> mano
>>
>

Re: couchdb-lucene reindexes when restarted

Posted by Robert Newson <ro...@gmail.com>.
Yes, that would be better. 0.4 used to add a dummy document in all
cases (to track update_seq), so this didn't use to happen. With Lucene
2.9/3.0, I'm using commit(userData) instead of a dummy document.
Unfortunately commit() does nothing if there are no documents added.

I'll have a fix for this today; I'll just add an empty document if
there was no other change.

B.

On Fri, Apr 16, 2010 at 10:20 AM, Manokaran K <ma...@gmail.com> wrote:
> On Fri, Apr 16, 2010 at 1:51 PM, Robert Newson <ro...@gmail.com>wrote:
>
>> That's more interesting. IIRC, Lucene's commit() method will only
>> write to disk if there have been document changes. So, if your
>> function doesn't update anything at all (your function returns null
>> for all documents, say) then the update_seq won't be updated, and
>> hence it will start over each time.
>>
>
> I think you have got it!
>
> I tested the above by creating another doc (after all the others have been
> generated by the ruby script) that will ensure that at least one _fti index
> function will return a value. Now, its working as expected - the update_seq
> got bumped to the latest and survives restarts!!
>
> But am curious: would it not be better if lucene bumped up the seq number
> every time it indexed whether or not the index functions returned a value!
>
> thanks a ton for your efforts.
>
> regds,
> mano
>

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Fri, Apr 16, 2010 at 1:51 PM, Robert Newson <ro...@gmail.com>wrote:

> That's more interesting. IIRC, Lucene's commit() method will only
> write to disk if there have been document changes. So, if your
> function doesn't update anything at all (your function returns null
> for all documents, say) then the update_seq won't be updated, and
> hence it will start over each time.
>

I think you have got it!

I tested the above by creating another doc (after all the others had been
generated by the ruby script) that ensures that at least one _fti index
function returns a value. Now it's working as expected - the update_seq
got bumped to the latest and survives restarts!

But I'm curious: would it not be better if lucene bumped up the seq number
every time it indexed, whether or not the index functions returned a value?

thanks a ton for your efforts.

regds,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Robert Newson <ro...@gmail.com>.
That's more interesting. IIRC, Lucene's commit() method will only
write to disk if there have been document changes. So, if your
function doesn't update anything at all (your function returns null
for all documents, say) then the update_seq won't be updated, and
hence it will start over each time.
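
The failure mode described above can be modelled with a small Python sketch. The class below is a toy stand-in written for this illustration, NOT Lucene's IndexWriter, and `index_fn_returns_null` is a hypothetical index function that never indexes anything.

```python
class ToyIndexWriter:
    """Toy model (not Lucene): commit persists only if documents were added."""

    def __init__(self):
        self.pending_docs = 0
        self.persisted_seq = 0   # the update_seq that survives a restart

    def add_document(self):
        self.pending_docs += 1

    def commit(self, update_seq):
        if self.pending_docs == 0:
            return False         # nothing to write, so update_seq is dropped too
        self.persisted_seq = update_seq
        self.pending_docs = 0
        return True


def index_fn_returns_null(doc):
    return None                  # hypothetical: indexes nothing for any document


writer = ToyIndexWriter()
for seq in range(1, 1001):
    if index_fn_returns_null({"seq": seq}) is not None:
        writer.add_document()

committed = writer.commit(update_seq=1000)
```

After processing 1000 changes the toy writer still reports a persisted sequence of 0, so a restart would begin again from the start.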

B.

On Fri, Apr 16, 2010 at 9:05 AM, Manokaran K <ma...@gmail.com> wrote:
> On Thu, Apr 15, 2010 at 10:30 PM, Robert Newson <ro...@gmail.com>wrote:
>
>> I can't reproduce this. My setup always picks up where I left off, so
>> there must be some step I'm not doing to trigger this.
>>
>> Can you delete the target/indexes and reproduce this from scratch? If
>> so, could you list all the steps?
>>
>
> I get this problem only when I load couchdb with demo data for my
> application - a set of school related documents all generated with random
> data using a ruby script. When I tried with another ruby script that
> generated a bland set of docs, the problem vanished. So, it has to do with
> the docs I generate.
>
> But there are no errors in couchdb logs when I load the data! Only the
> update_seq (in c-l logs) does not get bumped up to the highest number.
> Instead it gets stuck at a lower number. Is there some way I can query
> couchdb to get the doc that resulted in this update_seq? Perhaps there will
> be some clue there!!
>
> thanks,
> mano
>

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Thu, Apr 15, 2010 at 10:30 PM, Robert Newson <ro...@gmail.com>wrote:

> I can't reproduce this. My setup always picks up where I left off, so
> there must be some step I'm not doing to trigger this.
>
> Can you delete the target/indexes and reproduce this from scratch? If
> so, could you list all the steps?
>

I get this problem only when I load couchdb with demo data for my
application - a set of school related documents all generated with random
data using a ruby script. When I tried with another ruby script that
generated a bland set of docs, the problem vanished. So, it has to do with
the docs I generate.

But there are no errors in couchdb logs when I load the data! Only the
update_seq (in c-l logs) does not get bumped up to the highest number.
Instead it gets stuck at a lower number. Is there some way I can query
couchdb to get the doc that resulted in this update_seq? Perhaps there will
be some clue there!!
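
One possible way to look this up, sketched in Python: ask the `_changes` feed for the first change after the sequence just below the stuck one. The base URL and database name are placeholders, `changes_url` is a hypothetical helper, and the `limit` parameter is assumed to be supported by the CouchDB version in use.

```python
from urllib.parse import urlencode


def changes_url(base, db, since, limit=1):
    """Build a _changes URL returning at most `limit` rows after `since`."""
    query = urlencode({"since": since, "limit": limit})
    return f"{base}/{db}/_changes?{query}"


stuck_seq = 8142  # value seen in the c-l DEBUG output in this thread
url = changes_url("http://localhost:5984", "mydb", stuck_seq - 1)
```

Fetching that URL (for example with curl) should return the change row at or just after the stuck sequence, including the id of the document involved.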

thanks,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Robert Newson <ro...@gmail.com>.
I can't reproduce this. My setup always picks up where I left off, so
there must be some step I'm not doing to trigger this.

Can you delete the target/indexes and reproduce this from scratch? If
so, could you list all the steps?

B.

On Thu, Apr 15, 2010 at 5:55 PM, Robert Newson <ro...@gmail.com> wrote:
> can you confirm that you altered some of these lines before you sent
> them? The existence of the string "<dbname>" is confusing me.
>
> B.
>
> On Thu, Apr 15, 2010 at 5:00 PM, Manokaran K <ma...@gmail.com> wrote:
>> On Thu, Apr 15, 2010 at 9:25 PM, Manokaran K <ma...@gmail.com> wrote:
>>
>>>
>>> On Thu, Apr 15, 2010 at 8:08 PM, Robert Newson <ro...@gmail.com>wrote:
>>>
>>>> Hi,
>>>>
>>>> You need to modify log4j.xml and change the word INFO to DEBUG and
>>>> then restart couchdb. Please send all the output that it gives.
>>>>
>>>>
>>> Its here: http://pastie.org/921404
>>>
>>> regds,
>>> mano
>>>
>>
>> I did one more restart and the following are the output that seem different
>> from the last one - the bumped since to in the first 3 lines is different
>> from the last attempt:
>>
>> 2010-04-15 21:28:04,544 DEBUG [spulz]
>> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/ezw0w08o48vo0itbmbrpim10f
>> bumped since to 34774
>> 2010-04-15 21:28:04,544 DEBUG [spulz]
>> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/ezw0w08o48vo0itbmbrpim10f
>> bumped since to 34774
>> 2010-04-15 21:28:04,779 DEBUG [spulz]
>> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/bafqygi7kb41yj0wq5swblbx5
>> bumped since to 34774
>> 2010-04-15 21:28:04,801 DEBUG [spulz]
>> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/2abpocyi93809sllolxtczrc8
>> bumped since to 8142
>> 2010-04-15 21:28:04,812 DEBUG [spulz]
>> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/5krdzy83b9nlrsl5pnru6eh3s
>> bumped since to 8142
>> 2010-04-15 21:28:04,827 DEBUG [spulz]
>> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/acv33jkyzefc7wb8djdc0cyw5
>> bumped since to 8142
>> 2010-04-15 21:28:04,843 DEBUG [spulz]
>> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/1rsai42qit1p4qvrk7crky66e
>> bumped since to 8142
>>
>>
>>
>> --
>> Lord, give us the wisdom to utter words that are gentle and tender, for
>> tomorrow we may have to eat them.
>>   -Sen. Morris Udall
>>
>

Re: couchdb-lucene reindexes when restarted

Posted by Robert Newson <ro...@gmail.com>.
Can you confirm that you altered some of these lines before you sent
them? The existence of the string "<dbname>" is confusing me.

B.

On Thu, Apr 15, 2010 at 5:00 PM, Manokaran K <ma...@gmail.com> wrote:
> On Thu, Apr 15, 2010 at 9:25 PM, Manokaran K <ma...@gmail.com> wrote:
>
>>
>> On Thu, Apr 15, 2010 at 8:08 PM, Robert Newson <ro...@gmail.com>wrote:
>>
>>> Hi,
>>>
>>> You need to modify log4j.xml and change the word INFO to DEBUG and
>>> then restart couchdb. Please send all the output that it gives.
>>>
>>>
>> Its here: http://pastie.org/921404
>>
>> regds,
>> mano
>>
>
> I did one more restart and the following are the output that seem different
> from the last one - the bumped since to in the first 3 lines is different
> from the last attempt:
>
> 2010-04-15 21:28:04,544 DEBUG [spulz]
> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/ezw0w08o48vo0itbmbrpim10f
> bumped since to 34774
> 2010-04-15 21:28:04,544 DEBUG [spulz]
> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/ezw0w08o48vo0itbmbrpim10f
> bumped since to 34774
> 2010-04-15 21:28:04,779 DEBUG [spulz]
> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/bafqygi7kb41yj0wq5swblbx5
> bumped since to 34774
> 2010-04-15 21:28:04,801 DEBUG [spulz]
> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/2abpocyi93809sllolxtczrc8
> bumped since to 8142
> 2010-04-15 21:28:04,812 DEBUG [spulz]
> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/5krdzy83b9nlrsl5pnru6eh3s
> bumped since to 8142
> 2010-04-15 21:28:04,827 DEBUG [spulz]
> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/acv33jkyzefc7wb8djdc0cyw5
> bumped since to 8142
> 2010-04-15 21:28:04,843 DEBUG [spulz]
> org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/1rsai42qit1p4qvrk7crky66e
> bumped since to 8142
>
>
>
> --
> Lord, give us the wisdom to utter words that are gentle and tender, for
> tomorrow we may have to eat them.
>   -Sen. Morris Udall
>

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Thu, Apr 15, 2010 at 9:25 PM, Manokaran K <ma...@gmail.com> wrote:

>
> On Thu, Apr 15, 2010 at 8:08 PM, Robert Newson <ro...@gmail.com>wrote:
>
>> Hi,
>>
>> You need to modify log4j.xml and change the word INFO to DEBUG and
>> then restart couchdb. Please send all the output that it gives.
>>
>>
> Its here: http://pastie.org/921404
>
> regds,
> mano
>

I did one more restart, and the following output seems different from the
last run: the "bumped since to" value in the first 3 lines differs from
the last attempt:

2010-04-15 21:28:04,544 DEBUG [spulz] org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/ezw0w08o48vo0itbmbrpim10f bumped since to 34774
2010-04-15 21:28:04,544 DEBUG [spulz] org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/ezw0w08o48vo0itbmbrpim10f bumped since to 34774
2010-04-15 21:28:04,779 DEBUG [spulz] org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/bafqygi7kb41yj0wq5swblbx5 bumped since to 34774
2010-04-15 21:28:04,801 DEBUG [spulz] org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/2abpocyi93809sllolxtczrc8 bumped since to 8142
2010-04-15 21:28:04,812 DEBUG [spulz] org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/5krdzy83b9nlrsl5pnru6eh3s bumped since to 8142
2010-04-15 21:28:04,827 DEBUG [spulz] org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/acv33jkyzefc7wb8djdc0cyw5 bumped since to 8142
2010-04-15 21:28:04,843 DEBUG [spulz] org.apache.lucene.store.NIOFSDirectory@/home/mano/couchdb-lucene-0.5-SNAPSHOT-debug/indexes/4d5ac741-4566-4380-9628-0a93ab246ea5/1rsai42qit1p4qvrk7crky66e bumped since to 8142




Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Thu, Apr 15, 2010 at 8:08 PM, Robert Newson <ro...@gmail.com>wrote:

> Hi,
>
> You need to modify log4j.xml and change the word INFO to DEBUG and
> then restart couchdb. Please send all the output that it gives.
>
>
It's here: http://pastie.org/921404

regds,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Robert Newson <ro...@gmail.com>.
Hi,

You need to modify log4j.xml and change the word INFO to DEBUG and
then restart couchdb. Please send all the output that it gives.

B.

On Wed, Apr 14, 2010 at 1:58 PM, Manokaran K <ma...@gmail.com> wrote:
> I tried with the latest src. This time its starts from update_seq 7621.
>
> There's a couchdb-lucene.log file under logs dir. But its content seem to be
> similar to what' was output the last time. Contents of which file you want
> me to send?
>
> thanks,
> mano
>

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
I tried with the latest src. This time it starts from update_seq 7621.

There's a couchdb-lucene.log file under the logs dir, but its contents seem to
be similar to what was output the last time. Which file's contents do you
want me to send?

thanks,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Robert Newson <ro...@gmail.com>.
I'd like the debug level output but I'm no longer so sure I understand
the cause.

The update_seq is calculated only from reachable indexes (I read all
views in all ddocs), so it'll be a bug in that rather than what I said
earlier, I think.

Seeing the debug line where 'since' drops to the lower number will
tell me a lot.

B.

On Wed, Apr 14, 2010 at 1:29 PM, Manokaran K <ma...@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 5:49 PM, Robert Newson <ro...@gmail.com>wrote:
>
>> Ok, I think I understand this now.
>>
>> When you start couchdb-lucene on a database for the first time (and
>> after a restart), it looks at the update_seq of all the Lucene indexes
>> it has on disk and takes the lowest number of these. It then uses that
>> in a call to _changes?since=N.
>>
>> My suspicion is you have an index that is no longer reachable (because
>> you've changed your index function at some point). This index won't be
>> updated, so it's stuck at 27200.
>>
>> I've pushed an update that will log (at DEBUG level, so change
>> log4j.xml temporarily) how the 'since' value is calculated. It would
>> be very helpful if you could verify my hypothesis.
>>
>> You can fix this, if I'm right, by running _cleanup (check the README
>> for the syntax) which will delete the unreachable index. I need to
>> make a real fix, though; namely, the update_seq calculation should
>> ignore unreachable indexes.
>>
>>
> Your hypothesis is likely. A couple of days back, I was getting some doc ids
> as response to some query but the doc was not available in couchdb. Most
> likely c-l returned some doc ids that it had indexed earlier but since I
> drop and recreate the db in couchdb during development, c-l did not know
> about it! But I was not able to recreate this problem.
>
> Will use the latest build and update you.
>
> thanks,
> mano
>

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Wed, Apr 14, 2010 at 5:49 PM, Robert Newson <ro...@gmail.com>wrote:

> Ok, I think I understand this now.
>
> When you start couchdb-lucene on a database for the first time (and
> after a restart), it looks at the update_seq of all the Lucene indexes
> it has on disk and takes the lowest number of these. It then uses that
> in a call to _changes?since=N.
>
> My suspicion is you have an index that is no longer reachable (because
> you've changed your index function at some point). This index won't be
> updated, so it's stuck at 27200.
>
> I've pushed an update that will log (at DEBUG level, so change
> log4j.xml temporarily) how the 'since' value is calculated. It would
> be very helpful if you could verify my hypothesis.
>
> You can fix this, if I'm right, by running _cleanup (check the README
> for the syntax) which will delete the unreachable index. I need to
> make a real fix, though; namely, the update_seq calculation should
> ignore unreachable indexes.
>
>
Your hypothesis is likely. A couple of days back, I was getting some doc ids
as response to some query but the doc was not available in couchdb. Most
likely c-l returned some doc ids that it had indexed earlier but since I
drop and recreate the db in couchdb during development, c-l did not know
about it! But I was not able to recreate this problem.

Will use the latest build and update you.

thanks,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Robert Newson <ro...@gmail.com>.
Ok, I think I understand this now.

When you start couchdb-lucene on a database for the first time (and
after a restart), it looks at the update_seq of all the Lucene indexes
it has on disk and takes the lowest number of these. It then uses that
in a call to _changes?since=N.

My suspicion is you have an index that is no longer reachable (because
you've changed your index function at some point). This index won't be
updated, so it's stuck at 27200.

I've pushed an update that will log (at DEBUG level, so change
log4j.xml temporarily) how the 'since' value is calculated. It would
be very helpful if you could verify my hypothesis.

You can fix this, if I'm right, by running _cleanup (check the README
for the syntax) which will delete the unreachable index. I need to
make a real fix, though; namely, the update_seq calculation should
ignore unreachable indexes.
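
The 'since' calculation hypothesised above can be sketched as follows. This is an illustration in Python, not couchdb-lucene's actual code: `resume_seq` and the `stale-index` digest are hypothetical names, while the other digests and sequence numbers are taken from the log output quoted in this thread.

```python
def resume_seq(index_seqs, reachable=None):
    """Return the lowest update_seq among the indexes that are considered.

    index_seqs maps an index digest to its stored update_seq. If `reachable`
    is given, indexes outside that set are ignored (the proposed real fix).
    """
    if reachable is not None:
        index_seqs = {d: s for d, s in index_seqs.items() if d in reachable}
    return min(index_seqs.values(), default=0)


on_disk = {
    "acv33jkyzefc7wb8djdc0cyw5": 202639,
    "5krdzy83b9nlrsl5pnru6eh3s": 202639,
    "bafqygi7kb41yj0wq5swblbx5": 202639,
    "stale-index": 27200,  # hypothetical leftover from an old index function
}
live = set(on_disk) - {"stale-index"}

current_since = resume_seq(on_disk)      # the stale index drags 'since' down
fixed_since = resume_seq(on_disk, live)  # unreachable index ignored
```

Taking the minimum is the safe choice when every index must see every change, which is why a single index that never advances forces all of them to replay from its position.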

B.

On Wed, Apr 14, 2010 at 1:07 PM, Manokaran K <ma...@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 5:34 PM, Manokaran K <ma...@gmail.com> wrote:
>
>> Again. This time I restarted couchdb leaving c-l running:
>>
>> 2010-04-14 17:32:32,193 INFO [spulz] View[digest=acv33jkyzefc7wb8djdc0cyw5]
>> now at update_seq 202639
>> 2010-04-14 17:32:32,323 INFO [spulz] View[digest=5krdzy83b9nlrsl5pnru6eh3s]
>> now at update_seq 202639
>> 2010-04-14 17:32:32,338 INFO [spulz] View[digest=bafqygi7kb41yj0wq5swblbx5]
>> now at update_seq 202639
>> 2010-04-14 17:33:35,971 INFO [spulz] Indexing from update_seq 27200
>>
>>
> I do make one change to the db between the restarts, a new 'session' doc is
> added to the db. But its not indexed for fti or views. I hope that's not a
> problem.
>
> thanks,
> mano
>

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Wed, Apr 14, 2010 at 5:34 PM, Manokaran K <ma...@gmail.com> wrote:

> Again. This time I restarted couchdb leaving c-l running:
>
> 2010-04-14 17:32:32,193 INFO [spulz] View[digest=acv33jkyzefc7wb8djdc0cyw5]
> now at update_seq 202639
> 2010-04-14 17:32:32,323 INFO [spulz] View[digest=5krdzy83b9nlrsl5pnru6eh3s]
> now at update_seq 202639
> 2010-04-14 17:32:32,338 INFO [spulz] View[digest=bafqygi7kb41yj0wq5swblbx5]
> now at update_seq 202639
> 2010-04-14 17:33:35,971 INFO [spulz] Indexing from update_seq 27200
>
>
I do make one change to the db between the restarts: a new 'session' doc is
added to the db. But it's not indexed for fti or views. I hope that's not a
problem.

thanks,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
Again. This time I restarted couchdb leaving c-l running:

2010-04-14 17:32:32,193 INFO [spulz] View[digest=acv33jkyzefc7wb8djdc0cyw5] now at update_seq 202639
2010-04-14 17:32:32,323 INFO [spulz] View[digest=5krdzy83b9nlrsl5pnru6eh3s] now at update_seq 202639
2010-04-14 17:32:32,338 INFO [spulz] View[digest=bafqygi7kb41yj0wq5swblbx5] now at update_seq 202639
2010-04-14 17:33:35,971 INFO [spulz] Indexing from update_seq 27200

regds,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Wed, Apr 14, 2010 at 5:03 PM, Manokaran K <ma...@gmail.com> wrote:

>
> On Wed, Apr 14, 2010 at 4:56 PM, Robert Newson <ro...@gmail.com>wrote:
>
>> One thought. There is a delay between indexing and commit the index to
>> disk. Updates within that period (currently 60 seconds) will indeed
>> be redone if you restart couchdb-lucene. I'll make the setting
>> configurable but it will always be at least a few seconds, as this
>> makes for much more efficient i/o. You can see search results before
>> the data is committed to disk, so perhaps this explains what you're
>> seeing?
>>
>>
> Its not just a few documents that gets indexed. I suspect all the docs in
> the db get read for reindexing - I would not have noticed it otherwise :-)
>
> My connection is crawling... maven is still downloading things needed for
> the latest from github. Will update later on what happened.
>
>
It's happening in the latest build too!

Following is the c-l output just before and after restart:

2010-04-14 17:25:11,189 INFO [spulz] View[digest=acv33jkyzefc7wb8djdc0cyw5] now at update_seq 202639
2010-04-14 17:25:11,247 INFO [spulz] View[digest=5krdzy83b9nlrsl5pnru6eh3s] now at update_seq 202639
2010-04-14 17:25:11,257 INFO [spulz] View[digest=bafqygi7kb41yj0wq5swblbx5] now at update_seq 202639
^Cmano@acer:~/couchdb-lucene$ ./bin/run
2010-04-14 17:26:18,886 INFO [Main] Index output goes to: /home/mano/couchdb-lucene-0.5-SNAPSHOT-latest/indexes
2010-04-14 17:26:18,910 INFO [Main] Accepting connections with SelectChannelConnector@localhost:5985
2010-04-14 17:26:29,547 INFO [spulz] Indexing from update_seq 27200

I'm curious why it's indexing from update_seq 27200 even though it was
already at 202639 before the restart!

thanks,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Wed, Apr 14, 2010 at 4:56 PM, Robert Newson <ro...@gmail.com>wrote:

> One thought. There is a delay between indexing and commit the index to
> disk. Updates within that period (currently 60 seconds) will indeed
> be redone if you restart couchdb-lucene. I'll make the setting
> configurable but it will always be at least a few seconds, as this
> makes for much more efficient i/o. You can see search results before
> the data is committed to disk, so perhaps this explains what you're
> seeing?
>
>
It's not just a few documents that get indexed. I suspect all the docs in
the db get read for reindexing - I would not have noticed it otherwise :-)

My connection is crawling... maven is still downloading things needed for
the latest from github. Will update later on what happened.

thanks,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Robert Newson <ro...@gmail.com>.
One thought. There is a delay between indexing and committing the index to
disk. Updates within that period (currently 60 seconds) will indeed
be redone if you restart couchdb-lucene. I'll make the setting
configurable but it will always be at least a few seconds, as this
makes for much more efficient i/o. You can see search results before
the data is committed to disk, so perhaps this explains what you're
seeing?
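
To make the effect of the commit delay concrete, here is a small Python model. It is a sketch under simplifying assumptions, not couchdb-lucene's implementation: `DelayedCommitIndexer` is a hypothetical class, and it commits as a side effect of indexing once the interval has elapsed, whereas the real commit timing may work differently.

```python
COMMIT_INTERVAL = 60  # seconds, the value mentioned in the message above


class DelayedCommitIndexer:
    """Toy model: changes are searchable at once but persisted only periodically."""

    def __init__(self, interval=COMMIT_INTERVAL):
        self.interval = interval
        self.seen_seq = 0        # latest change indexed in memory
        self.committed_seq = 0   # latest change that would survive a restart
        self.last_commit = 0.0

    def index(self, seq, now):
        self.seen_seq = seq
        if now - self.last_commit >= self.interval:
            self.committed_seq = seq
            self.last_commit = now

    def restart(self):
        """Drop uncommitted state and return the sequence to resume from."""
        self.seen_seq = self.committed_seq
        return self.committed_seq


indexer = DelayedCommitIndexer()
indexer.index(100, now=60.0)   # interval elapsed, so this change is committed
indexer.index(101, now=75.0)   # held in memory only
indexer.index(102, now=90.0)   # held in memory only

uncommitted = indexer.seen_seq - indexer.committed_seq
resume_from = indexer.restart()
```

Only the changes made since the last commit are replayed in this model, which is a handful of updates rather than the whole database.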

B.

On Wed, Apr 14, 2010 at 12:12 PM, Manokaran K <ma...@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 4:37 PM, Robert Newson <ro...@gmail.com>wrote:
>
>> Make sure you're up to date. the ini file no longer has a log entry,
>> the log output location is in the log4j.xml file. If you unzip a newly
>> built zip file ('mvn' will build one for you) it should log to a file
>> in the logs/ folder.
>>
>> I've verified that c-l does not start over when restarted with the
>> current code. It's possible there was a regression in the past that no
>> one else noticed (including me). Can you retry with the latest code
>> please?
>>
>
> I will.
>
> thanks,
> mano
>

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Wed, Apr 14, 2010 at 4:37 PM, Robert Newson <ro...@gmail.com>wrote:

> Make sure you're up to date. the ini file no longer has a log entry,
> the log output location is in the log4j.xml file. If you unzip a newly
> built zip file ('mvn' will build one for you) it should log to a file
> in the logs/ folder.
>
> I've verified that c-l does not start over when restarted with the
> current code. It's possible there was a regression in the past that no
> one else noticed (including me). Can you retry with the latest code
> please?
>

I will.

thanks,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Robert Newson <ro...@gmail.com>.
Make sure you're up to date. The ini file no longer has a log entry;
the log output location is in the log4j.xml file. If you unzip a newly
built zip file ('mvn' will build one for you) it should log to a file
in the logs/ folder.

I've verified that c-l does not start over when restarted with the
current code. It's possible there was a regression in the past that no
one else noticed (including me). Can you retry with the latest code
please?

B.

On Wed, Apr 14, 2010 at 12:00 PM, Manokaran K <ma...@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 4:15 PM, Manokaran K <ma...@gmail.com> wrote:
>
>>
>> On Wed, Apr 14, 2010 at 3:58 PM, Sebastian Cohnen <
>> sebastiancohnen@googlemail.com> wrote:
>>
>>> Hi,
>>>
>>> this is (of course) not to be expected. c-l uses the _changes API and
>>> checkpointing (using the update sequence number) to figure out, if there is
>>> something new to index - at least as far as I understood this. But since 0.5
>>> is not yet released, maybe you've found a bug :) Could you provide some more
>>> information on your setup, used design document, c-l logs?
>>>
>>>
>>>
>> My setup consists of about 150K docs to demo a web app. I'll prune it to a
>> much smaller number and give you the information. Any specific action you
>> would want me to try?
>>
>>
> Where's the c-l log file located? My c-l.ini file says couchdb-lucene.log
> but there's no such file in my system!
>
> thanks,
> mano
>

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Wed, Apr 14, 2010 at 4:15 PM, Manokaran K <ma...@gmail.com> wrote:

>
> On Wed, Apr 14, 2010 at 3:58 PM, Sebastian Cohnen <
> sebastiancohnen@googlemail.com> wrote:
>
>> Hi,
>>
>> this is (of course) not to be expected. c-l uses the _changes API and
>> checkpointing (using the update sequence number) to figure out, if there is
>> something new to index - at least as far as I understood this. But since 0.5
>> is not yet released, maybe you've found a bug :) Could you provide some more
>> information on your setup, used design document, c-l logs?
>>
>>
>>
> My setup consists of about 150K docs to demo a web app. I'll prune it to a
> much smaller number and give you the information. Any specific action you
> would want me to try?
>
>
Where's the c-l log file located? My c-l.ini file says couchdb-lucene.log
but there's no such file on my system!

thanks,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Manokaran K <ma...@gmail.com>.
On Wed, Apr 14, 2010 at 3:58 PM, Sebastian Cohnen <
sebastiancohnen@googlemail.com> wrote:

> Hi,
>
> this is (of course) not to be expected. c-l uses the _changes API and
> checkpointing (using the update sequence number) to figure out, if there is
> something new to index - at least as far as I understood this. But since 0.5
> is not yet released, maybe you've found a bug :) Could you provide some more
> information on your setup, used design document, c-l logs?
>
>
>
My setup consists of about 150K docs to demo a web app. I'll prune it to a
much smaller number and give you the information. Any specific action you
would want me to try?

thanks,
mano

Re: couchdb-lucene reindexes when restarted

Posted by Sebastian Cohnen <se...@googlemail.com>.
Hi,

This is (of course) not to be expected. c-l uses the _changes API and checkpointing (using the update sequence number) to figure out if there is something new to index - at least as far as I understand it. But since 0.5 is not yet released, maybe you've found a bug :) Could you provide some more information on your setup, the design document you used, and the c-l logs?


On 14.04.2010, at 11:15, Manokaran K wrote:

> Hi,
> 
> Hi, I'm using couchdb-lucene 0.5 with couchdb 0.10. It looks like whenever I
> restart couchdb or couchdb-lucene, lucene re-indexes the documents once
> again for that particular index function!! Is that expected behavior?
> 
> At IRC someone said its expected because docs might have changed! I thought
> couchdb-lucene would use etags or _changes api to test if the docs need
> reindexing!
> 
> Which is right?
> 
> thanks,
> mano
> 
> -- 
> Lord, give us the wisdom to utter words that are gentle and tender, for
> tomorrow we may have to eat them.
>   -Sen. Morris Udall