Posted to dev@couchdb.apache.org by Paul Davis <pa...@gmail.com> on 2009/01/19 00:12:03 UTC
Update notifications including update sequence
Hey,
I'm working on this Lucene indexing stuff and I'm trying to write it
in such a way that I don't have to pound couchdb once per update. I
know that others have either gone every N updates or after a timeout,
but I'm not sure that's behavior that people would want in terms of
full text indexing.
The general update_notification outline is:
1. Receive notification with type == "updated"
2. while _all_docs_by_seq returns more data:
       index updates
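That outline can be sketched roughly like this (Python purely for illustration; CouchDB writes one JSON object per line to the notifier's stdin, and index_batch here is a hypothetical stand-in for the _all_docs_by_seq loop):

```python
import io
import json
import sys

def handle_notifications(index_batch, stream=None):
    """Dispatch CouchDB update notifications to an indexing callback.

    CouchDB writes one JSON object per line to the notifier's stdin,
    e.g. {"type": "updated", "db": "mydb"}. index_batch stands in for
    step 2: loop over _all_docs_by_seq until it returns no more rows.
    """
    stream = stream or sys.stdin
    for line in stream:
        note = json.loads(line)
        if note.get("type") == "updated":
            index_batch(note["db"])

# Example with a fake stdin stream:
seen = []
handle_notifications(seen.append,
                     io.StringIO('{"type": "updated", "db": "mydb"}\n'))
```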
The kicker is that it's possible that while we're doing the while
loop, we're receiving more update notifications. Naively we could just
queue them up and process them all which leads to us hitting couchdb
at least once per write to the db (which is teh suck) or we could
discard them all except for one and just restart the indexer when it
thinks it's finished etc etc.
After thinking about this, I thought that a simple way to know whether
you need to start indexing again would be for the notification sent
to update_notifications to include the update_seq of the db. Then your
indexer, which is already storing the current update_seq, can just
check whether there's something new that needs to be worked on without
having to make an HTTP request.
Then it just becomes "index until no new docs, then discard all update
notifications with an update_seq we've already indexed past."
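As a sketch of that rule (hypothetical names; the Indexer class is a toy stand-in for the Lucene indexer, and the seq comparison is the only decision that matters):

```python
class Indexer:
    """Toy indexer that remembers the last update_seq it committed."""
    def __init__(self):
        self.last_seq = 0
        self.runs = 0

    def run_until_caught_up(self, current_seq):
        # Stand-in for: while _all_docs_by_seq returns rows, index them.
        self.runs += 1
        self.last_seq = current_seq

def on_notification(indexer, note):
    """Restart indexing only if the notification's update_seq is new.

    With update_seq included in the notification (what the patch
    proposes), stale notifications are discarded without any HTTP
    round-trip to CouchDB.
    """
    seq = note.get("update_seq", 0)
    if seq > indexer.last_seq:
        indexer.run_until_caught_up(seq)

idx = Indexer()
on_notification(idx, {"type": "updated", "update_seq": 5})
on_notification(idx, {"type": "updated", "update_seq": 3})  # stale, discarded
on_notification(idx, {"type": "updated", "update_seq": 9})
```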
I attached a patch that is extremely trivial, but I'd like to hear if
anyone has feedback on its merits or if there's just a better way
that I'm not thinking of.
Thanks,
Paul Davis
Re: Update notifications including update sequence
Posted by Paul Davis <pa...@gmail.com>.
On Sun, Jan 18, 2009 at 7:17 PM, Chris Anderson <jc...@gmail.com> wrote:
> On Sun, Jan 18, 2009 at 3:12 PM, Paul Davis <pa...@gmail.com> wrote:
>> I attached a patch that is extremely trivial, but I'd like to hear if
>> anyone has feed back on the merits or if there's just a better way
>> that I'm not thinking of.
>>
>
> I think this is a good way to do it (and useful for other things). The
> patch looks solid. I'd have to look more closely at the code to see if
> the update_seq's interactions with deferred commit need to be
> accounted for here.
>
Good call. I didn't think too hard about anything other than just
taking what was in the db record when the notification was sent.
>
>
> --
> Chris Anderson
> http://jchris.mfdz.com
>
Re: Update notifications including update sequence
Posted by Chris Anderson <jc...@gmail.com>.
On Sun, Jan 18, 2009 at 3:12 PM, Paul Davis <pa...@gmail.com> wrote:
> I attached a patch that is extremely trivial, but I'd like to hear if
> anyone has feed back on the merits or if there's just a better way
> that I'm not thinking of.
>
I think this is a good way to do it (and useful for other things). The
patch looks solid. I'd have to look more closely at the code to see if
the update_seq's interactions with deferred commit need to be
accounted for here.
--
Chris Anderson
http://jchris.mfdz.com
Re: Update notifications including update sequence
Posted by Chris Anderson <jc...@gmail.com>.
On Sun, Jan 18, 2009 at 10:52 PM, Paul Davis
<pa...@gmail.com> wrote:
> On Mon, Jan 19, 2009 at 1:46 AM, Antony Blakey <an...@gmail.com> wrote:
>>
>> On 19/01/2009, at 3:51 PM, Paul Davis wrote:
>>
>>> There can be many _external processes for a single definition. So, not
>>> only are requests not serialized, they can be concurrent etc.
>>
>> Hmmm. I must be particularly thick today, because my reading of the code has
>> a single couch_external_manager creating and maintaining an instance of
>> couch_external_server *per* UrlName, with each couch_external_server
>> instance corresponding to a single invocation of the external process
>> backing that URL.
>>
>> Where am I going wrong?
>>
>
> Wow. I am the dumb one here. I was just checking it out again as well
> to pin down the spot you'd need. Turns out that everything I said
> about _external is dead wrong. Though, if it helps, the model I had in
> my head is definitely how view server processes work XD
>
> And now that I just got that into my head I'm scrapping the update
> notification side of my couchdb-lucene stuff and running it all from
> _external.
>
> Apologies for wasting everyone's time.
>
Not a waste of time. Perhaps on another thread, we should consider
enhancements to the db-update-notification process.
--
Chris Anderson
http://jchris.mfdz.com
Re: Update notifications including update sequence
Posted by Paul Davis <pa...@gmail.com>.
On Mon, Jan 19, 2009 at 1:46 AM, Antony Blakey <an...@gmail.com> wrote:
>
> On 19/01/2009, at 3:51 PM, Paul Davis wrote:
>
>> There can be many _external processes for a single definition. So, not
>> only are requests not serialized, they can be concurrent etc.
>
> Hmmm. I must be particularly thick today, because my reading of the code has
> a single couch_external_manager creating and maintaining an instance of
> couch_external_server *per* UrlName, with each couch_external_server
> instance corresponding to a single invocation of the external process
> backing that URL.
>
> Where am I going wrong?
>
Wow. I am the dumb one here. I was just checking it out again as well
to pin down the spot you'd need. Turns out that everything I said
about _external is dead wrong. Though, if it helps, the model I had in
my head is definitely how view server processes work XD
And now that I just got that into my head I'm scrapping the update
notification side of my couchdb-lucene stuff and running it all from
_external.
Apologies for wasting everyone's time.
Paul Davis
> Antony Blakey
> -------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> A Buddhist walks up to a hot-dog stand and says, "Make me one with
> everything". He then pays the vendor and asks for change. The vendor says,
> "Change comes from within".
>
>
>
>
Re: Update notifications including update sequence
Posted by Antony Blakey <an...@gmail.com>.
On 19/01/2009, at 3:51 PM, Paul Davis wrote:
> There can be many _external processes for a single definition. So, not
> only are requests not serialized, they can be concurrent etc.
Hmmm. I must be particularly thick today, because my reading of the
code has a single couch_external_manager creating and maintaining an
instance of couch_external_server *per* UrlName, with each
couch_external_server instance corresponding to a single invocation of
the external process backing that URL.
Where am I going wrong?
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
A Buddhist walks up to a hot-dog stand and says, "Make me one with
everything". He then pays the vendor and asks for change. The vendor
says, "Change comes from within".
Re: Update notifications including update sequence
Posted by Antony Blakey <an...@gmail.com>.
On 19/01/2009, at 5:05 PM, Antony Blakey wrote:
>> The ideas from the other thread about having a UUID per db and
>> compaction are interesting, are either of those included the fs
>> layout
>> stuff you were working on?
>
> No. UUIDs are useful for the fs because you need a strictly
> functional mapping from name -> file, and using a UUID is begging
> the question.
s/UUIDs are useful/UUIDs are *not* useful/
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
In anything at all, perfection is finally attained not when there is
no longer anything to add, but when there is no longer anything to
take away.
-- Antoine de Saint-Exupery
Re: Update notifications including update sequence
Posted by Antony Blakey <an...@gmail.com>.
On 19/01/2009, at 3:51 PM, Paul Davis wrote:
> There can be many _external processes for a single definition. So, not
> only are requests not serialized, they can be concurrent etc.
OK. I'll be patching my deployment to ensure a single external process
per external definition.
IMO the _external system is considerably less useful in this form,
especially for external indexing. Concurrency and consistency should
be a matter for the external system to control, because it's the
external system that understands/imposes/relaxes the concurrency and
serialization requirements.
Maybe what's needed is an external that acts more like a real server,
even if the single command channel needs a request multiplexing protocol.
> A single _external process should only see monotonically increasing
> update_seq's. I think it's techincally possible to have a smaller
> update_seq processed later in time in a different os process though
> (later in time <= few ms).
Possible => broken.
> The ideas from the other thread about having a UUID per db and
> compaction are interesting, are either of those included the fs layout
> stuff you were working on?
No. UUIDs are useful for the fs because you need a strictly functional
mapping from name -> file, and using a UUID is begging the question.
The compaction issue isn't real. My first thought is that the purge
issue could be dealt with by a) having a notification of the purge and
b) having the purge_seq be set to the update_seq of the snapshot seen
by the purge. Maybe it works that way already.
I definitely prefer state transitions to be reified rather than
notified, and IMO it's more consistent with the overall couch model.
Personally I think an _external system with a richer protocol is
required, rolling notifications in with the requests, so that an
external system can maintain an accurate state-correspondence with the
canonical couch data, without exceptions, e.g. without needing some
sideband for database life-cycle events. It should also be possible to
make queries against a given snapshot, using either the request channel
or an additional HTTP parameter. The request channel is by far the
better idea because the snapshot can be implicitly scoped by the request.
I am adding the db UUID, view function UUID and revs in view results
for my own purposes, but there wasn't much interest on the list, and I
haven't the time to convince/shepherd/clean/publish etc that proposal
or implementation.
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
I contend that we are both atheists. I just believe in one fewer god
than you do. When you understand why you dismiss all the other
possible gods, you will understand why I dismiss yours.
--Stephen F Roberts
Re: Update notifications including update sequence
Posted by Paul Davis <pa...@gmail.com>.
On Sun, Jan 18, 2009 at 11:56 PM, Antony Blakey <an...@gmail.com> wrote:
>
> On 19/01/2009, at 2:53 PM, Paul Davis wrote:
>
>> On Sun, Jan 18, 2009 at 10:51 PM, Antony Blakey <an...@gmail.com>
>> wrote:
>>>
>>> I've previously posted a solution using _external that doesn't hit couch
>>> every update, and that maintains MVCC consistency and lazy-update view
>>> behaviour.
>>>
>>
>> Right. I tried looking through mark mail for a link to your
>> implementation but came up empty handed. I'd contemplated something
>> similar as well. The issue though is that Lucene index writers are
>> AFAIK not reentrant.
>
> Thread 'couchdb' started by Tim Parkin around 20/21 December.
>
Odd. I'd only noticed the last 2 or 3 posts of that thread before.
Thanks for the tip.
> IndexWriters are mutexed using a lock file.
>
Ew.
>> Thus the headache of coordinating multiple random
>> processes would start to suck. Lots.
>
> My reading of the code was that there was a single process for each
> _external definition (although admittedly that was early in my understanding
> of gen_server). Major consistency issues result if requests to the _external
> aren't serialized.
>
There can be many _external processes for a single definition. So, not
only are requests not serialized, they can be concurrent etc.
>>> The problem with using notifications is lack of snapshot coordination
>>> between the update process and the external process.
>>>
>>
>> I'd say this is use case dependent.
>
> It does mean that you can't guarantee that an external request (that does
> reference a given MVCC snapshot) is getting data from the same snapshot.
>
> You're right that's use case dependent, but the issue is whether the use
> case is 'free text indexing' or is a client use case. If the later, then you
> need to handle the situation where it *does* matter, so an implementation
> that has random characteristics is IMO less than optimal.
>
Err, right. It's use case dependent. If your (client defined) use case
requires certain characteristics, the update_notification/_external
process may just not be the right tool for the job etc etc.
>>> The synchronisation between sequential _external calls is obvious e.g.
>>> guaranteeing that the _external process sees a monotonic increasing
>>> update_seq.
>>>
>>
>> I don't follow.
>
> I mean you'll never get a request in the context of an update_seq that your
> _external process has already advanced beyond, because the update_seqs seen
> by the external are a) serialized and b) only see a monotonic increasing
> sequence of update_seq values. Hence you can safely run an update process
> and set a 'last_update_seq_seen' (which is the key to avoiding hitting couch
> again) knowing that you never have to backtrack.
>
A single _external process should only see monotonically increasing
update_seq's. I think it's technically possible for a smaller
update_seq to be processed later in time in a different os process
though (later in time <= few ms).
The ideas from the other thread about having a UUID per db and
compaction are interesting; are either of those included in the fs
layout stuff you were working on?
Paul
> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> Human beings, who are almost unique in having the ability to learn from the
> experience of others, are also remarkable for their apparent disinclination
> to do so.
> -- Douglas Adams
>
>
>
Re: Update notifications including update sequence
Posted by Antony Blakey <an...@gmail.com>.
On 19/01/2009, at 2:53 PM, Paul Davis wrote:
> On Sun, Jan 18, 2009 at 10:51 PM, Antony Blakey <antony.blakey@gmail.com
> > wrote:
>> I've previously posted a solution using _external that doesn't hit
>> couch
>> every update, and that maintains MVCC consistency and lazy-update
>> view
>> behaviour.
>>
>
> Right. I tried looking through mark mail for a link to your
> implementation but came up empty handed. I'd contemplated something
> similar as well. The issue though is that Lucene index writers are
> AFAIK not reentrant.
Thread 'couchdb' started by Tim Parkin around 20/21 December.
IndexWriters are mutexed using a lock file.
> Thus the headache of coordinating multiple random
> processes would start to suck. Lots.
My reading of the code was that there was a single process for each
_external definition (although admittedly that was early in my
understanding of gen_server). Major consistency issues result if
requests to the _external aren't serialized.
>> The problem with using notifications is lack of snapshot coordination
>> between the update process and the external process.
>>
>
> I'd say this is use case dependent.
It does mean that you can't guarantee that an external request (that
does reference a given MVCC snapshot) is getting data from the same
snapshot.
You're right that's use case dependent, but the issue is whether the
use case is 'free text indexing' or is a client use case. If the
latter, then you need to handle the situation where it *does* matter,
so an implementation that has random characteristics is IMO less than
optimal.
>> The synchronisation between sequential _external calls is obvious
>> e.g.
>> guaranteeing that the _external process sees a monotonic increasing
>> update_seq.
>>
>
> I don't follow.
I mean you'll never get a request in the context of an update_seq that
your _external process has already advanced beyond, because the
update_seqs seen by the external are a) serialized and b) a
monotonically increasing sequence of values. Hence you can
safely run an update process and set a 'last_update_seq_seen' (which
is the key to avoiding hitting couch again) knowing that you never
have to backtrack.
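A minimal sketch of that invariant (hypothetical names; it assumes the serialized, monotonically increasing update_seq stream described above):

```python
def maybe_update(last_seen_seq, request_seq, run_update_pass):
    """Return the new last_update_seq_seen after handling a request.

    Because a single serialized _external process only ever sees
    increasing update_seqs, a request at or below last_seen_seq needs
    no work and no round-trip to couch; a newer seq triggers exactly
    one catch-up pass, and we never have to backtrack.
    """
    if request_seq > last_seen_seq:
        run_update_pass()
        return request_seq
    return last_seen_seq

passes = []
seq = maybe_update(0, 4, lambda: passes.append(4))    # new work: one pass
seq = maybe_update(seq, 4, lambda: passes.append(4))  # same seq: no pass
```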
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Human beings, who are almost unique in having the ability to learn
from the experience of others, are also remarkable for their apparent
disinclination to do so.
-- Douglas Adams
Re: Update notifications including update sequence
Posted by Paul Davis <pa...@gmail.com>.
On Sun, Jan 18, 2009 at 10:51 PM, Antony Blakey <an...@gmail.com> wrote:
> I've previously posted a solution using _external that doesn't hit couch
> every update, and that maintains MVCC consistency and lazy-update view
> behaviour.
>
Right. I tried looking through mark mail for a link to your
implementation but came up empty handed. I'd contemplated something
similar as well. The issue though is that Lucene index writers are
AFAIK not reentrant. Thus the headache of coordinating multiple random
processes would start to suck. Lots.
> The problem with using notifications is lack of snapshot coordination
> between the update process and the external process.
>
I'd say this is use case dependent.
> The synchronisation between sequential _external calls is obvious e.g.
> guaranteeing that the _external process sees a monotonic increasing
> update_seq.
>
I don't follow.
You've mentioned an SQLite db _external process similar to the GeoCouch
project a few times on the mailing list. How do you manage to keep
things sane in the face of possibly multiple writers? I couldn't
figure anything out other than starting something with lock files,
which is just plain dirty. And FTI indexing is obviously too expensive
to do multiple times, so I can't just create an index per spawned os
process or some such.
Thanks,
Paul Davis
> On 19/01/2009, at 9:42 AM, Paul Davis wrote:
>
>> Hey,
>>
>> I'm working on this Lucene indexing stuff and I'm trying to write it
>> in such a way that I don't have to pound couchdb once per update. I
>> know that others have either gone every N updates or after a timeout,
>> but I'm not sure that's behavior that people would want in terms of
>> full text indexing.
>>
>> The general update_notification outline is:
>>
>> 1. Receive notification with type == "updated"
>> 2. while _all_docs_by_seq returns more data:
>> index updates
>>
>> The kicker is that it's possible that while we're doing the while
>> loop, we're receiving more update notifications. Naively we could just
>> queue them up and process them all which leads to us hitting couchdb
>> at least once per write to the db (which is teh suck) or we could
>> discard them all except for one and just restart the indexer when it
>> thinks it's finished etc etc.
>>
>> After thinking about this, I thought that a simple way to actually
>> know if you need to start indexing again is if the notification sent
>> to update_notifications included the update_seq of the db. Then your
>> indexer that is already storing the current update_seq can just
>> compare if there's something new that needs to be worked on without
>> having to make an http request.
>>
>> Then it just becomes "index till no new docs, then discard all update
>> notifications with an update_seq we've already indexed past.
>>
>> I attached a patch that is extremely trivial, but I'd like to hear if
>> anyone has feed back on the merits or if there's just a better way
>> that I'm not thinking of.
>>
>> Thanks,
>> Paul Davis
>> <update_notification_sequene.patch>
>
> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> You can't just ask customers what they want and then try to give that to
> them. By the time you get it built, they'll want something new.
> -- Steve Jobs
>
>
>
>
Re: Update notifications including update sequence
Posted by Antony Blakey <an...@gmail.com>.
I've previously posted a solution using _external that doesn't hit
couch every update, and that maintains MVCC consistency and
lazy-update view behaviour.
The problem with using notifications is lack of snapshot coordination
between the update process and the external process.
The synchronisation between sequential _external calls is obvious,
e.g. guaranteeing that the _external process sees a monotonically
increasing update_seq.
On 19/01/2009, at 9:42 AM, Paul Davis wrote:
> Hey,
>
> I'm working on this Lucene indexing stuff and I'm trying to write it
> in such a way that I don't have to pound couchdb once per update. I
> know that others have either gone every N updates or after a timeout,
> but I'm not sure that's behavior that people would want in terms of
> full text indexing.
>
> The general update_notification outline is:
>
> 1. Receive notification with type == "updated"
> 2. while _all_docs_by_seq returns more data:
> index updates
>
> The kicker is that it's possible that while we're doing the while
> loop, we're receiving more update notifications. Naively we could just
> queue them up and process them all which leads to us hitting couchdb
> at least once per write to the db (which is teh suck) or we could
> discard them all except for one and just restart the indexer when it
> thinks it's finished etc etc.
>
> After thinking about this, I thought that a simple way to actually
> know if you need to start indexing again is if the notification sent
> to update_notifications included the update_seq of the db. Then your
> indexer that is already storing the current update_seq can just
> compare if there's something new that needs to be worked on without
> having to make an http request.
>
> Then it just becomes "index till no new docs, then discard all update
> notifications with an update_seq we've already indexed past.
>
> I attached a patch that is extremely trivial, but I'd like to hear if
> anyone has feed back on the merits or if there's just a better way
> that I'm not thinking of.
>
> Thanks,
> Paul Davis
> <update_notification_sequene.patch>
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
You can't just ask customers what they want and then try to give that
to them. By the time you get it built, they'll want something new.
-- Steve Jobs