Posted to dev@couchdb.apache.org by Jan Lehnardt <ja...@apache.org> on 2008/04/10 23:32:21 UTC

Lazy Fulltext Search

Heya,
while thinking more about the fulltext implementation, I began to  
wonder why we don't model it after the view engine.

At the moment, we have an Indexer waiting for update notifications and  
polling CouchDB for changes and a separate mechanism to register a  
fulltext query Searcher, that looks up things in the index.

My proposed architectural change would be to trigger the Indexer from  
the Searcher module when a request comes in, just like views work.  
This would delay the creation of fulltext indexes until they are
actually needed.
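
To make the idea concrete, here is a minimal Python sketch of a lazily
updated index. It is only an illustration, not CouchDB code: the class
name, the (seq, doc_id, text) update log and the whitespace tokenizer are
all assumptions made for the example.

```python
class LazyFulltextIndex:
    """Toy inverted index that is only brought up to date when queried."""

    def __init__(self, update_log):
        # Hypothetical stand-in for the database: a list of
        # (seq, doc_id, text) tuples in update order.
        self.update_log = update_log
        self.last_seq = 0      # highest update sequence already indexed
        self.postings = {}     # term -> set of doc ids

    def _catch_up(self):
        # Index only the updates that arrived since the last query.
        # (Removing stale terms for changed documents is left out.)
        for seq, doc_id, text in self.update_log:
            if seq <= self.last_seq:
                continue
            for term in text.lower().split():
                self.postings.setdefault(term, set()).add(doc_id)
            self.last_seq = seq

    def search(self, term):
        # The Searcher triggers the Indexer on demand, as views do.
        self._catch_up()
        return sorted(self.postings.get(term.lower(), set()))
```

The first search pays for all indexing work done so far; later searches
only pay for whatever changed in between.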

The possible drawback, though, is that when building the fulltext index
is rather slow, old-style pre-calculation might be more feasible. Views
deal with that by requiring frequent requests (possibly cron-ed).

This is not a proposal or anything, just a thought I wanted to share  
with those who work on fulltext integration.

If you have any input on this, please let us know ;)

Cheers
Jan
--

RE: Lazy Fulltext Search

Posted by Brian Smith <br...@briansmith.org>.

Jan Lehnardt wrote:
> Views are built JIT as well, with intermediate results cached, so that
> only the latest changes need to be indexed. I propose exactly the same
> thing for fulltext searching. The same trade-offs apply and the same
> drawbacks as well. What I don't understand now is why you say it is good
> for views and bad for fulltext searching. The only architectural change
> I propose is that the indexer is triggered by the searcher on demand
> instead of CouchDB's update notification mechanism.

Re-computing indexes synchronously with updates is bad. But
re-computing them synchronously with queries is also bad. Why not
re-compute them immediately upon each update, but asynchronously, so
that all the re-indexing will usually happen between updates and
queries? In other words, re-compute (pre-compute) indexes using any and
all idle resources.
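
A rough Python sketch of this asynchronous variant, assuming a single
background worker thread fed by a queue. The class and method names are
invented for illustration and say nothing about how CouchDB's update
notification actually works.

```python
import queue
import threading


class BackgroundIndexer:
    """Indexes updates on a worker thread so writes and queries never wait."""

    def __init__(self):
        self.postings = {}                 # term -> set of doc ids
        self._lock = threading.Lock()
        self._pending = queue.Queue()      # updates not yet indexed
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def notify_update(self, doc_id, text):
        # Called on every write; only enqueues and returns immediately.
        self._pending.put((doc_id, text))

    def _run(self):
        while True:
            doc_id, text = self._pending.get()
            with self._lock:
                for term in text.lower().split():
                    self.postings.setdefault(term, set()).add(doc_id)
            self._pending.task_done()

    def search(self, term):
        # Answers from whatever has been indexed so far.
        with self._lock:
            return sorted(self.postings.get(term.lower(), set()))

    def wait_until_idle(self):
        # Blocks until every queued update has been indexed.
        self._pending.join()
```

The trade-off is that a query arriving right after a write may see a
slightly stale index unless it waits for the queue to drain.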

- Brian

Re: Lazy Fulltext Search

Posted by Noah Slater <ns...@apache.org>.

On Fri, Apr 11, 2008 at 01:24:50PM +0200, Jan Lehnardt wrote:
> The only architectural change I propose is that the indexer is triggered by
> the searcher on demand instead of CouchDB's update notification mechanism.

Aha, sorry for my misunderstanding. My only question now would be how much more
expensive full text indexing is compared to view generation, and whether we can
take that performance hit or not.

> Still unclear?

No, thanks! :)

-- 
Noah Slater - The Apache Software Foundation <http://www.apache.org/>

Re: Lazy Fulltext Search

Posted by Jan Lehnardt <ja...@apache.org>.

On Apr 11, 2008, at 13:11, Noah Slater wrote:
> On Fri, Apr 11, 2008 at 12:37:30PM +0200, Jan Lehnardt wrote:
>> The associated benefit is that you delay the costs of generation of
>> indexes until you actually need them.
>
> If you're generating indexes JIT, you can't really count them as indexes any
> more, you're essentially doing regular non-indexed searching.
>
> I would have thought that for a database the trade-off you want to make is one
> where you sacrifice time/resources in bulk so that queries are lightning fast.
>
> If you move the indexing to query time you still have to expend exactly the same
> time/resources as before and you have slowed down your query response time
> significantly. For large collections of documents, indexing could easily take
> hours to complete.
>
>>> My understanding is that the KEY element of CouchDB Views is that they are
>>> generated in advance, and incrementally, before you use them.
>>
>> And why not use the same principle for fulltext indexes?
>
> I thought this was the original plan for the full text search, that the index
> was built in advance and incrementally before you use it. It sounds to me like
> you're suggesting a departure from this.
>
> Maybe I am getting confused.

Yeah, I don't know what is going wrong here. My words might not be
clear enough. Sorry. Views are built JIT as well, with intermediate
results cached, so that only the latest changes need to be indexed. I
propose exactly the same thing for fulltext searching. The same
trade-offs apply and the same drawbacks as well. What I don't understand
now is why you say it is good for views and bad for fulltext searching.
The only architectural change I propose is that the indexer is
triggered by the searcher on demand instead of CouchDB's update
notification mechanism.

Still unclear?

Cheers
Jan
--

Re: Lazy Fulltext Search

Posted by Noah Slater <ns...@apache.org>.

On Fri, Apr 11, 2008 at 12:37:30PM +0200, Jan Lehnardt wrote:
> The associated benefit is that you delay the costs of generation of
> indexes until you actually need them.

If you're generating indexes JIT, you can't really count them as indexes any
more, you're essentially doing regular non-indexed searching.

I would have thought that for a database the trade-off you want to make is one
where you sacrifice time/resources in bulk so that queries are lightning fast.

If you move the indexing to query time you still have to expend exactly the same
time/resources as before and you have slowed down your query response time
significantly. For large collections of documents, indexing could easily take
hours to complete.

>> My understanding is that the KEY element of CouchDB Views is that they are
>> generated in advance, and incrementally, before you use them.
>
> And why not use the same principle for fulltext indexes?

I thought this was the original plan for the full text search, that the index
was built in advance and incrementally before you use it. It sounds to me like
you're suggesting a departure from this.

Maybe I am getting confused.

-- 
Noah Slater - The Apache Software Foundation <http://www.apache.org/>

Re: Lazy Fulltext Search

Posted by Jan Lehnardt <ja...@apache.org>.

On Apr 11, 2008, at 12:23, Noah Slater wrote:
> On Thu, Apr 10, 2008 at 11:32:21PM +0200, Jan Lehnardt wrote:
>> My proposed architectural change would be to trigger the Indexer from
>> the Searcher module when a request comes in, just like views work. This
>> would delay the creation of fulltext indexes until they are
>> actually needed.
>
> I thought that the advantage of full text search systems is that you can perform
> a lot of work up front in exchange for very fast queries later on. This proposal
> would seem to make the trade-off in performance without the associated benefit.

The associated benefit is that you delay the costs of generation of  
indexes until you actually need them.

>> The possible drawback, though, is that when building the fulltext index
>> is rather slow, old-style pre-calculation might be more feasible. Views
>> deal with that by requiring frequent requests (possibly cron-ed).
>
> My understanding is that the KEY element of CouchDB Views is that they are
> generated in advance, and incrementally, before you use them.

And why not use the same principle for fulltext indexes? Also, view
indexes are not necessarily built in advance, though for online apps,
you're likely to do that.

> What you're proposing for the full text indexing sounds like quite the opposite
> to me, though I may be totally wrong.

I'm not proposing, I just thought I'd share my idea here :) It is about
trade-offs, you're right.

Cheers
Jan
--

Re: Lazy Fulltext Search

Posted by Noah Slater <ns...@apache.org>.

On Thu, Apr 10, 2008 at 11:32:21PM +0200, Jan Lehnardt wrote:
> My proposed architectural change would be to trigger the Indexer from
> the Searcher module when a request comes in, just like views work. This
> would delay the creation of fulltext indexes until they are
> actually needed.

I thought that the advantage of full text search systems is that you can perform
a lot of work up front in exchange for very fast queries later on. This proposal
would seem to make the trade-off in performance without the associated benefit.

> The possible drawback, though, is that when building the fulltext index
> is rather slow, old-style pre-calculation might be more feasible. Views
> deal with that by requiring frequent requests (possibly cron-ed).

My understanding is that the KEY element of CouchDB Views is that they are
generated in advance, and incrementally, before you use them.

What you're proposing for the full text indexing sounds like quite the opposite
to me, though I may be totally wrong.

-- 
Noah Slater - The Apache Software Foundation <http://www.apache.org/>

Re: Lazy Fulltext Search

Posted by Søren Hilmer <sh...@widetrail.dk>.

Hi Jan

It certainly would simplify configuration, although the
DbUpdateNotificationProcess setting ought to be retained, as it is
potentially useful for things other than indexing (can you have more than
one of these set up?)

I am also worried about response times for searching; potentially the
indexing can take considerable time. With the current approach, indexing
can be done during off-peak hours and only searching is done at prime time.

Have fun
  Søren

-- 
Søren Hilmer, M.Sc., M.Crypt.
wideTrail            Phone: +45 25481225
Pilevænget 41        Email: sh@widetrail.dk
DK-8961  Allingåbro  Web: www.widetrail.dk

On Thu, April 10, 2008 23:32, Jan Lehnardt wrote:
> Heya,
> while thinking more about the fulltext implementation, I began to
> wonder why we don't model it after the view engine.
>
> At the moment, we have an Indexer waiting for update notifications and
> polling CouchDB for changes and a separate mechanism to register a
> fulltext query Searcher, that looks up things in the index.
>
> My proposed architectural change would be to trigger the Indexer from
> the Searcher module when a request comes in, just like views work.
> This would delay the creation of fulltext indexes until they are
> actually needed.
>
> The possible drawback, though, is that when building the fulltext index
> is rather slow, old-style pre-calculation might be more feasible. Views
> deal with that by requiring frequent requests (possibly cron-ed).
>
> This is not a proposal or anything, just a thought I wanted to share
> with those who work on fulltext integration.
>
> If you have any input on this, please let us know ;)
>
> Cheers
> Jan
> --
>

Re: Lazy Fulltext Search

Posted by Nils Adermann <na...@naderman.de>.

Hi,

Jan Lehnardt wrote:
> Heya Søren,
> On Apr 15, 2008, at 15:27, Soren Hilmer wrote:
>> I guess what all this boils down to is that:
>>
>> When a database changes, you need to re-index all the views in the
>> fulltextsearch design document.
>
> if you take this route. yes.
>
>> There is no way incremental changes can be made to the index, as one
>> document change may potentially change more view results within the
>> same view.
>> Right?
>
> Yup.
>
> Eventually, I think, we will be able to have CouchDB calculate the 
> intersection of all FT hits and a view index for you. So the FT 
> indexer will only need to index the whole DB and CouchDB filters out 
> all matching documents that are not in the requested view for you. For 
> now, you've got to do it yourself.
>
That's not even possible, because a view (written in JS) could return
data that is not directly in a document, either by combining information
from multiple documents or by generating new content based on some
document values. You would never be able to search such content.
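
As a small illustration of that point, here is a made-up map function,
written in Python rather than JS for brevity. The field names and the
emitted row shape are assumptions for the example only.

```python
def map_full_name(doc):
    """Hypothetical view map function: emits a value built from two fields."""
    if "first" in doc and "last" in doc:
        # The emitted value is generated; it is not stored in any field.
        yield doc["last"].lower(), f'{doc["first"]} {doc["last"]}'


doc = {"_id": "p1", "first": "Ada", "last": "Lovelace"}
rows = list(map_full_name(doc))
```

An index built only from the stored field values of `doc` has no entry
for the combined string in `rows`, so a phrase search for it would miss.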

>> On Tuesday 15 April 2008 14:05:38 Jan Lehnardt wrote:
>>> On Apr 15, 2008, at 02:01, Nils Adermann wrote:
>>>> Hi,
>>>>
>>>> I agree with Søren that this is not necessarily a good idea. It is
>>>> not trivial for an indexer to figure out which view results changed.
>>>> One method to do so is storing all indexed view results and then
>>>> comparing them to the updated view once the indexer is called. This
>>>> is a needless waste of resources. Updating the view index based on
>>>> changed documents is even more difficult. You would have to
>>>> recompute the view at least partially to find out which view results
>>>> changed. Given the reduce step this means that any number of
>>>> documents, including unchanged ones could be involved. This creates
>>>> a lot of work.
>>>
>>> Yeah, but it doesn't actually matter who does the work :) So we rather
>>> keep that out of CouchDB.
>>>
Err I wasn't saying the question is where it takes place. I was saying 
you have to do the work twice instead of just once if we follow your way.

>>>> I think the problem we face here is different usage patterns of
>>>> views. There are views which process a lot of data and which are
>>>> based on documents that are updated frequently.  But they might only
>>>> be read from infrequently. These views profit from JIT computation.
>>>> However many applications use views which are infrequently updated
>>>> but often queried or searched. Such views benefit from live
>>>> updating. If an application allows searching data it nearly always
>>>> means that the data will be read more frequently than it is updated.
>>>> So in conclusion both methods (JIT and live updates) make sense for
>>>> views. But search normally only needs the live update mechanism. I
>>>> believe it should become configurable whether a view is updated
>>>> immediately after a change or only after a query takes place.
>>>> Fulltext search would always work on views with immediate updates.
>>>> The indexer would be notified about the changed results. On views
>>>> which delay updates, search would only work if the fulltext search
>>>> provides a mechanism to compare the new view results to the old ones.
>>>
>>> Just query the view with ?count=0 to trigger an update after your
>>> inserts and you have the synchronous update behaviour.
>>>
If we really do things your way that'd mean the entire database and all 
searchable views need to be reindexed completely after every single 
update. You're creating a huge amount of useless work for the indexer.
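
For reference, the "store the indexed view results and compare" method
mentioned in the quoted text above could look roughly like this in
Python. It assumes view rows are hashable (doc_id, key, value) tuples,
which is a simplification chosen for the sketch.

```python
def diff_view_results(old_rows, new_rows):
    """Compare two snapshots of a view and report what the indexer must do.

    Returns (to_index, to_remove): rows that are new or changed, and rows
    that disappeared or were superseded.
    """
    old, new = set(old_rows), set(new_rows)
    to_index = sorted(new - old)
    to_remove = sorted(old - new)
    return to_index, to_remove
```

Note that the whole previous snapshot has to be kept around and the
whole new result set has to be read, which is the cost objected to here.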

Cheers
Nils

Re: Lazy Fulltext Search

Posted by Jan Lehnardt <ja...@prima.de>.

Heya Søren,
On Apr 15, 2008, at 15:27, Soren Hilmer wrote:
> I guess what all this boils down to is that:
>
> When a database changes, you need to re-index all the views in the
> fulltextsearch design document.

If you take this route, yes.

> There is no way incremental changes can be made to the index, as one
> document change may potentially change more view results within the
> same view.
> Right?

Yup.

Eventually, I think, we will be able to have CouchDB calculate the  
intersection of all FT hits and a view index for you. So the FT  
indexer will only need to index the whole DB and CouchDB filters out  
all matching documents that are not in the requested view for you. For  
now, you've got to do it yourself.
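
Doing it yourself could be as simple as the following Python sketch. It
assumes the fulltext hits arrive as a ranked list of document ids and
that each view row carries the emitting document's id under "id"; the
function name is made up for the example.

```python
def filter_hits_by_view(ft_hits, view_rows):
    """Keep only fulltext hits whose document also appears in the view.

    The ranking order of the fulltext hits is preserved.
    """
    ids_in_view = {row["id"] for row in view_rows}
    return [doc_id for doc_id in ft_hits if doc_id in ids_in_view]
```

For example, hits ["d3", "d1", "d7"] filtered against a view containing
rows for d1 and d3 leave ["d3", "d1"].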

Cheers
Jan
--



>
>
> --Søren
>
>
> On Tuesday 15 April 2008 14:05:38 Jan Lehnardt wrote:
>> On Apr 15, 2008, at 02:01, Nils Adermann wrote:
>>> Hi,
>>>
>>> I agree with Søren that this is not necessarily a good idea. It is
>>> not trivial for an indexer to figure out which view results changed.
>>> One method to do so is storing all indexed view results and then
>>> comparing them to the updated view once the indexer is called. This
>>> is a needless waste of resources. Updating the view index based on
>>> changed documents is even more difficult. You would have to
>>> recompute the view at least partially to find out which view results
>>> changed. Given the reduce step this means that any number of
>>> documents, including unchanged ones could be involved. This creates
>>> a lot of work.
>>
>> Yeah, but it doesn't actually matter who does the work :) So we rather
>> keep that out of CouchDB.
>>
>>> I think the problem we face here is different usage patterns of
>>> views. There are views which process a lot of data and which are
>>> based on documents that are updated frequently.  But they might only
>>> be read from infrequently. These views profit from JIT computation.
>>> However many applications use views which are infrequently updated
>>> but often queried or searched. Such views benefit from live
>>> updating. If an application allows searching data it nearly always
>>> means that the data will be read more frequently than it is updated.
>>> So in conclusion both methods (JIT and live updates) make sense for
>>> views. But search normally only needs the live update mechanism. I
>>> believe it should become configurable whether a view is updated
>>> immediately after a change or only after a query takes place.
>>> Fulltext search would always work on views with immediate updates.
>>> The indexer would be notified about the changed results. On views
>>> which delay updates, search would only work if the fulltext search
>>> provides a mechanism to compare the new view results to the old ones.
>>
>> Just query the view with ?count=0 to trigger an update after your
>> inserts and you have the synchronous update behaviour.
>>
>>> Cheers
>>> Nils
>>>
>>> Jan Lehnardt wrote:
>>>> On Apr 12, 2008, at 12:06, Søren Hilmer wrote:
>>>>> Hi
>>>>>
>>>>> Have you read Chris' response about letting the view engine call the
>>>>> indexer, as it has the information needed for the indexer? As I
>>>>> understand the idea, it will essentially keep the fulltext indexer and
>>>>> the views in sync.
>>>>>
>>>>> I like this idea and I believe the code for the indexer would be much
>>>>> simpler and more efficient.
>>>>>
>>>>> Also as the shift goes towards indexing views and not documents, it
>>>>> makes sense that it is the View engine that triggers the indexer, right?
>>>>
>>>> The only problem here is that views are changed when they are
>>>> being queried and not when documents are added. So you could end up
>>>> with a lot of not-indexed data because your view hasn't been
>>>> queried. That can be worked around, but I don't think it makes
>>>> things any easier :)
>>>>
>>>> The design of the update notification is intentionally simple. We
>>>> expect the clients (the Indexer in this case) to be smart. We
>>>> believe that this makes the server code more robust.
>>>>
>>>>> I have to study the View engine if I am to provide any code for
>>>>> this, though (provided consensus blows in this direction).
>>>>>
>>>>> Have fun
>>>>> Søren
>>>>>
>>>>> On Friday 11 April 2008 13:26, Jan Lehnardt wrote:
>>>>>> On Apr 11, 2008, at 08:55, Søren Hilmer wrote:
>>>>>>> Hi Jan
>>>>>>>
>>>>>>> It certainly would simplify configuration, although the
>>>>>>> DbUpdateNotificationProcess setting ought to be retained, as it is
>>>>>>> potentially useful for things other than indexing (can you have more
>>>>>>> than one of these set up?)
>>>>>>
>>>>>> No, the update searcher will stay! :-)
>>>>>>
>>>>>>> I am also worried about response times for searching; potentially
>>>>>>> the indexing can take considerable time. With the current approach,
>>>>>>> indexing can be done during off-peak hours and only searching is done
>>>>>>> at prime time.
>>>>>>
>>>>>> Right, if you want to be conservative with resources, you might want
>>>>>> to go with my approach at the expense of possibly higher response
>>>>>> times the first time things are searched for (as it is with views). I
>>>>>> just wanted to make available my idea that fulltext indexing could be
>>>>>> modelled after how views work, in case this is useful for a specific
>>>>>> scenario.
>>>>>>
>>>>>> Cheers
>>>>>> Jan
>>>>>> --
>>>>>>
>>>>>>> Have fun
>>>>>>> Søren
>>>>>>> --
>>>>>>> Søren Hilmer, M.Sc., M.Crypt.
>>>>>>> wideTrail            Phone: +45 25481225
>>>>>>> Pilevænget 41        Email: sh@widetrail.dk
>>>>>>> DK-8961  Allingåbro  Web: www.widetrail.dk
>>>>>>>
>>>>>>> On Thu, April 10, 2008 23:32, Jan Lehnardt wrote:
>>>>>>>> Heya,
>>>>>>>> while thinking more about the fulltext implementation, I began to
>>>>>>>> wonder why we don't model it after the view engine.
>>>>>>>>
>>>>>>>> At the moment, we have an Indexer waiting for update notifications and
>>>>>>>> polling CouchDB for changes and a separate mechanism to register a
>>>>>>>> fulltext query Searcher, that looks up things in the index.
>>>>>>>>
>>>>>>>> My proposed architectural change would be to trigger the Indexer from
>>>>>>>> the Searcher module when a request comes in, just like views work.
>>>>>>>> This would delay the creation of fulltext indexes until they are
>>>>>>>> actually needed.
>>>>>>>>
>>>>>>>> The possible drawback, though, is that when building the fulltext index
>>>>>>>> is rather slow, old-style pre-calculation might be more feasible. Views
>>>>>>>> deal with that by requiring frequent requests (possibly cron-ed).
>>>>>>>>
>>>>>>>> This is not a proposal or anything, just a thought I wanted to share
>>>>>>>> with those who work on fulltext integration.
>>>>>>>>
>>>>>>>> If you have any input on this, please let us know ;)
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>> Jan
>>>>>>>> --
>>>>>
>>>>> --
>>>>> Søren Hilmer, M.Sc., M.Crypt.
>>>>> wideTrail            Phone:    +45 25481225
>>>>> Pilevænget 41        Email:    sh@widetrail.dk
>>>>> DK-8961  Allingåbro    Web:    www.widetrail.dk
>
>
>
> -- 
> Søren Hilmer, M.Sc., M.Crypt.
> wideTrail                       Phone:  +45 25481225
> Pilevænget 41           Email:  sh@widetrail.dk
> DK-8961  Allingåbro     Web:    www.widetrail.dk
>

Re: Lazy Fulltext Search

Posted by Soren Hilmer <sh...@widetrail.dk>.

I guess what all this boils down to is that:

When a database changes, you need to re-index all the views in the
fulltextsearch design document.
There is no way incremental changes can be made to the index, as one document
change may potentially change more view results within the same view.

Right?

--Søren


On Tuesday 15 April 2008 14:05:38 Jan Lehnardt wrote:
> On Apr 15, 2008, at 02:01, Nils Adermann wrote:
> > Hi,
> >
> > I agree with Søren that this is not necessarily a good idea. It is
> > not trivial for an indexer to figure out which view results changed.
> > One method to do so is storing all indexed view results and then
> > comparing them to the updated view once the indexer is called. This
> > is a needless waste of resources. Updating the view index based on
> > changed documents is even more difficult. You would have to
> > recompute the view at least partially to find out which view results
> > changed. Given the reduce step this means that any number of
> > documents, including unchanged ones could be involved. This creates
> > a lot of work.
>
> Yeah, but it doesn't actually matter who does the work :) So we rather
> keep that out of CouchDB.
>
> > I think the problem we face here is different usage patterns of
> > views. There are views which process a lot of data and which are
> > based on documents that are updated frequently.  But they might only
> > be read from infrequently. These views profit from JIT computation.
> > However many applications use views which are infrequently updated
> > but often queried or searched. Such views benefit from live
> > updating. If an application allows searching data it nearly always
> > means that the data will be read more frequently than it is updated.
> > So in conclusion both methods (JIT and live updates) make sense for
> > views. But search normally only needs the live update mechanism. I
> > believe it should become configurable whether a view is updated
> > immediately after a change or only after a query takes place.
> > Fulltext search would always work on views with immediate updates.
> > The indexer would be notified about the changed results. On views
> > which delay updates, search would only work if the fulltext search
> > provides a mechanism to compare the new view results to the old ones.
>
> Just query the view with ?count=0 to trigger an update after your
> inserts and you have the synchronous update behaviour.
>
> > Cheers
> > Nils
> >
> > Jan Lehnardt wrote:
> >> On Apr 12, 2008, at 12:06, Søren Hilmer wrote:
> >>> Hi
> >>>
> >>> Have you read Chris' response about letting the view engine call the
> >>> indexer, as it has the information needed for the indexer? As I
> >>> understand the idea, it will essentially keep the fulltext indexer and
> >>> the views in sync.
> >>>
> >>> I like this idea and I believe the code for the indexer would be much
> >>> simpler and more efficient.
> >>>
> >>> Also as the shift goes towards indexing views and not documents, it
> >>> makes sense that it is the View engine that triggers the indexer, right?
> >>
> >> The only problem here is that views are changed when they are
> >> being queried and not when documents are added. So you could end up
> >> with a lot of not-indexed data because your view hasn't been
> >> queried. That can be worked around, but I don't think it makes
> >> things any easier :)
> >>
> >> The design of the update notification is intentionally simple. We
> >> expect the clients (the Indexer in this case) to be smart. We
> >> believe that this makes the server code more robust.
> >>
> >>> I have to study the View engine if I am to provide any code for
> >>> this, though (provided consensus blows in this direction).
> >>>
> >>> Have fun
> >>>  Søren
> >>>
> >>> On Friday 11 April 2008 13:26, Jan Lehnardt wrote:
> >>>> On Apr 11, 2008, at 08:55, Søren Hilmer wrote:
> >>>>> Hi Jan
> >>>>>
> >>>>> It certainly would simplify configuration, although the
> >>>>> DbUpdateNotificationProcess setting ought to be retained, as it is
> >>>>> potentially useful for things other than indexing (can you have more
> >>>>> than one of these set up?)
> >>>>
> >>>> No, the update searcher will stay! :-)
> >>>>
> >>>>> I am also worried about response times for searching; potentially
> >>>>> the indexing can take considerable time. With the current approach,
> >>>>> indexing can be done during off-peak hours and only searching is done
> >>>>> at prime time.
> >>>>
> >>>> Right, if you want to be conservative with resources, you might want
> >>>> to go with my approach at the expense of possibly higher response
> >>>> times the first time things are searched for (as it is with views). I
> >>>> just wanted to make available my idea that fulltext indexing could be
> >>>> modelled after how views work, in case this is useful for a specific
> >>>> scenario.
> >>>>
> >>>> Cheers
> >>>> Jan
> >>>> --
> >>>>
> >>>>> Have fun
> >>>>> Søren
> >>>>> --
> >>>>> Søren Hilmer, M.Sc., M.Crypt.
> >>>>> wideTrail            Phone: +45 25481225
> >>>>> Pilevænget 41        Email: sh@widetrail.dk
> >>>>> DK-8961  Allingåbro  Web: www.widetrail.dk
> >>>>>
> >>>>> On Thu, April 10, 2008 23:32, Jan Lehnardt wrote:
> >>>>>> Heya,
> >>>>>> while thinking more about the fulltext implementation, I began to
> >>>>>> wonder why we don't model it after the view engine.
> >>>>>>
> >>>>>> At the moment, we have an Indexer waiting for update notifications and
> >>>>>> polling CouchDB for changes and a separate mechanism to register a
> >>>>>> fulltext query Searcher, that looks up things in the index.
> >>>>>>
> >>>>>> My proposed architectural change would be to trigger the Indexer from
> >>>>>> the Searcher module when a request comes in, just like views work.
> >>>>>> This would delay the creation of fulltext indexes until they are
> >>>>>> actually needed.
> >>>>>>
> >>>>>> The possible drawback, though, is that when building the fulltext index
> >>>>>> is rather slow, old-style pre-calculation might be more feasible. Views
> >>>>>> deal with that by requiring frequent requests (possibly cron-ed).
> >>>>>>
> >>>>>> This is not a proposal or anything, just a thought I wanted to share
> >>>>>> with those who work on fulltext integration.
> >>>>>>
> >>>>>> If you have any input on this, please let us know ;)
> >>>>>>
> >>>>>> Cheers
> >>>>>> Jan
> >>>>>> --
> >>>
> >>> --
> >>> Søren Hilmer, M.Sc., M.Crypt.
> >>> wideTrail            Phone:    +45 25481225
> >>> Pilevænget 41        Email:    sh@widetrail.dk
> >>> DK-8961  Allingåbro    Web:    www.widetrail.dk



-- 
Søren Hilmer, M.Sc., M.Crypt.
wideTrail                       Phone:  +45 25481225
Pilevænget 41           Email:  sh@widetrail.dk
DK-8961  Allingåbro     Web:    www.widetrail.dk

Re: Lazy Fulltext Search

Posted by Jan Lehnardt <ja...@apache.org>.

On Apr 15, 2008, at 02:01, Nils Adermann wrote:
> Hi,
>
> I agree with Søren that this is not necessarily a good idea. It is
> not trivial for an indexer to figure out which view results changed.
> One method to do so is storing all indexed view results and then
> comparing them to the updated view once the indexer is called. This
> is a needless waste of resources. Updating the view index based on
> changed documents is even more difficult. You would have to
> recompute the view at least partially to find out which view results
> changed. Given the reduce step, this means that any number of
> documents, including unchanged ones, could be involved. This creates
> a lot of work.

Yeah, but it doesn't actually matter who does the work :) So we would
rather keep that out of CouchDB.


> I think the problem we face here is the different usage patterns of
> views. There are views which process a lot of data and which are
> based on documents that are updated frequently, but which are read
> only infrequently. These views profit from JIT computation.
> However, many applications use views which are infrequently updated
> but often queried or searched. Such views benefit from live
> updating. If an application allows searching data, it nearly always
> means that the data will be read more frequently than it is updated.
> So in conclusion both methods (JIT and live updates) make sense for
> views. But search normally only needs the live update mechanism. I
> believe it should become configurable whether a view is updated
> immediately after a change or only after a query takes place.
> Fulltext search would always work on views with immediate updates.
> The indexer would be notified about the changed results. On views
> which delay updates, search would only work if the fulltext search
> provides a mechanism to compare the new view results to the old ones.

Just query the view with ?count=0 to trigger an update after your  
inserts and you have the synchronous update behaviour.
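
A minimal sketch of that trick, assuming a CouchDB reachable over HTTP.
The base URL, database name and view path used here are hypothetical
placeholders; only the ?count=0 parameter comes from the text above.
The idea is that the request carries no rows back but still makes the
view catch up before it returns.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def view_refresh_url(base_url, db, view_path):
    """Build a view URL that asks for zero rows (count=0).

    view_path is whatever path your CouchDB version uses for views;
    something like "_view/docs/by_title" is only an assumed example.
    """
    return "{}/{}/{}?{}".format(
        base_url.rstrip("/"),
        quote(db, safe=""),
        view_path.strip("/"),
        urlencode({"count": 0}),
    )


def refresh_view(base_url, db, view_path, timeout=30):
    """Issue the count=0 query and return the HTTP status code.

    Call this right after a batch of inserts to get the
    "update on write" behaviour without changing the server.
    """
    with urlopen(view_refresh_url(base_url, db, view_path), timeout=timeout) as resp:
        return resp.status
```

Calling refresh_view() after each batch of writes is the client-side
equivalent of a view that updates immediately.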

>
>
> Cheers
> Nils
>

Re: Lazy Fulltext Search

Posted by Nils Adermann <na...@naderman.de>.

Nils Adermann wrote:
> I agree with Søren that this is not necessarily a good idea.
Just to make this clearer: with this sentence I meant to say that I
believe giving the indexer only a small amount of data is a bad idea.

Re: Lazy Fulltext Search

Posted by Nils Adermann <na...@naderman.de>.

Hi,

I agree with Søren that this is not necessarily a good idea. It is not
trivial for an indexer to figure out which view results changed. One
method to do so is storing all indexed view results and then comparing
them to the updated view once the indexer is called. This is a needless
waste of resources. Updating the view index based on changed documents
is even more difficult. You would have to recompute the view at least
partially to find out which view results changed. Given the reduce step,
this means that any number of documents, including unchanged ones, could
be involved. This creates a lot of work.
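
To make the cost of the "store and compare" method concrete, here is a
rough sketch. It assumes view results are held as a dict mapping a
(document id, key) pair to the emitted value; that layout and the
function name are illustrative assumptions, not how CouchDB stores
views.

```python
def diff_view_results(old_rows, new_rows):
    """Compare two full snapshots of view results.

    Both arguments map (doc_id, key) -> value. Returns three dicts:
    rows that were added, rows whose value changed, and rows that
    were removed. Note that every row of both snapshots is visited,
    even if only one document changed.
    """
    added = {k: v for k, v in new_rows.items() if k not in old_rows}
    removed = {k: v for k, v in old_rows.items() if k not in new_rows}
    changed = {
        k: v for k, v in new_rows.items()
        if k in old_rows and old_rows[k] != v
    }
    return added, changed, removed
```

The indexer has to keep a complete copy of the previous results around
and walk all of it on every run, which is the waste described above.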

I think the problem we face here is the different usage patterns of
views. There are views which process a lot of data and which are based
on documents that are updated frequently, but which are read only
infrequently. These views profit from JIT computation. However, many
applications use views which are infrequently updated but often queried
or searched. Such views benefit from live updating. If an application
allows searching data, it nearly always means that the data will be read
more frequently than it is updated. So in conclusion both methods (JIT
and live updates) make sense for views. But search normally only needs
the live update mechanism. I believe it should become configurable
whether a view is updated immediately after a change or only after a
query takes place. Fulltext search would always work on views with
immediate updates. The indexer would be notified about the changed
results. On views which delay updates, search would only work if the
fulltext search provides a mechanism to compare the new view results to
the old ones.
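
As a toy illustration of the configurable behaviour, assuming a view
that maps each document to a single row: the class, the update_on_write
flag and the on_change callback are all made-up names for this sketch
and do not correspond to anything in CouchDB.

```python
class ToyView:
    """A view that is refreshed either on every write or on query."""

    def __init__(self, map_fn, update_on_write=False, on_change=None):
        self.map_fn = map_fn                  # document -> row value
        self.update_on_write = update_on_write
        self.on_change = on_change            # e.g. the fulltext indexer
        self.pending = []                     # writes not yet applied
        self.rows = {}                        # doc_id -> row value

    def document_changed(self, doc_id, doc):
        """Record a write; refresh right away in live-update mode."""
        self.pending.append((doc_id, doc))
        if self.update_on_write:
            self._refresh()

    def query(self):
        """JIT mode catches up here, just before answering."""
        self._refresh()
        return dict(self.rows)

    def _refresh(self):
        changed = {}
        for doc_id, doc in self.pending:
            row = self.map_fn(doc)
            if self.rows.get(doc_id) != row:
                self.rows[doc_id] = row
                changed[doc_id] = row
        self.pending.clear()
        # Only rows that really changed are handed to the indexer.
        if changed and self.on_change:
            self.on_change(changed)
```

With update_on_write=True the indexer callback fires after each write;
with the default it fires only when somebody queries the view.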

Cheers
Nils


Re: Lazy Fulltext Search

Posted by Jan Lehnardt <ja...@apache.org>.

On Apr 12, 2008, at 12:06, Søren Hilmer wrote:
> Hi
>
> Have you read Chris' response about letting the view engine call the  
> indexer,
> as it has the information needed for the indexer? As I understand  
> the idea,
> it will essentially keep the fulltext indexer and the views in sync.
>
> I like this idea and I believe the code for the indexer would be
> much simpler and more efficient.
>
> Also as the shift goes towards indexing views and not documents, it  
> makes
> sense that it is the View engine that triggers the indexer, right?

The only problem here is that views are updated when they are queried,
not when documents are added. So you could end up with a lot of
not-indexed data because your view hasn't been queried. That can be
worked around, but I don't think it makes things any easier :)

The design of the update notification is intentionally simple. We
expect the clients (the Indexer in this case) to be smart. We believe
that this makes the server code more robust.
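
A sketch of what a "smart" client can look like. It assumes that each
notification arrives as one line of JSON of the form
{"type": "updated", "db": "name"}; treat that message shape as an
assumption of this example and check it against your CouchDB version.
The function name is made up.

```python
import json


def dirty_databases(lines):
    """Return the set of database names reported as updated.

    The server only says "something changed in db X". Working out
    what changed is left to the client, which keeps the server side
    trivial. Lines that are empty or not valid JSON are skipped.
    """
    dirty = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            continue
        if (isinstance(event, dict)
                and event.get("type") == "updated"
                and "db" in event):
            dirty.add(event["db"])
    return dirty
```

An indexer could feed this from sys.stdin and then ask CouchDB for the
changes in each database it returns.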


> I have to study the View engine, if I am to provide any code for  
> this, though
> (provided consensus blows in this direction).
>
> Have fun
>   Søren

Re: Lazy Fulltext Search

Posted by Søren Hilmer <sh...@widetrail.dk>.

Hi

Have you read Chris' response about letting the view engine call the indexer, 
as it has the information needed for the indexer? As I understand the idea, 
it will essentially keep the fulltext indexer and the views in sync.

I like this idea and I believe the code for the indexer would be much simpler
and more efficient.

Also as the shift goes towards indexing views and not documents, it makes 
sense that it is the View engine that triggers the indexer, right?

I have to study the View engine, if I am to provide any code for this, though 
(provided consensus blows in this direction).

Have fun
   Søren

-- 
Søren Hilmer, M.Sc., M.Crypt.
wideTrail			Phone:	+45 25481225
Pilevænget 41		Email:	sh@widetrail.dk
DK-8961  Allingåbro	Web:	www.widetrail.dk

Re: Lazy Fulltext Search

Posted by Jan Lehnardt <ja...@apache.org>.

On Apr 11, 2008, at 08:55, Søren Hilmer wrote:
> Hi Jan
>
> It certainly would simplify configuration, although the
> DbUpdateNotificationProcess setting ought to be retained as it is
> potentially useful for other stuff than indexing (can you have more
> than one of these set up?)

No, the update searcher will stay! :-)


> I am also worried about response times for searching; potentially the
> indexing can take considerable time. With the current approach,
> indexing can be done at off-peak hours and only searching is done at
> prime time.

Right, if you want to be conservative with resources, you might want
to go with my approach at the expense of possibly higher response times
the first time things are searched for (as it is with views). I just
wanted to make available my idea that fulltext indexing could be
modelled after how views work, in case this is useful for a specific
scenario.

Cheers
Jan
--




Re: Lazy Fulltext Search

Posted by Søren Hilmer <sh...@widetrail.dk>.

Hi Jan

It certainly would simplify configuration, although the
DbUpdateNotificationProcess setting ought to be retained as it is
potentially useful for other stuff than indexing (can you have more than
one of these set up?)

I am also worried about response times for searching; potentially the
indexing can take considerable time. With the current approach, indexing
can be done at off-peak hours and only searching is done at prime time.

Have fun
  Søren
-- 
Søren Hilmer, M.Sc., M.Crypt.
wideTrail            Phone: +45 25481225
Pilevænget 41        Email: sh@widetrail.dk
DK-8961  Allingåbro  Web: www.widetrail.dk


Re: Lazy Fulltext Search

Posted by Noah Slater <ns...@apache.org>.

On Thu, Apr 10, 2008 at 11:32:21PM +0200, Jan Lehnardt wrote:
> My proposed architectural change would be to trigger the Indexer from
> the Searcher module when a request comes in, just like views work. This
> would delay the creation of fulltext indexes until they are
> actually needed.

I thought that the advantage of full text search systems is that you can perform
a lot of work up front in exchange for very fast queries later on. This proposal
would seem to make the trade-off in performance without the associated benefit.

> The possible drawback though is, that when building the fulltext index
> is rather slow, old-style pre-calculation might be more feasible. Views
> deal with that by requiring frequent requests (possibly cron-ed).

My understanding is that the KEY element of CouchDB Views is that they are
generated in advance, and incrementally, before you use them.

What you're proposing for the full text indexing sounds like quite the opposite
to me, though I may be totally wrong.

-- 
Noah Slater - The Apache Software Foundation <http://www.apache.org/>