Posted to user@couchdb.apache.org by Chris Anderson <jc...@apache.org> on 2011/04/25 05:23:02 UTC

Re: View generation checkpointing on every update

I was just googling and came across this thread.

I agree a less aggressive checkpoint pattern could be beneficial. I
think there has been some discussion about this on the dev list. Now
I've gotta dig it up.

Was thinking about looking at the way couch_view_updater interacts
with couch_work_queue.

Chris
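
As an illustration of the "less aggressive checkpoint pattern" mentioned above, here is a minimal sketch of batching checkpoints by document count. It is not how couch_view_updater or couch_work_queue actually work (those are Erlang modules); `buildIndex`, `processDoc`, `writeCheckpoint`, and `everyDocs` are hypothetical names chosen only for this example.

```javascript
// Hypothetical sketch: checkpoint once per batch instead of once per document.
// None of these names come from CouchDB itself.
function buildIndex(docs, processDoc, writeCheckpoint, opts) {
  var everyDocs = opts.everyDocs; // batch size; 1 means "checkpoint on every update"
  var pending = 0;                // documents indexed since the last checkpoint
  var checkpoints = 0;            // number of checkpoints written so far
  var lastSeq = 0;                // position of the last processed document

  docs.forEach(function (doc, i) {
    processDoc(doc);
    pending += 1;
    lastSeq = i + 1;
    if (pending >= everyDocs) {
      writeCheckpoint(lastSeq);
      checkpoints += 1;
      pending = 0;
    }
  });

  // Flush the tail so the final position is always recorded.
  if (pending > 0) {
    writeCheckpoint(lastSeq);
    checkpoints += 1;
  }
  return checkpoints;
}
```

The trade-off is the one discussed later in the thread: a larger batch means fewer commits during the build, but more work is repeated if the process dies between checkpoints.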

On Mon, Aug 23, 2010 at 12:04 AM, Sebastian Cohnen
<se...@googlemail.com> wrote:
> Hey Jamie,
>
> first, I don't know anything about view checkpointing and how/if it could be customized in order to make couch commit less often, sorry :)
>
> (more replies inline)
>
>
> On 23.08.2010, at 08:38, Jamie Talbot wrote:
>
>> Tuyen Tran <it...@...> writes:
>>>
>>> We have a view that is checkpointing on every update and taking a long time
>>> to generate.
>>> <snip..>
>>> Has anyone seen similar performance? Are my documents too big with too many
>>> fields?
>>>
>>> Thanks,
>>> -T
>>
>> I have almost the same situation as you, using CouchDB 1.0.  Only 3000
>> documents in this sample database (of an overall document set of 450000).  Each
>> document is about 200KB, and contains an array of JSON objects, that each have
>> 3 small properties.
>>
>> My view emits a large key of 6 parts (an array of timestamp components) and a
>> value array with 2 integers, In and Out.  Without a reduce step it takes 5m20s
>> to generate.  With a reduce step that does a sum of Ins and Outs, it takes more
>> than 30 minutes.  Each document takes about 7 seconds to process.  It
>> checkpoints after every document.
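
For readers following along, a view of the shape described above might look roughly like the sketch below. The document layout (`samples`, `ts`, `In`, `Out`) is an assumption, since the thread does not show the actual documents or view code. In a real design document the map function takes only `doc` and calls CouchDB's global `emit`; here `emit` is passed in as a parameter so the sketch can run outside CouchDB.

```javascript
// Assumed document shape (not given in the thread):
//   { samples: [ { ts: <ms since epoch>, In: <int>, Out: <int> }, ... ] }

// Map: one row per array entry, keyed by six timestamp components.
function mapSamples(doc, emit) {
  (doc.samples || []).forEach(function (s) {
    var d = new Date(s.ts);
    emit(
      [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
       d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds()],
      [s.In, s.Out]
    );
  });
}

// Reduce: element-wise sum of [In, Out] pairs. Input and output have the
// same shape, so the same code also handles the rereduce pass.
function reduceSums(keys, values, rereduce) {
  var totals = [0, 0];
  values.forEach(function (v) {
    totals[0] += v[0];
    totals[1] += v[1];
  });
  return totals;
}
```

Because every array entry becomes its own row with a six-element key, the number and size of emitted rows grow quickly with document size, which is relevant to the index sizes reported below.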
>
> Raw speed is hard to compare, though 7s per document just for emitting some fields of each document and summing up some values sounds quite slow. Since you are emitting non-scalar values, you cannot use the built-in reduce functions, which are *very* fast (implemented in Erlang, running inside CouchDB, no serialization overhead). But a custom-written Erlang view could still be a very good option.
>
>> When looking at the size of the view, it comes out at about 900MB of data,
>> from a 30MB database.  After compacting, this drops to 90MB, or a factor of 10.
>> I found 0.10 significantly faster, though I don't have hard numbers, and didn't
>> try 0.11.
>
> What are you emitting as keys for your view? This kind of discrepancy in size between a compacted and a non-compacted view could be a sign that you are emitting very large or complex keys (at least this is my experience). Maybe you can look in this direction to optimize.
>
>> On these numbers, Couch is unfortunately going to be unusable.  For the full
>> document set, it is likely to take 44 days to build the view, and will take
>> roughly 1.5TB, which will compact down to 150GB.  Once it's up and running, it
>> will probably be fine; we only add a document every 2 minutes, so a 7 second
>> build time and calling stale=ok on the client will suffice.  However the
>> risk on the view file is too great to bear.  If it were to be corrupted (Couch
>> does an excellent job at avoiding this, but you need to plan for the worst), it
>> would take a month and a half to rebuild.
>
> View corruption is very unlikely, but you can copy around view files like databases, so you could easily copy the views from your backup/slave/... system to the server that got corrupted. So that shouldn't be a real problem.
>
>> I have seen a number of posts where people have started considering a
>> different view building algorithm that is oriented to performance.  I would
>> personally love to see a "risky=true" build option for the views, which
>> focussed more on performance and less on stability, on the understanding that
>> if we crashed while generating it, we'd have to start again.  For the initial
>> load, and rebuilds, that would be a price worth paying.  We're never going to
>> have less data!
>>
>> I'm also keen to hear people's experiences with this.
>>
>> Kind Regards,
>>
>> Jamie.
>>
>>
>>
>
>



-- 
Chris Anderson
http://jchrisa.net
http://couchbase.com