Posted to user@couchdb.apache.org by Tim Hankins <ti...@gmail.com> on 2012/11/02 13:49:07 UTC

Tips for optimizing Filtered Replication?

I have a single database (300 MB, 42,924 documents) consisting of about 20
different kinds of documents from about 200 users. The documents range in
size from a few bytes to many kilobytes (150 KB or so).

When the server is idle, replication with the following filter function
takes about 2.5 minutes to complete. When the server is under load, it
takes more than 10 minutes.

Can anyone comment on whether these times are expected, and if not, suggest
how I might optimize things in order to get better performance?

function(doc, req) {
    var acceptedDate = true;
    if(doc.date) {
        var docDate = new Date();
        var dateKey = doc.date;
        docDate.setFullYear(dateKey[0], dateKey[1], dateKey[2]);

        var reqYear = req.query.year;
        var reqMonth = req.query.month;
        var reqDay = req.query.day;
        var reqDate = new Date();
        reqDate.setFullYear(reqYear, reqMonth, reqDay);

        acceptedDate = docDate.getTime() >= reqDate.getTime();
    }

    return doc.user_id && doc.user_id == req.query.userid &&
        doc._id.indexOf("_design") != 0 && acceptedDate;
}

Re: Tips for optimizing Filtered Replication?

Posted by Jens Alfke <je...@couchbase.com>.
On Nov 2, 2012, at 5:49 AM, Tim Hankins <ti...@gmail.com> wrote:

Can anyone comment on whether these times are expected, and if not, suggest
how I might optimize things in order to get better performance?

I’ve found that date/time parsing is slow on all platforms, and can be a performance bottleneck. It’s best if you can avoid it.

The usual workaround in CouchDB is to store dates in a format that sorts properly with plain string collation, i.e. some type of “YYYY-MM-DD HH:MM:SS” style. Then there’s no need to parse dates to compare them; you just compare strings.
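
As a rough sketch of that approach, the filter might look something like the following. It assumes each document stores `date` as a zero-padded "YYYY-MM-DD" string rather than an array, and that the replication passes a `since` query parameter in the same format. Both the schema and the `since` parameter name are assumptions for illustration, not something from the original filter. In a design document the function would be stored anonymously; it is assigned to a variable here only so the example runs on its own.

```javascript
// Sketch only: assumes doc.date is a zero-padded "YYYY-MM-DD" string
// (hypothetical schema) and req.query.since uses the same format.
var filter = function (doc, req) {
    // Design documents never pass the filter.
    if (doc._id.indexOf("_design") === 0) {
        return false;
    }
    // Only replicate documents that belong to the requested user.
    if (!doc.user_id || doc.user_id != req.query.userid) {
        return false;
    }
    // Documents without a date are accepted, as in the original filter.
    if (!doc.date) {
        return true;
    }
    // Zero-padded date strings sort chronologically, so no Date objects are needed.
    return doc.date >= req.query.since;
};

var req = { query: { userid: "tim", since: "2012-10-01" } };
filter({ _id: "a", user_id: "tim", date: "2012-11-02" }, req); // true
filter({ _id: "b", user_id: "tim", date: "2012-09-30" }, req); // false
```

The comparison only works if every component is zero-padded ("2012-09-30", not "2012-9-30"); otherwise string order and date order diverge.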

An alternative is to store dates as numeric timestamps (e.g. seconds since an epoch). This can be even faster, and lets you quickly do computations like the time between two dates, but a lot of people dislike it because it’s not human-readable.
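
A similar sketch for the numeric variant, assuming a hypothetical `timestamp` field holding seconds since the Unix epoch and a `since` query parameter holding the cutoff in the same unit (neither name comes from the original filter):

```javascript
// Sketch only: assumes doc.timestamp is a number of seconds since the
// Unix epoch (hypothetical field) and req.query.since is the cutoff.
var filterByTimestamp = function (doc, req) {
    if (doc._id.indexOf("_design") === 0) {
        return false;
    }
    if (!doc.user_id || doc.user_id != req.query.userid) {
        return false;
    }
    // Documents without a numeric timestamp are accepted.
    if (typeof doc.timestamp !== "number") {
        return true;
    }
    // Number() copes with the cutoff arriving as either a string or a number.
    return doc.timestamp >= Number(req.query.since);
};

// The time between two documents is plain subtraction.
var secondsBetween = function (a, b) {
    return b.timestamp - a.timestamp;
};

secondsBetween({ timestamp: 100 }, { timestamp: 160 }); // 60
```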

—Jens