Posted to user@couchdb.apache.org by dave farkas <da...@interactivemediums.com> on 2009/06/25 18:54:10 UTC
design doc file size
Hi,
The company I work for is attempting to migrate two messaging systems
from MySQL to CouchDB. CouchDB will be used for reporting and searching
messages. Once the existing data is loaded, new messages will be added
once per day, and existing messages will never be updated.
I currently have the smaller of the two loaded into CouchDB: 8M
documents, 19G on disk. We have created 8 design docs (typically with two
views in each), and the view index files for these total 46G. The second
system is about three times the size of the smaller one, so I expect its
database to be about 60G and its view indexes to total about 150G.
Unfortunately, the server we planned to use won't have enough free disk
space for our current messages, let alone new ones. Is there any way to
compact the view indexes for a design doc, or are there best practices
for reducing their size?
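A minimal sketch of asking CouchDB to compact view indexes over HTTP, assuming a CouchDB release that supports view compaction (older releases may not). In the CouchDB HTTP API, POST /{db}/_compact/{ddoc} compacts the index of one design doc (name given without the _design/ prefix), and POST /{db}/_view_cleanup removes index files no longer used by any current design doc. Both expect Content-Type: application/json and normally need admin credentials, which this sketch leaves out. The helper names, base URL, database name, and design doc names are hypothetical, and how much space compaction recovers depends on the data.

```javascript
// Hypothetical helper: lists the HTTP requests that ask CouchDB to compact
// each design doc's view index, then clean up index files that no longer
// belong to any current design doc.
function viewCompactionRequests(baseUrl, db, designDocNames) {
  const dbPath = baseUrl.replace(/\/+$/, "") + "/" + encodeURIComponent(db);
  const headers = { "Content-Type": "application/json" };
  const requests = designDocNames.map(function (name) {
    // Design doc name without the "_design/" prefix.
    return {
      method: "POST",
      url: dbPath + "/_compact/" + encodeURIComponent(name),
      headers: headers,
    };
  });
  requests.push({ method: "POST", url: dbPath + "/_view_cleanup", headers: headers });
  return requests;
}

// Sends the requests one at a time (needs a runtime with a global fetch,
// e.g. Node 18+; authentication is not handled). Compaction runs in the
// background, so a 2xx reply only means the request was accepted.
async function sendAll(requests) {
  for (const req of requests) {
    const res = await fetch(req.url, { method: req.method, headers: req.headers });
    if (!res.ok) {
      throw new Error(req.url + " -> HTTP " + res.status);
    }
  }
}
```

For example, viewCompactionRequests("http://localhost:5984", "messages", ["stats"]) yields a POST to http://localhost:5984/messages/_compact/stats followed by one to http://localhost:5984/messages/_view_cleanup.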
Also, is there a way to cancel or stop a view from indexing once it starts?
Here is a typical example of our map/reduce functions (the index file
generated for this one is 7.3G on disk). We're mainly calculating stats
by different criteria over time (messages per account per minute, day,
month, year, etc.):
map.js
function(doc) {
  if (doc['couchrest-type'] == 'ArchivedMessage' && doc.accounts &&
      doc.messages) {
    if (doc.accounts.length > 0) {
      // Declare locals with var so they don't leak into the view server's globals.
      var account_id = doc.accounts[0].account_id;
      doc.messages.forEach(function(message) {
        // The substr offsets expect a timestamp laid out as YYYY-MM-DD?HH:MM...
        var datetime = message.created_at_utc;
        var year = parseInt(datetime.substr(0, 4), 10);
        var month = parseInt(datetime.substr(5, 2), 10);
        var day = parseInt(datetime.substr(8, 2), 10);
        var hour = parseInt(datetime.substr(11, 2), 10);
        var minute = parseInt(datetime.substr(14, 2), 10);
        // One count for this message's type plus a running total.
        var message_type_count = {};
        message_type_count[message.message_type] = 1;
        message_type_count['total'] = 1;
        // One row per message, keyed from coarse (account) to fine (minute).
        emit([account_id, year, month, day, hour, minute],
             message_type_count);
      });
    }
  }
}
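Because the emitted key runs from account down to minute, one view can serve the per-minute, per-day, per-month, and per-year questions through CouchDB's group_level query parameter, with startkey and endkey selecting a single account. In CouchDB's key collation an object sorts after numbers, so [account_id, {}] closes the range. A minimal sketch that only builds the query URL; the statsQuery helper, the GROUP_LEVELS table, and the database, design doc, and view names are hypothetical.

```javascript
// Hypothetical helper: builds a grouped view query for one account, assuming
// keys shaped like [account_id, year, month, day, hour, minute] (as in map.js).
// group_level keeps only the first N key elements when grouping.
const GROUP_LEVELS = { year: 2, month: 3, day: 4, hour: 5, minute: 6 };

function statsQuery(db, designDoc, viewName, accountId, granularity) {
  if (!Object.prototype.hasOwnProperty.call(GROUP_LEVELS, granularity)) {
    throw new Error("unknown granularity: " + granularity);
  }
  const params = new URLSearchParams({
    group_level: String(GROUP_LEVELS[granularity]),
    // Keys are JSON-encoded; {} sorts after numbers, so this range
    // covers every key that starts with accountId.
    startkey: JSON.stringify([accountId]),
    endkey: JSON.stringify([accountId, {}]),
  });
  return (
    "/" + encodeURIComponent(db) +
    "/_design/" + encodeURIComponent(designDoc) +
    "/_view/" + encodeURIComponent(viewName) +
    "?" + params.toString()
  );
}
```

With group_level=4, each returned row has a key like [account_id, year, month, day] and a value holding that day's per-type counts as combined by the reduce function.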
reduce.js
function(keys, values, rereduce) {
  // Sums per-type counts. The same loop serves the first pass and
  // rereduce, since both receive objects of the form {type: count}.
  var mt_count = {};
  for (var i = 0; i < values.length; i++) {
    var utc_count = values[i];
    for (var key in utc_count) {
      var count = utc_count[key];
      if (!mt_count[key]) {
        mt_count[key] = count;
      } else {
        mt_count[key] += count;
      }
    }
  }
  return mt_count;
}
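To try view functions like map.js and reduce.js on a few sample documents before committing to a multi-gigabyte index build, a small local harness may help. This is a hypothetical sketch, not part of CouchDB: runMap stubs the global emit() that CouchDB's view server normally provides, and rereduceIsConsistent checks the property CouchDB relies on, namely that reducing values in chunks and then rereducing the partial results gives the same answer as reducing them all at once. It passes null for keys, so it only suits reduce functions that ignore that argument, as reduce.js does.

```javascript
// Hypothetical local harness (not a CouchDB API) for running view
// functions on sample documents outside CouchDB.

// Runs a map function with a temporary global emit() and collects the rows.
function runMap(mapFn, docs) {
  const rows = [];
  const hadEmit = Object.prototype.hasOwnProperty.call(globalThis, "emit");
  const previousEmit = globalThis.emit;
  globalThis.emit = function (key, value) {
    rows.push({ key: key, value: value });
  };
  try {
    docs.forEach(function (doc) {
      mapFn(doc);
    });
  } finally {
    // Restore the previous emit (or remove the stub) so it doesn't leak.
    if (hadEmit) {
      globalThis.emit = previousEmit;
    } else {
      delete globalThis.emit;
    }
  }
  return rows;
}

// Recursively sorts object keys so results can be compared as JSON strings.
function canonical(value) {
  if (Array.isArray(value)) return value.map(canonical);
  if (value !== null && typeof value === "object") {
    const sorted = {};
    Object.keys(value).sort().forEach(function (k) {
      sorted[k] = canonical(value[k]);
    });
    return sorted;
  }
  return value;
}

// For every split point, compares reduce(all values) with
// rereduce([reduce(left chunk), reduce(right chunk)]).
function rereduceIsConsistent(reduceFn, values) {
  const expected = JSON.stringify(canonical(reduceFn(null, values, false)));
  for (let split = 1; split < values.length; split++) {
    const left = reduceFn(null, values.slice(0, split), false);
    const right = reduceFn(null, values.slice(split), false);
    const combined = reduceFn(null, [left, right], true);
    if (JSON.stringify(canonical(combined)) !== expected) return false;
  }
  return true;
}
```

Row counts from runMap give a quick sense of how an index grows: map.js emits one row per message rather than one per document.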
Thanks,
Dave