Posted to user@couchdb.apache.org by Benjamin Smith <li...@benjamindsmith.com> on 2009/06/18 04:46:29 UTC

High volume couchDB?

I recently ran across this project while doing research into Erlang for high 
availability. From what I can see, it may be exactly what I've been 
looking for. 

We've been using a clustered filesystem API for our large PHP application to 
store files and related, semi-structured data. It was developed in-house. We 
see perhaps 250,000 file operations per day, with 3 nodes storing data. Think: 
server-level RAID 1. Total data size is ~1 TB, with about 50% growth per 
year. 

What I'm looking for: 
1) Ability to store misc files. (Mixed: PDFs, JPGs, iso images, text files, etc) 

2) Ability to store related metadata close by (timestamp, ownership data,
application-specific data, etc.). We do this now by keeping a "sister" file, 
with a ".mdt" extension, containing the data serialized in PHP format. 

3) Redundancy: zero data loss in the event of a server failure. We achieve 
this now with our own in-house file-server daemon running under xinetd. 
Conceptually, it's similar to WebDAV, but lighter weight. 

4) Failover: ability to keep working even with partial cluster failure. 

5) Healing: ability to get "back together" when downed servers are restored. 

6) Performance that degrades gracefully: What happens when the screws get put 
to CouchDB? What kinds of loads can it sustain given mid-range hardware? 

7) Off-site backups: for disaster recovery, we're currently using a nightly 
rsync. 

8) Reliability: It should "just work" without needing regular babysitting. 
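For item 2, if I've read the docs correctly, a file and its "sister" metadata could collapse into a single CouchDB document with an inline base64 attachment, instead of two files on disk. A minimal sketch of what one such document might look like (the ID, field names, and database URL are made up for illustration):

```python
# Sketch: one of our file + .mdt pairs as a single CouchDB document
# with an inline attachment. IDs and field names are hypothetical.
import base64
import json

file_bytes = b"%PDF-1.4 fake pdf contents"  # stand-in for a real PDF

doc = {
    "_id": "invoice-2009-0042",           # hypothetical document ID
    "type": "stored_file",                # app-specific field (assumption)
    "timestamp": "2009-06-18T04:46:29Z",  # metadata formerly in the .mdt file
    "owner": "bsmith",
    "_attachments": {
        "invoice.pdf": {
            "content_type": "application/pdf",
            # CouchDB accepts inline attachments as base64-encoded strings
            "data": base64.b64encode(file_bytes).decode("ascii"),
        }
    },
}

body = json.dumps(doc)
# This JSON body would then be PUT to something like
#   http://localhost:5984/files/invoice-2009-0042
```

The appeal is that the metadata travels with the file through replication, so items 3-5 would fall out of CouchDB's replication rather than our own daemon, if I understand correctly.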

Am I right in reading that CouchDB accomplishes all/most/many of these goals? 
If not all of them, which would need watching? 

Thanks! 
