Posted to notifications@couchdb.apache.org by "Garren Smith (JIRA)" <ji...@apache.org> on 2017/04/24 07:17:04 UTC

[jira] [Commented] (COUCHDB-3391) The _replicator Database Is Not Scalable or My Design Needs Tweaking

    [ https://issues.apache.org/jira/browse/COUCHDB-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980795#comment-15980795 ] 

Garren Smith commented on COUCHDB-3391:
---------------------------------------

Hi Geoffrey,

Quizster looks really cool. I can't comment on your database-per-user design, but in terms of replication, we have a new Replication Scheduler https://github.com/apache/couchdb/pull/470 that will land soon. I think it should help you quite a lot. If you have some time, could you build that branch and test it out? It's always nice to get more eyeballs on new code.

> The _replicator Database Is Not Scalable or My Design Needs Tweaking
> --------------------------------------------------------------------
>
>                 Key: COUCHDB-3391
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3391
>             Project: CouchDB
>          Issue Type: Question
>          Components: Replication
>            Reporter: Geoffrey Cox
>
> I think it is important that I explain where I am coming from so that you understand my use case; please bear with me.
> Background: I’m a big fan of CouchDB, its offline capabilities and the ecosystem surrounding it, specifically PouchDB. So much so that I built the Quizster app (https://quizster.co) using CouchDB and PouchDB. Both are amazingly powerful, but they have some rough edges, so I’ve had to build a significant amount of software on top of CouchDB/PouchDB and am in the process of open sourcing it. Before I do, I’m looking to migrate this technology from CouchDB 1 to 2, and this migration is going to take a decent amount of work. I just want to double-check that I’m not reinventing the wheel and that there isn’t a better design than the one I describe below, especially since CouchDB 2 appears to have some awesome new features.
> Consider the following use case for an app that allows students to submit quiz answers digitally. Each student should be able to submit her/his quiz answers and the teacher should be able to view all the answers. This design needs to work with PouchDB, because PouchDB speaks directly to the database, which saves us a lot of time; otherwise, we would need to write an elaborate set of APIs. (This solution is similar to what was implemented for Quizster, but it is greatly simplified so that we can focus on the root of the design.)
> My chosen design consists of a database per student and a database per teacher, i.e. a database per user. Only the owner of a database can edit it, and this is enforced via CouchDB roles. When a student submits an answer, it is synced with her/his database via PouchDB. The answers are then replicated to the teacher’s database. This allows students to quickly load their answers in the app and teachers to load all the answers for all their students. Of course, there are views in the teacher databases that segment the answers by class, quiz, etc., so that the teacher doesn’t have to load the answers for all their students at once. If we didn’t have the teacher database, then a teacher would need access to all the students’ databases and would have to sync with each of them.
> First question: is this database-per-user design for both students and teachers the best solution or is there a better solution?
> At first glance, the _replicator database appears to be the obvious way to replicate the data from the student databases to a single teacher database. The big gotcha is that each continuous replication consumes a file handle and a database connection, which means that you can very quickly starve a database of its resources. For example, if we have, say, 10,000 students in our database, then we need 10,000 concurrent file handles and database connections just for the replications. This is pretty crazy considering that it is unlikely that even 100 of these 10,000 students would be using the app simultaneously.
> Instead, I developed a service that listens to the _db_updates feed and replicates a database only when there is a change to that specific database. With this method, we only consume resources when there are changes, and as a result we end up with plenty of free file handles and database connections.
> I’ve briefly experimented with CouchDB 2 and it appears that the _replicator database is just as greedy with resources as it was in CouchDB 1.
> Second question: is there a better way of replicating this data that doesn’t consume so many resources?
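For readers following along, the per-user layout described in the issue could be set up roughly as below. This is a minimal sketch, not Quizster's actual code: the "<kind>_<username>" naming scheme is hypothetical, and the sketch lists the owner by name under "members" in the database's _security object (a per-user role could be listed under "roles" instead, which is closer to the role-based setup the issue mentions). Only the admins/members shape of _security is CouchDB's own.

```python
import json
import string

# Hypothetical naming rule: keep only characters that are always safe in a
# CouchDB database name; everything else becomes "_".
ALLOWED = set(string.ascii_lowercase + string.digits)


def db_name_for(kind: str, username: str) -> str:
    """Hypothetical naming scheme: '<kind>_<username>', e.g. 'student_ann'.

    CouchDB database names must start with a lowercase letter and use a
    limited character set, so anything outside [a-z0-9] is replaced by '_'.
    """
    safe = "".join(c if c in ALLOWED else "_" for c in username.lower())
    return f"{kind}_{safe}"


def security_for(owner: str) -> dict:
    """_security object that limits a per-user database to its owner."""
    return {
        # Database admins can edit design documents and _security itself.
        # Left empty, so only server admins can do that.
        "admins": {"names": [], "roles": []},
        # Members can read and write ordinary documents; once this list is
        # non-empty, other non-admin users are refused.
        "members": {"names": [owner], "roles": []},
    }


# The body one would PUT to /student_ann/_security for the student "Ann".
print(db_name_for("student", "Ann"))
print(json.dumps(security_for("Ann"), indent=2))
```

Teachers are deliberately absent from the student's _security object here: in the design above they never read student databases directly, only the teacher database that the answers are replicated into.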

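To make the resource concern in the issue concrete, the fan-in through the _replicator database amounts to one continuous replication document per student database. The sketch below only builds those documents; the database names and the server URL are illustrative assumptions, and credentials (which replicating per-user databases would require) are omitted.

```python
from typing import Iterable, List


def replicator_doc(student_db: str, teacher_db: str, server: str) -> dict:
    """A continuous replication document for the _replicator database."""
    return {
        "_id": f"{student_db}-to-{teacher_db}",
        "source": f"{server}/{student_db}",
        "target": f"{server}/{teacher_db}",
        # Continuous: the replication keeps following the source's changes
        # feed after it catches up, even while the student is idle. As the
        # issue describes, that is what holds resources per replication.
        "continuous": True,
    }


def fan_in_docs(student_dbs: Iterable[str], teacher_db: str, server: str) -> List[dict]:
    """One continuous replication per student database, all into one teacher db."""
    return [replicator_doc(db, teacher_db, server) for db in student_dbs]


students = [f"student_{i}" for i in range(10_000)]
docs = fan_in_docs(students, "teacher_smith", "http://localhost:5984")
print(len(docs))  # 10000 replications, whether or not anyone is online
```

The count of replication documents grows with the number of students rather than with the number of active students, which is the mismatch the issue points out (10,000 replications versus perhaps 100 concurrent users).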

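The on-demand alternative from the issue (watch _db_updates, replicate a database only after it changes) might look something like this. It is a sketch under stated assumptions, not the reporter's service, which is not shown: the feed is assumed to arrive as one JSON object per line with "db_name" and "type" fields (blank lines being heartbeats), the student-to-teacher mapping and the server URL are hypothetical, and authentication is omitted although _db_updates and replication normally require admin credentials. Repeated updates to the same database within one batch collapse into a single one-shot replication.

```python
import json
import urllib.request
from typing import Dict, Iterable, List


def changed_dbs(feed_lines: Iterable[str]) -> List[str]:
    """Databases reported as created or updated, first-seen order, no repeats."""
    seen: Dict[str, None] = {}
    for line in feed_lines:
        line = line.strip()
        if not line:  # the continuous feed sends blank heartbeat lines
            continue
        event = json.loads(line)
        if event.get("type") in ("created", "updated"):
            seen.setdefault(event["db_name"], None)
    return list(seen)


def replication_jobs(
    feed_lines: Iterable[str], teacher_of: Dict[str, str], server: str
) -> List[dict]:
    """One-shot replication bodies, one per changed student database."""
    jobs = []
    for db in changed_dbs(feed_lines):
        teacher_db = teacher_of.get(db)
        if teacher_db is None:  # teacher dbs and unknown dbs are skipped
            continue
        # No "continuous" flag: the job runs to completion and then stops,
        # so it holds resources only while there is something to copy.
        jobs.append({"source": f"{server}/{db}", "target": f"{server}/{teacher_db}"})
    return jobs


def post_replication(server: str, job: dict) -> dict:
    """POST one job to /_replicate (authentication omitted in this sketch)."""
    req = urllib.request.Request(
        f"{server}/_replicate",
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    feed = [
        '{"db_name": "student_ann", "type": "updated"}',
        "",
        '{"db_name": "student_ann", "type": "updated"}',
        '{"db_name": "teacher_smith", "type": "updated"}',
    ]
    server = "http://localhost:5984"
    for job in replication_jobs(feed, {"student_ann": "teacher_smith"}, server):
        print(job)  # pass to post_replication(server, job) on a live server
```

Skipping databases that are not in the student-to-teacher mapping also keeps the teacher database's own updates from triggering further replications.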

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)