Posted to user@couchdb.apache.org by Marc Grabanski <m...@marcgrabanski.com> on 2011/05/09 23:23:52 UTC

CouchDB Architecture for Modern Web Applications

I am trying to resolve common threads that keep coming up when CouchDB is
discussed in the wild.

One major thing I see debated is the per-database ACL in CouchDB. Currently,
we have to assume that most applications will need to give each user their
own private database, unless they access couch through an application tier
like Rails, which doesn't seem to leverage CouchDB's core strength of being
HTTP/REST-based out of the box.

So given each user has their own private database, the general use-case in
question is:
If a user has some info in their personal database they want to make public,
how do you make it easy to query across all of the public user data that is
left fragmented inside thousands of databases?

A possible solution seems to be replicating public information out to
another public database via some type of replication. But if the user
switches that data back to private, how would that public database best be
synced and kept up to date? I've heard filtered replication
(http://blog.couchbase.com/whats-new-in-apache-couchdb-0-11-part-three-new)
being thrown around, but maybe someone can take a stab at explaining at a
high level how they would tackle this general use-case?
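
To make the public/private question concrete, here is a minimal sketch in
plain Python (not CouchDB's own JavaScript) of the decision a replication
filter makes. It assumes each document carries a hypothetical boolean
"public" field; the function names and the in-memory "database" are made up
for illustration only.

```python
# Sketch only: models the decision a replication filter makes.
# The "public" field is a hypothetical convention, not a CouchDB built-in.

def public_filter(doc):
    """Return True if the document should be replicated to the shared database."""
    # Always pass deletions so that removals can reach the target.
    if doc.get("_deleted"):
        return True
    return doc.get("public") is True


def replicate(changes, target, doc_filter):
    """Apply the changes that pass doc_filter to target (a dict keyed by _id)."""
    for doc in changes:
        if not doc_filter(doc):
            continue  # filtered out: the target never sees this revision
        if doc.get("_deleted"):
            target.pop(doc["_id"], None)
        else:
            target[doc["_id"]] = doc
    return target


public_db = {}
replicate([{"_id": "note-1", "public": True, "text": "hello"}], public_db, public_filter)
# The user flips the note back to private. The filter now rejects the update,
# so the earlier public copy is still sitting in the shared database.
replicate([{"_id": "note-1", "public": False, "text": "hello"}], public_db, public_filter)
print(sorted(public_db))
```

In this model the stale copy only goes away when a change that still passes
the filter (here, a deletion) reaches the target, which is the gap the
question above is pointing at.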

Web applications commonly need to:
- generate / search through a list of users
- aggregate and search all the public user information
- generate an activity feed based on a person's friends or followers

How would these general use-cases be best architected in CouchDB? I am
seeing a lot of fallout from people who try to use CouchDB for those things
and eventually give up and go back to a relational DB because it is too
large of a departure for them to come up with a good solution.

Would you recommend these people go use a relational database for these
types of web applications, or are there general guidelines that would keep
these types of applications perfectly happy inside CouchDB? I think a
high-level explanation on this topic would be good since people seem to be
very confused about when to use CouchDB, and more importantly when not to
use CouchDB, for their web-based applications.

Sincerely,
--
Marc

Re: CouchDB Architecture for Modern Web Applications

Posted by Hendrik Jan van Meerveld <ha...@gmail.com>.
With a "_replicator" database, it sounds like you could use
"validate_doc_update" to disallow certain types of replication (which is
what Nitin is asking for).
For that to work, the "_replicate" URL would need to be disabled whenever a
"_replicator" database is around.

Is it going to be that way?
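
A rough sketch of the validate_doc_update idea, assuming it behaves in the
"_replicator" database the way it does in any other database. The Python
below only builds the design document; the JavaScript inside it is what
CouchDB would run. The design document name, the rule itself and the
"public" database name are all hypothetical, and this has not been tried
against a real server.

```python
import json

# JavaScript source stored as a string, which is how design documents
# carry their functions. The policy (non-admins may only replicate into a
# database named "public") is a made-up example. It also assumes "target"
# is a plain string, which is a simplification.
VALIDATE_SRC = """
function (newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) { return; }
  var isAdmin = userCtx.roles.indexOf('_admin') !== -1;
  if (!isAdmin && newDoc.target !== 'public') {
    throw({forbidden: 'only replication into "public" is allowed'});
  }
}
"""

design_doc = {
    "_id": "_design/replication_policy",  # hypothetical name
    "validate_doc_update": VALIDATE_SRC.strip(),
}

# The JSON body one would store in the "_replicator" database.
print(json.dumps(design_doc, indent=2))
```

A rule like this would only constrain replications that are started by
writing a document, which is why the "_replicate" URL matters.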

Kind regards and looking forward to the next release,
Hendrik Jan




Re: CouchDB Architecture for Modern Web Applications

Posted by Nitin Borwankar <nb...@fastmail.fm>.
I have reservations about calling replication filters a "security model" since it is decided by the recipient entirely. It is more of a spam filter. It doesn't prevent docs from *leaving* my host but relies on the recipient honoring the policy. I'd like to have an outgoing repl filter before I am comfortable calling this security of any kind. Just sayin'.


Re: CouchDB Architecture for Modern Web Applications

Posted by James Marca <jm...@translab.its.uci.edu>.
On Mon, May 09, 2011 at 07:48:46PM -0700, Chris Anderson wrote:
> I think these are good questions.
> 
> It is important, if you are gonna build a replicating app, that you
> get used to thinking of replication filters as the security model.
> 

I for one would really appreciate more documentation and examples on
building real world replication filters.

I say that without having checked the Wiki or the couch book text
lately, so maybe it is there and I didn't understand what I was
reading the first time through. If so I apologize and would
humbly accept pointers and scolding.

I remember reading a case study of a company that was using filtered
replication to expose some but not all data to their clients on a
client by client basis (something about bidding on jobs with spare lab
capacity where some jobs were confidential).  That was really cool,
and I would love to know how to do something like that, but I can't
figure it out on my own.

-- 
James E. Marca
Researcher
Institute of Transportation Studies
University of California
Irvine, CA 92697-3600


Re: CouchDB Architecture for Modern Web Applications

Posted by Chris Anderson <jc...@apache.org>.
I think these are good questions.

It is important, if you are gonna build a replicating app, that you
get used to thinking of replication filters as the security model.

Validation functions and replication filters.

Per-document ACLs get hard to maintain (each query engine has to
respect them, and a user who really cares could just grep the
.couch file).

So to really be secure, you have to think in terms of replication.

Now that we have the _replicator DB coming in 1.1, it will be
easy to make CouchApps that manage replication. This way we can make a
super simple tool to keep CouchApps running.
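
For anyone who has not seen one, a replication managed through the
_replicator DB is described by an ordinary document. A minimal sketch,
assuming a per-user source database, a shared target database and a filter
stored in a design document on the source; every name and the server URL
below are hypothetical.

```python
import json

replication_doc = {
    "_id": "publish-alice",        # hypothetical document id
    "source": "userdb-alice",      # hypothetical per-user database
    "target": "public",            # hypothetical shared database
    "filter": "app/public_only",   # "<design doc>/<filter name>", both hypothetical
    "continuous": True,            # keep replicating as new changes arrive
}


def replicator_request(base_url, doc):
    """Return the (method, url, body) that would store doc in _replicator."""
    return ("PUT", f"{base_url}/_replicator/{doc['_id']}", json.dumps(doc))


method, url, body = replicator_request("http://localhost:5984", replication_doc)
print(method, url)
print(body)
```

The sketch only assembles the request rather than sending it, so it can be
read and adapted without a running server.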


I fully agree we need more replicator tooling.

Chris





-- 
Chris Anderson
http://jchrisa.net
http://couchbase.com