Posted to user@couchdb.apache.org by Michael Bleigh <mi...@intridea.com> on 2009/11/08 01:09:32 UTC

CouchDB Twitter Clone Architecture

So I've been thinking through the architecture of a Twitter-esque
system in Couch as a kind of thought exercise to get a better handle
on some of the more difficult corners of view generation. What would
be the most effective manner of creating Twitter-like status streams?

My initial feeling is to store the followings of a given user as an
array in the user's document and also have a view that compiles the
followers of a given user. When a user posts a status update, the
application would fetch the follower list from that view and simply
attach it to the status document. It is then a simple matter of a
composite key map of a given status document to all of the users
stored within to create a given user's home timeline.
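
The design above can be sketched in a few lines of Python standing in for CouchDB's JavaScript map functions. This is only an in-memory illustration: the field names (`following`, `recipients`, `created_at`) and the `build_view` helper are assumptions made for the example, not anything from CouchDB itself.

```python
# Hypothetical user documents; each stores the people it follows.
users = [
    {"_id": "alice", "type": "user", "following": ["bob"]},
    {"_id": "carol", "type": "user", "following": ["bob", "alice"]},
    {"_id": "bob", "type": "user", "following": []},
]

def followers_map(doc):
    """Map step of the followers view: emit (followed, follower)."""
    if doc.get("type") == "user":
        for followed in doc.get("following", []):
            yield followed, doc["_id"]

def timeline_map(doc):
    """Map step of the timeline view: composite key (recipient, created_at)."""
    if doc.get("type") == "status":
        for recipient in doc.get("recipients", []):
            yield (recipient, doc["created_at"]), doc["text"]

def build_view(docs, map_fn):
    """Run a map function over all docs and sort rows by key, like a view index."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])

def post_status(author, text, created_at):
    """Fetch the author's followers from the view and attach them to the status."""
    followers = [v for k, v in build_view(users, followers_map) if k == author]
    return {"type": "status", "author": author, "text": text,
            "created_at": created_at, "recipients": followers}

statuses = [post_status("bob", "hello", 1), post_status("alice", "hi bob", 2)]

def home_timeline(user):
    """Select every timeline row whose composite key starts with this user."""
    return [text for (recipient, _), text in build_view(statuses, timeline_map)
            if recipient == user]

print(home_timeline("carol"))
```

Note that every status document carries its full recipient list, which is exactly the part that grows with the follower count.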

Where this breaks down is your @aplusk scenario. Storing a 3.5 million
entry array with a document is obviously going to cripple performance
(at least I would think it would) as well as take up massive disk
space (I estimated around 7MB for a single JSON status with 1MM
followers).
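
A rough way to check that estimate, assuming followers are stored as sequential integer ids in a compact JSON array (an assumption for this sketch; usernames or UUID strings would take more space):

```python
import json

def follower_array_bytes(follower_count):
    """Byte size of a compact JSON array of sequential integer follower ids."""
    ids = list(range(follower_count))
    return len(json.dumps(ids, separators=(",", ":")).encode("utf-8"))

size = follower_array_bytes(1_000_000)
print(f"{size / 1_000_000:.1f} MB per status")
```

Under those assumptions the array alone comes out just under 7 MB for one million followers, and that cost is paid again for every status the user posts.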

So if this solution isn't scalable to millions of users, what's an
architecture that would be? How do you compose the user's tweet stream
such that it can be pulled in an efficient manner?

Just trying to start a discussion to help me better understand
document-oriented architecture, feel free to ignore me!

Michael Bleigh

Re: CouchDB Twitter Clone Architecture

Posted by Paul Davis <pa...@gmail.com>.
> For each user, have a replication filter function (coming soon in
> trunk) that only replicates the updates from people they follow, to
> their own db.

Not fair using make-believe features! But not a bad idea for when it lands.

Paul Davis

Re: CouchDB Twitter Clone Architecture

Posted by Chris Anderson <jc...@apache.org>.
On Sat, Nov 7, 2009 at 8:07 PM, Paul Davis <pa...@gmail.com> wrote:
> On Sat, Nov 7, 2009 at 7:09 PM, Michael Bleigh <mi...@intridea.com> wrote:
>> So I've been thinking through the architecture of a Twitter-esque
>> system in Couch as a kind of thought exercise to get a better handle
>> on some of the more difficult corners of view generation. What would
>> be the most effective manner of creating Twitter-like status streams?
>>

I'd do it like this:

Have a global database where all new tweets are posted. We can call
this "the firehose" or "the public timeline".

For each user, have a replication filter function (coming soon in
trunk) that only replicates the updates from people they follow, to
their own db. Then each user can replicate their db offline or
whatever, and have full access to the archive of tweets they've
followed.
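
As a minimal sketch of that idea, here is an in-memory Python simulation. It is not the CouchDB replication API (real filter functions are written in JavaScript inside a design document); `make_follow_filter`, `replicate`, and the document fields are hypothetical names used only for illustration.

```python
def make_follow_filter(following):
    """Return a predicate that passes only tweets by authors in `following`."""
    followed = set(following)

    def follow_filter(doc):
        return doc.get("type") == "tweet" and doc.get("author") in followed

    return follow_filter

def replicate(source, target, doc_filter):
    """Copy each source doc that passes the filter into target, keyed by _id."""
    copied = 0
    for doc in source:
        if doc_filter(doc) and doc["_id"] not in target:
            target[doc["_id"]] = dict(doc)
            copied += 1
    return copied

# The global database where every tweet lands.
firehose = [
    {"_id": "t1", "type": "tweet", "author": "bob", "text": "hello"},
    {"_id": "t2", "type": "tweet", "author": "dave", "text": "lunch"},
    {"_id": "t3", "type": "tweet", "author": "alice", "text": "hi bob"},
]

# Carol's own db receives only tweets from the people she follows.
carol_db = {}
copied = replicate(firehose, carol_db, make_follow_filter(["bob", "alice"]))
print(copied, sorted(carol_db))
```

The tweet documents stay small because the follow list lives in the filter, not in each tweet.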

When you follow someone new, changing the filter function won't give
you their historical record of old tweets, but you can always use a
view to fetch that user's history and save it into the follower's db. I
actually prefer not getting a dump of all someone's tweets going back
in time, so maybe it's better to make replicating someone's updates to
my db an on-demand operation.

Also nice with this is that users who don't visit the site (the
people who sign up and never come back) won't cost anything, because
you don't have to run the filtered replication until a user hits the
site.
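
One way to picture the on-demand part is a per-user checkpoint into the firehose that only advances when the user visits. The sketch below is a plain Python simulation under that assumption; `UserInbox`, `on_visit`, and the use of a list index as the sequence number are hypothetical and are not CouchDB features.

```python
class UserInbox:
    """A user's db plus the firehose position it last replicated from."""

    def __init__(self, following):
        self.following = set(following)
        self.docs = {}
        self.last_seq = 0  # nothing replicated yet

def on_visit(inbox, firehose):
    """Catch up from the checkpoint; called only when the user hits the site."""
    pending = firehose[inbox.last_seq:]
    for doc in pending:
        if doc["author"] in inbox.following:
            inbox.docs[doc["_id"]] = doc
    inbox.last_seq = len(firehose)
    return len(pending)  # how many firehose entries were scanned

firehose = [
    {"_id": "t1", "author": "bob", "text": "hello"},
    {"_id": "t2", "author": "dave", "text": "lunch"},
]
carol = UserInbox(following=["bob"])
first_scan = on_visit(carol, firehose)   # scans both tweets, keeps t1
firehose.append({"_id": "t3", "author": "bob", "text": "back again"})
second_scan = on_visit(carol, firehose)  # scans only the new tweet
print(first_scan, second_scan, sorted(carol.docs))
```

In this sketch a user who never returns never triggers `on_visit`, so no work is done on their behalf; a user who returns after a long absence pays for the backlog on that visit.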


-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: CouchDB Twitter Clone Architecture

Posted by Paul Davis <pa...@gmail.com>.
Michael,

It's hard to give too much of a description of what the best would be
like, but off the cuff after more experience than the last time I made
a comment on the "How does tweetcouch work" meme:

Store each follower relation as a document. Offline when a new tweet
comes in, look at a view that does "emit(person_being_followed,
person_following)" and copy that tweet to the "person_following"'s
stream.
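
A small Python sketch of that fan-out, with the view simulated in memory. The `follow_map` function mirrors the emit call above; the document shape, the `streams` structure, and `fan_out` are assumptions for illustration only.

```python
from collections import defaultdict

# One document per follow relation (hypothetical shape).
follow_docs = [
    {"type": "follow", "person_following": "alice", "person_being_followed": "bob"},
    {"type": "follow", "person_following": "carol", "person_being_followed": "bob"},
    {"type": "follow", "person_following": "carol", "person_being_followed": "alice"},
]

def follow_map(doc):
    """Python stand-in for emit(person_being_followed, person_following)."""
    if doc.get("type") == "follow":
        yield doc["person_being_followed"], doc["person_following"]

def followers_of(author):
    """Query the simulated view with key=author."""
    return [value for doc in follow_docs
            for key, value in follow_map(doc) if key == author]

# Each user's stream, filled in by the offline job.
streams = defaultdict(list)

def fan_out(tweet):
    """Offline job: copy one tweet into the stream of each follower."""
    for follower in followers_of(tweet["author"]):
        streams[follower].append(dict(tweet))

fan_out({"_id": "t1", "author": "bob", "text": "hello"})
fan_out({"_id": "t2", "author": "alice", "text": "hi bob"})
print({user: [t["_id"] for t in tweets] for user, tweets in streams.items()})
```

Because the copying happens offline, a follower's stream may lag the original tweet, which is the eventual consistency trade-off described next.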

It may seem odd, but if you watch twitter streams closely you can see
that they're actually a pretty good case of "eventually consistent".
It's really noticeable when you're firing back and forth right quick
between 2 or more people. Twitter is an interesting study because even
if you send a tweet, and then 30 seconds later another tweet shows up
as having arrived before you sent yours, humans don't really care. The
async nature is not sensitive as long as we get a notice within
reasonable time. A failing case is the example of getting a text
message three days later. I just realized I'm still typing, so let me
know if that answered anything.

HTH,
Paul Davis