Posted to user@couchdb.apache.org by "maku@makuchaku.in" <ma...@makuchaku.in> on 2011/09/12 14:34:08 UTC

Re: Using couchdb for analytics

Hi everyone,

Considering that I've bypassed the problem of cross-domain communication
using proxy/iframes...
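
For the proxy half of that, a single reverse-proxy rule on the main domain is
enough; nginx is only an assumption here, and the host/port/path are
illustrative:

    # hypothetical nginx rule: expose CouchDB under foo.com/couch/
    location /couch/ {
        proxy_pass http://couch.foo.com:5984/;
        proxy_set_header Host $host;
    }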

I want to store counters in a document, incremented on each page view.
CouchDB will create a complete revision of this document for just 1 counter
update.
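
Concretely, every successful update returns a fresh _rev; the database,
document and rev values below are only illustrative:

    curl -X PUT http://localhost:5984/analytics/homepage \
         -H 'Content-Type: application/json' \
         -d '{"_rev":"1-abc123","hits":42}'
    # => {"ok":true,"id":"homepage","rev":"2-def456"}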

Wouldn't this consume too much space?
Considering that I have 1M hits in a day, I might be looking at 1M revisions
of the document in a day.

Any thoughts on this...

Thanks!
--
Mayank
http://adomado.com



On Fri, Jun 3, 2011 at 12:45 PM, Stefan Matheis <
matheis.stefan@googlemail.com> wrote:

> What about proxying couch.foo.com through foo.com/couch? maybe not the
> complete service, at least one "special" url which triggers the write
> on couch?
>
> Regards
> Stefan
>
> On Fri, Jun 3, 2011 at 8:56 AM, maku@makuchaku.in <ma...@makuchaku.in>
> wrote:
> > Hi everyone,
> >
> > I think I had a fundamental flaw in my assumption - realized this
> > yesterday...
> > If the couchdb analytics server is hosted on couch.foo.com (foo.com being
> > the main site) - I would never be able to make write requests via client
> > side javascript as cross-domain policy would be a barrier.
> >
> > I thought about this - and came across a potential solution...
> > What if I host an html page as an attachment in couchdb & whenever I have
> > to make a write call, include this html in an iframe & pass on the
> > parameters in the query string of the iframe URL.
> > The iframe will have javascript which understands the incoming query
> > string params & takes action (creates a POST/PUT to couchdb).
> >
> > There would be no cross-domain barriers as the html page is being served
> > right out of couchdb itself - wherever it's hosted (couch.foo.com)
> >
> > This should not be a performance hit - as ETags will help with client-side
> > caching of the html page.
> > --
> > Mayank
> > http://adomado.com
> >
> >
> >
> > On Thu, Jun 2, 2011 at 8:34 PM, maku@makuchaku.in <ma...@makuchaku.in>
> wrote:
> >
> >> Its 700 req/min :)
> >> --
> >> Mayank
> >> http://adomado.com
> >>
> >>
> >>
> >> On Thu, Jun 2, 2011 at 7:10 PM, Jan Lehnardt <ja...@apache.org> wrote:
> >>
> >>>
> >>> On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:
> >>>
> >>> > Forgot to mention...
> >>> > All of these 700 req/sec are write requests (data logging) & no data
> >>> crunching.
> >>> > Our current inhouse analytics solution (built on Rails, Mysql) gets
> >>> >>
> >>> >> about 700 req/min on an average day...
> >>>
> >>> min or sec? :)
> >>>
> >>> Cheers
> >>> Jan
> >>> --
> >>>
> >>>
> >>> >>
> >>> >> --
> >>> >> Mayank
> >>> >> http://adomado.com
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky <rg...@rgabostyle.com>
> >>> wrote:
> >>> >>> Take a look at update handlers [1]. It is a more lightweight way to
> >>> >>> create / update your visitor documents, without having to GET the
> >>> >>> document, modify and PUT back the whole thing. It also simplifies
> >>> >>> dealing with document revisions, as my understanding is that you
> >>> >>> should not be running into conflicts.
> >>> >>>
> >>> >>> I wouldn't expect any problem handling the concurrent traffic and
> >>> >>> tracking the users, but the view indexer will take some time with
> >>> >>> the processing itself. You can always replicate the database (or
> >>> >>> parts of it using a replication filter) to another CouchDB instance
> >>> >>> and perform the crunching there.
> >>> >>>
> >>> >>> It's fairly vague how many updates / writes your 2k-5k traffic would
> >>> >>> cause. How many requests/sec on your site? How many property updates
> >>> >>> does that cause?
> >>> >>>
> >>> >>> Btw, CouchDB users, is there any way to perform bulk updates using
> >>> >>> update handlers, similar to _bulk_docs?
> >>> >>>
> >>> >>> Gabor
> >>> >>>
> >>> >>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
> >>> >>>
> >>> >>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
> >>> >>>
> >>> >>>> Hi everyone,
> >>> >>>>
> >>> >>>> I came across couchdb a couple of weeks back & got really excited by
> >>> >>>> the fundamental change it brings by simply taking the app-server out
> >>> >>>> of the picture.
> >>> >>>> Must say, kudos to the dev team!
> >>> >>>>
> >>> >>>> I am planning to write a quick analytics solution for my website -
> >>> >>>> something on the lines of Google analytics - which will measure
> >>> >>>> certain properties of the visitors hitting our site.
> >>> >>>>
> >>> >>>> Since this is my first attempt at a JSON style document store, I
> >>> >>>> thought I'd share the architecture & see if I can make it better (or
> >>> >>>> correct my mistakes before I make them) :-)
> >>> >>>>
> >>> >>>> - For each unique visitor, create a document with his session_id as
> >>> >>>> the doc.id
> >>> >>>> - For each property I need to track about this visitor, I create a
> >>> >>>> key-value pair in the doc created for this visitor
> >>> >>>> - If the visitor is a returning user, use the session_id to re-open
> >>> >>>> his doc & keep on modifying the properties
> >>> >>>> - At the end of each calculation time period (say 1 hour or 24
> >>> >>>> hours), I run a cron job which fires the map-reduce jobs by
> >>> >>>> requesting the views over curl/http.
> >>> >>>>
> >>> >>>> A couple of questions based on the above architecture...
> >>> >>>> We see concurrent traffic ranging from 2k users to 5k users.
> >>> >>>> - Would a couchdb instance running on a good machine (say a High-CPU
> >>> >>>> EC2 medium instance) work well with simultaneous writes happening...
> >>> >>>> (visitors browsing, properties changing or getting created)
> >>> >>>> - With a couple of million documents, would I be able to process my
> >>> >>>> views without causing any significant impact to write performance?
> >>> >>>>
> >>> >>>> I think my questions might be biased by the fact that I come from a
> >>> >>>> MySQL/Rails background... :-)
> >>> >>>>
> >>> >>>> Let me know what you guys think about this.
> >>> >>>>
> >>> >>>> Thanks in advance,
> >>> >>>> --
> >>> >>>> Mayank
> >>> >>>> http://adomado.com
> >>> >>>
> >>> >>>
> >>> >>
> >>>
> >>>
> >>
> >
>

Re: Using couchdb for analytics

Posted by "maku@makuchaku.in" <ma...@makuchaku.in>.
Thanks Scott,
That will surely help me make an informed decision.
--
Mayank
http://adomado.com




Re: Using couchdb for analytics

Posted by Sam Bisbee <sa...@sbisbee.com>.
On Mon, Sep 12, 2011 at 1:27 PM, Scott Feinberg
<fe...@gmail.com> wrote:
> Unless you can ensure that only one process will be editing the document at
> a time (to ensure that you never end up holding an old revision), you're
> going to have issues. I've never tried it, but I'd be under the assumption
> that conflict resolution wouldn't work at all.

My favorite way to do something like that is to use a queue. Your
app stores the analytics transactions in the queue and you have a
consumer that puts them into Couch at its own pace. If you want to do
so with a single thread, then go for it.
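
A minimal sketch of that pattern, assuming Node.js, CouchDB on localhost:5984,
a database named "analytics", and a plain in-memory array standing in for the
real queue; the consumer drains the queue into _bulk_docs on a timer:

    var http = require('http');

    var queue = [];  // app code pushes analytics event documents here

    function flush() {
      if (queue.length === 0) return;
      var docs = queue.splice(0, queue.length);      // drain the queue
      var body = JSON.stringify({ docs: docs });

      var req = http.request({
        host: 'localhost',
        port: 5984,
        path: '/analytics/_bulk_docs',
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Content-Length': Buffer.byteLength(body)
        }
      }, function (res) {
        res.resume();  // each doc in the batch gets its own id/rev in the reply
      });
      req.end(body);
    }

    setInterval(flush, 5000);  // write to Couch at the consumer's own pace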

--
Sam Bisbee

Re: Using couchdb for analytics

Posted by Scott Feinberg <fe...@gmail.com>.
Unless you can ensure that only one process will be editing the document at
a time (to ensure that you never end up holding an old revision), you're
going to have issues. I've never tried it, but I'd be under the assumption
that conflict resolution wouldn't work at all.

Revision history is a large part of what makes CouchDB tick. It would also
keep you from ever having a cluster; without revision history the cluster
would never be able to negotiate.

You're not going to end up with millions of revisions. As the wiki says,
_revs_limit defines an upper bound on the number of document revisions which
CouchDB keeps track of, even after compaction
<http://wiki.apache.org/couchdb/Compaction>.
The default is set to 1000 on CouchDB 0.11.

Not sure what it is set to now, but I assume it's probably the same. Plus,
even if it were 1 million revisions, you're talking about a million key/value
pairs, which is nothing significant. And compaction would remove your excess
revisions.
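
For reference, the per-database limit is exposed directly over the HTTP API;
the database name below is illustrative:

    # read the current revision limit
    curl http://localhost:5984/analytics/_revs_limit

    # lower it, e.g. keep only 10 revisions per document
    curl -X PUT http://localhost:5984/analytics/_revs_limit -d '10'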

Here's what you need: http://blog.couchbase.com/atomic-increments-couchdb
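
A document update handler is one way to do the increment on the server side
(whether or not that is exactly what the linked post describes); a minimal
sketch, with illustrative design-doc, handler and field names:

    // in _design/stats, under "updates": { "hit": ... }
    function (doc, req) {
      // doc is null the first time this id is hit; create the counter then
      if (!doc) {
        doc = { _id: req.id, hits: 0 };
      }
      doc.hits += 1;  // increment happens in CouchDB, no GET/modify/PUT round trip
      // [doc, response]: CouchDB saves doc and returns response to the caller
      return [doc, JSON.stringify({ hits: doc.hits })];
    }

Invoked with something like PUT /analytics/_design/stats/_update/hit/homepage.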

--Scott



Re: Using couchdb for analytics

Posted by "maku@makuchaku.in" <ma...@makuchaku.in>.
Question

If a database is configured with _revs_limit = 1
<http://wiki.apache.org/couchdb/HTTP_database_API#Accessing_Database-specific_options>,
will the following features still work?
- Conflict resolution
- Changes feed

Hypothetically, to maintain an incrementing counter, we could keep such a
key/value pair in a document whose database is configured with _revs_limit
= 1.

Thoughts?

Thanks!
--
Mayank
http://adomado.com




Re: Using couchdb for analytics

Posted by "maku@makuchaku.in" <ma...@makuchaku.in>.
Thanks for the tip Scott.

However, I have a feeling that compacting the database is not the correct
answer to this problem.

I am going to test limiting the revs on a document.

Let's see how that fares...
But I have a hunch that if I do that, the conflict resolution strategy will
not work.
--
Mayank
http://adomado.com




Re: Using couchdb for analytics

Posted by Scott Feinberg <fe...@gmail.com>.
It wouldn't consume too much space as long as you're regularly compacting
your database.
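
Compaction itself is just a POST against the database (and easy to put in
cron); the database name here is illustrative:

    curl -X POST http://localhost:5984/analytics/_compact \
         -H 'Content-Type: application/json'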

As much as I love CouchDB and this is a CouchDB users mailing list, I tried
to do something similar and I found MongoDB was better suited due to its
support for partial updates.

I based the project on some of the work from http://hummingbirdstats.com/.

--Scott
