Posted to user@couchdb.apache.org by Stephan Bardubitzki <st...@bardubitzki.com> on 2013/03/15 01:56:32 UTC

Tracking doc access

Hi there,

I have a task where I need to track how often a doc is accessed. The two 
possible ways I can think of are:

 1. add an array to the doc and add the timestamp when it is accessed
 2. create a new document and add the doc._id and the timestamp

Which one would you prefer? Or is there a better solution?

Thanks,
Stephan


Re: Tracking doc access

Posted by svilen <az...@svilendobrev.com>.
+1
As for the id: think of multiple accesses at the *same* time by the same user
from different machines, devices, browsers, or browser sessions. In theory all
of those can have the same timestamp, but that's still two accesses.
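A minimal sketch of the id point above, assuming Node.js 14.17+ (for crypto.randomUUID). If an access doc's _id were built only from the user and the timestamp, two simultaneous accesses would collide, so a random component is added. The makeAccessDoc helper and the type/doc_id/user field names are hypothetical.

```javascript
const crypto = require("crypto");

// Hypothetical helper: builds an immutable access-log doc. The _id combines the
// timestamp with a random UUID, so two accesses by the same user in the same
// millisecond (e.g. from two browser sessions) still get distinct ids.
function makeAccessDoc(accessedId, userId, now = new Date()) {
  const ts = now.toISOString();
  return {
    _id: `access:${ts}:${crypto.randomUUID()}`,
    type: "access",
    doc_id: accessedId,
    user: userId,
    timestamp: ts,
  };
}

// Same user, same instant: still two separate docs.
const when = new Date("2013-03-15T01:56:32Z");
const first = makeAccessDoc("doc-123", "stephan", when);
const second = makeAccessDoc("doc-123", "stephan", when);
console.log(first._id !== second._id); // true
```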

(In reply to Jim Klo, Fri, 15 Mar 2013 02:23:59 +0000.)

Re: Tracking doc access

Posted by Jim Klo <ji...@sri.com>.
One more thing: inserting a new doc is also not prone to document conflicts, since you're not updating an existing document when logging access from multiple threads.
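To illustrate why appending to one shared doc conflicts while inserting new docs does not, here is a toy in-memory model of CouchDB's revision check. ToyStore is hypothetical and only mimics the rule that an update must carry the doc's current _rev or be rejected (CouchDB answers such writes with HTTP 409 Conflict); it is not CouchDB's API, and the two "threads" are simulated by interleaving calls.

```javascript
// Toy model of CouchDB's optimistic concurrency (not a real client):
// a write must quote the doc's current _rev, or it is rejected.
class ToyStore {
  constructor() {
    this.docs = new Map();
  }

  put(doc) {
    const current = this.docs.get(doc._id);
    const expectedRev = current ? current._rev : undefined;
    if (doc._rev !== expectedRev) {
      return { ok: false, status: 409, error: "conflict" };
    }
    const generation = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    const saved = { ...doc, _rev: `${generation}-toy` };
    this.docs.set(doc._id, saved);
    return { ok: true, rev: saved._rev };
  }

  get(id) {
    return { ...this.docs.get(id) };
  }
}

const store = new ToyStore();
store.put({ _id: "page", accesses: [] });

// Option 1: two concurrent readers load the same revision, then both append.
const reader1 = store.get("page");
const reader2 = store.get("page");
reader1.accesses = [...reader1.accesses, "2013-03-15T02:00:00.000Z"];
reader2.accesses = [...reader2.accesses, "2013-03-15T02:00:00.001Z"];
const write1 = store.put(reader1); // ok: carries the current _rev
const write2 = store.put(reader2); // 409: its _rev is now stale

// Option 2: each access is a brand-new doc with its own _id, so nothing clashes.
const insert1 = store.put({ _id: "access-1", doc_id: "page" });
const insert2 = store.put({ _id: "access-2", doc_id: "page" });
console.log(write2.status, insert1.ok && insert2.ok); // 409 true
```

In option 1 the second append is lost unless the writer re-reads and retries; option 2 never needs a retry.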

- JK

(In reply to Jim Klo, Mar 14, 2013, 7:13 PM.)


Re: Tracking doc access

Posted by Stephan Bardubitzki <st...@bardubitzki.com>.
Hi Jim,

thanks for your feedback, makes sense to me.

Stephan

(In reply to Jim Klo, 13-03-14 07:13 PM.)


Re: Tracking doc access

Posted by Jim Klo <ji...@sri.com>.
I think you'd be better off tracking access by inserting an immutable document containing a new timestamp and the _id of the accessed document.

You could then create a view that 'joins' the access doc with the real doc by emitting the timestamp as the key and { _id: accessed_doc._id } as the value, then querying the view with include_docs=true.

Updating the doc to append a timestamp would be inherently slow if you have a high volume of repeat access, and it would keep forcing your views to update.
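A sketch of that view, assuming the access docs live in the same database as the docs they point to (include_docs resolves a linked _id only within the database being queried, so this join would not work across a separate stats database). The type/doc_id/timestamp field names are assumptions, and queryView is a hypothetical local emulation of the map step plus include_docs, not a CouchDB API; on a real server accessMap would go into a design doc and be queried with include_docs=true.

```javascript
// The map function, in the form CouchDB runs it (emit is provided globally).
const accessMap = function (doc) {
  if (doc.type === "access") {
    // Emitting an object with an _id makes include_docs=true return the
    // linked (accessed) doc instead of the access doc itself.
    emit(doc.timestamp, { _id: doc.doc_id });
  }
};

// Hypothetical local emulation of querying the view with include_docs=true.
function queryView(docs, mapFn) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  const rows = [];
  for (const doc of docs) {
    globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc);
  }
  // ISO 8601 strings in UTC sort chronologically as plain strings.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  for (const row of rows) {
    const linked = row.value && row.value._id ? row.value._id : row.id;
    row.doc = byId.get(linked) || null;
  }
  return rows;
}

const docs = [
  { _id: "page-1", title: "Hello" },
  { _id: "a1", type: "access", doc_id: "page-1", timestamp: "2013-03-15T02:10:00.000Z" },
  { _id: "a2", type: "access", doc_id: "page-1", timestamp: "2013-03-15T02:05:00.000Z" },
];
for (const row of queryView(docs, accessMap)) {
  console.log(row.key, row.id, "->", row.doc.title);
}
// 2013-03-15T02:05:00.000Z a2 -> Hello
// 2013-03-15T02:10:00.000Z a1 -> Hello
```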

just my 2¢

- Jim

Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International
t.	@nsomnac

(In reply to Stephan Bardubitzki, Mar 14, 2013, 5:56 PM.)


Re: Tracking doc access

Posted by Stephan Bardubitzki <st...@bardubitzki.com>.
Hi Wendall,

this is what I was looking for. Also, thanks for your feedback on 
performance; you have saved me a lot of time.

Stephan


(In reply to Wendall Cada, 13-03-14 08:22 PM.)


Re: Tracking doc access

Posted by Wendall Cada <we...@83864.com>.
Doing a write for every read, by updating the doc with a timestamp, would 
perform very, very poorly in CouchDB.

The best approach is to create a separate stats database. Every time a doc 
in the database you are tracking is accessed, create a doc describing the 
request in the stats database. Creating new docs in CouchDB is very 
inexpensive, so you won't see the performance issues that updating docs on 
every request would cause.

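Create a new doc in the stats db like this, storing the timestamp as, for 
example, an ISO 8601 string in UTC so that it sorts chronologically:
{
"db": "name_of_tracked_db",
"id": "_id_of_doc_being_tracked",
"timestamp": "2013-03-15T01:56:32.000Z"
}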
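A sketch of recording one access this way, assuming Node.js 18+ (global fetch), a CouchDB server at http://localhost:5984 and an existing "stats" database; authentication is omitted. makeStatsDoc and recordAccess are hypothetical names. POSTing to the database URL without an _id lets CouchDB assign one, so concurrent accesses never collide.

```javascript
// Hypothetical helper: the stats doc described above, with an ISO 8601 UTC timestamp.
function makeStatsDoc(dbName, docId, when = new Date()) {
  return { db: dbName, id: docId, timestamp: when.toISOString() };
}

// Hypothetical helper: stores one access in the stats db via POST /stats.
async function recordAccess(dbName, docId, base = "http://localhost:5984") {
  const res = await fetch(`${base}/stats`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(makeStatsDoc(dbName, docId)),
  });
  if (!res.ok) {
    throw new Error(`stats write failed with HTTP ${res.status}`);
  }
  return res.json(); // CouchDB replies with { ok, id, rev }
}

console.log(makeStatsDoc("name_of_tracked_db", "_id_of_doc_being_tracked",
  new Date("2013-03-15T01:56:32Z")));
```

In an application, recordAccess would be called alongside each read of a tracked doc.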

Then create a view in the stats database that maps these values. You can 
create several view indexes to slice the data however you need.

The view, as part of the design doc:
{
     "_id": "_design/data",
     "views": {
          "doc_access": {
               "map": "function(doc) { emit([doc.db, doc.id, doc.timestamp], 1); }",
               "reduce": "_sum"
          }
     }
}
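With this key layout, the group_level query parameter decides what _sum adds up: group_level=2 gives one total per tracked doc, group_level=1 one total per tracked db, and group=true (grouping on the full key) one row per distinct [db, id, timestamp] key. The small local emulation below (hypothetical reduceGrouped, not a CouchDB API) shows this; it ignores key ordering and startkey/endkey filtering.

```javascript
// Same emit as the doc_access map; emit is passed in here instead of global.
function docAccessMap(doc, emit) {
  emit([doc.db, doc.id, doc.timestamp], 1);
}

// Emulates reduce=_sum with group_level=N: rows are grouped by the first N
// key components and their values summed.
function reduceGrouped(docs, groupLevel) {
  const sums = new Map();
  for (const doc of docs) {
    docAccessMap(doc, (key, value) => {
      const groupKey = JSON.stringify(key.slice(0, groupLevel));
      sums.set(groupKey, (sums.get(groupKey) || 0) + value);
    });
  }
  return [...sums].map(([k, v]) => ({ key: JSON.parse(k), value: v }));
}

const hits = [
  { db: "blog", id: "post-1", timestamp: "2013-03-14T10:00:00.000Z" },
  { db: "blog", id: "post-1", timestamp: "2013-03-15T09:30:00.000Z" },
  { db: "blog", id: "post-2", timestamp: "2013-03-15T11:00:00.000Z" },
];
console.log(reduceGrouped(hits, 2)); // post-1 -> 2, post-2 -> 1
console.log(reduceGrouped(hits, 1)); // blog -> 3
console.log(reduceGrouped(hits, 3).length); // 3: one row per full key
```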

A mock query for this to see the number of times a doc was accessed over 
the entire date range would be:

http://localhost:5984/stats/_design/data/_view/doc_access?startkey=["name_of_tracked_db","_id_of_doc_being_tracked",""]&endkey=["name_of_tracked_db","_id_of_doc_being_tracked",{}]&group_level=2

You'd get back a result like this:
{"rows": [
{"key":["name_of_tracked_db","_id_of_doc_being_tracked"], "value": 42}
]}

If you want results for a specific range of dates, put the start and end 
dates in the third component of startkey and endkey in place of "" and {}.

This method gives you the ability to get stats for the access counts for 
an entire db, a range of docs, or a single doc for any given period of time.
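A sketch of building these range queries, with hypothetical helper names and the stats URL from the example above. startkey and endkey take JSON, so each value is JSON-encoded and then URL-encoded. Note that with keys of the form [db, id, timestamp], a date range on the third component only narrows results within a single doc; db-wide counts for a period would need an extra view keyed, for example, by [db, timestamp].

```javascript
const VIEW = "http://localhost:5984/stats/_design/data/_view/doc_access";

// JSON-encode every parameter (keys must be JSON), then URL-encode it.
function viewUrl(params) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
  return `${VIEW}?${query}`;
}

// One doc, between two ISO 8601 dates (endkey is inclusive).
function docCountUrl(db, docId, from = "", to = {}) {
  return viewUrl({ startkey: [db, docId, from], endkey: [db, docId, to], group_level: 2 });
}

// A range of docs (per-doc totals over all time); {} sorts after any string.
function docRangeCountUrl(db, firstId, lastId) {
  return viewUrl({ startkey: [db, firstId], endkey: [db, lastId, {}], group_level: 2 });
}

// Whole tracked db; [db] sorts before any longer key with the same prefix.
function dbCountUrl(db) {
  return viewUrl({ startkey: [db], endkey: [db, {}], group_level: 1 });
}

// March 2013 accesses of one doc: "2013-04-01" sorts before any April timestamp.
console.log(docCountUrl("blog", "post-1", "2013-03-01", "2013-04-01"));
```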

The advantages of this approach: 1. it's fast, 2. it's extremely flexible.

The disadvantage is that it takes up a ton of disk space if you never 
purge old items from the db. I've been tracking every single page 
request to our servers in this way with quite a bit of metadata in the 
docs since Dec. 2010. That database is currently 5GB compacted for ~50k 
page requests per day over this period of time. I never had the need to 
delete a single doc from this db.

I don't have any benchmarks for a comparison between the two methods, 
but I'd strongly discourage a write per read model for your accessed docs.

To understand how view ordering works, see 
http://wiki.apache.org/couchdb/View_collation
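A simplified model of that ordering may help when picking startkey and endkey bounds. collate is a hypothetical helper: it follows CouchDB's type order (null, false, true, numbers, strings, arrays, objects) and compares arrays element by element with a shorter prefix first, but it compares strings by plain code units, whereas CouchDB uses ICU's Unicode Collation Algorithm, and it treats all objects as equal.

```javascript
// Rank of each JSON type in CouchDB view collation.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects
}

// Negative if a sorts before b, positive if after, 0 if equal (simplified).
function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
  return 0; // null/booleans within a rank, and (in this sketch) all objects
}

const key = ["blog", "post-1", "2013-03-15T01:56:32.000Z"];
console.log(collate(["blog", "post-1", ""], key) < 0); // true: "" is a lower bound for strings
console.log(collate(key, ["blog", "post-1", {}]) < 0); // true: {} is an upper bound
console.log(collate(["blog", "post-1", ""], ["blog", "post-1", 1363312592000]) > 0); // true
```

The last line shows why "" only works as a lower bound when timestamps are strings: numeric timestamps sort before every string, so for them null is the safer lower bound.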

HTH,

Wendall

(In reply to Stephan Bardubitzki, 03/14/2013 07:16 PM.)


Re: Tracking doc access

Posted by Stephan Bardubitzki <st...@bardubitzki.com>.
Hi Thomas,

no, I only need to track reads, and I need the timestamps for some charts.

Stephan

(In reply to Thomas Hommers, 13-03-14 07:02 PM.)


Re: Tracking doc access

Posted by Thomas Hommers <th...@ebalu.com>.
Hi Stephan,

By 'accessed', do you mean read and write? If you just want to track write access, I believe you could use the _rev attribute.
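For the write-only case, a sketch of reading a counter out of _rev, assuming the usual "<generation>-<hash>" format: the generation grows by one with each update of a doc (deletion included), so it counts the updates in the current revision's history, replication conflicts aside. It cannot count reads or record when they happened, which is why it does not fit a read-tracking need. revGeneration is a hypothetical helper, and the _rev value below is made up.

```javascript
// Hypothetical helper: returns the numeric generation prefix of a CouchDB _rev.
function revGeneration(rev) {
  const generation = Number.parseInt(String(rev).split("-")[0], 10);
  if (!Number.isInteger(generation)) {
    throw new Error(`unexpected _rev format: ${rev}`);
  }
  return generation;
}

// A doc at its third revision has been written three times (create + 2 updates).
console.log(revGeneration("3-917fa2381192822767f010b95b45325b")); // 3
```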

Regards
Thomas



(In reply to Stephan Bardubitzki, Fri, Mar 15, 2013 08:57.)