Posted to user@couchdb.apache.org by go canal <go...@yahoo.com> on 2009/09/26 03:29:07 UTC

couchdb: suitable for this type of applications ?

Hello,
Another question. Here is the use case:
 - a group of 10 engineers working on a project
 - total files created over 12 months: 500
 - average updates per file: 15
 - average file size: 20MB
 - file formats: MS Office, PDF, CAD drawings

I thought CouchDB was designed to support this type of application, but is it correct to say that every time a doc is updated, CouchDB creates a new version of the (whole) document?
So can I say the total storage will be 500 * 20MB * 15 = 150GB, not counting other overhead?


Another question: if I only modify one field, is the attachment also copied into the new version of the document?

I also saw this message, posted almost a year ago, discussing whether CouchDB is suitable for applications with frequent small writes; it seems that some client-side buffering is needed:
http://markmail.org/message/klrbkh36ivxg46ax

Is that still true today?

rgds,
canal



      

Re: couchdb: suitable for this type of applications ?

Posted by go canal <go...@yahoo.com>.
Thanks for the quick response. Duplicating only the metadata is fine.
Yes, and with compaction that's even better.

 rgds,
canal




________________________________
From: Paul Davis <pa...@gmail.com>
To: user@couchdb.apache.org
Sent: Saturday, September 26, 2009 9:59:29 AM
Subject: Re: couchdb: suitable for this type of applications ?

Attachments aren't copied on doc updates. And you're forgetting about
compaction.

HTH,
Paul Davis





      

Re: couchdb: suitable for this type of applications ?

Posted by Paul Davis <pa...@gmail.com>.
Attachments aren't copied on doc updates. And you're forgetting about
compaction.

HTH,
Paul Davis



Re: couchdb: suitable for this type of applications ?

Posted by go canal <go...@yahoo.com>.
Thanks, Jesse. I also learned a lot from your (?) post: http://sitr.us/2009/06/30/database-queries-the-couchdb-way.html
 rgds,
canal




________________________________
From: Jesse Hallett <ha...@gmail.com>
To: user@couchdb.apache.org
Sent: Sunday, September 27, 2009 9:24:06 AM
Subject: Re: couchdb: suitable for this type of applications ?

On Sat, Sep 26, 2009 at 12:22 AM, Chris Anderson <jc...@apache.org> wrote:
> On Fri, Sep 25, 2009 at 8:34 PM, go canal <go...@yahoo.com> wrote:
>>>>When you compact the database it removes all but the most recent version of each document.
>>
>>
>> Does this mean that if I want to support document version feature, I can not compact the database ?
>>
>
> Applications should not depend on controlling compaction schedules.
> Only the most recent version is replicated, so if you run in a cluster
> it's like it's compacting all the time.
>
>> Let's say I am working on a Word document, I upload it as an attachment; I need to have a list of all versions for this file and can download any of previous version.
>>
>> Is this how I should model it in CouchDB:
>>  - have a version field, not using the CouchDB _rev. update the version in my application.
>>  - when uploading the modified Word document, a doc with a different ID is created. So compacting a DB will not affect them.
>>    but how to get a list of all versions of this document ? I need to use another field to identify them, for example, another application generated ID.
>>
>> Is this the general practice ? I am sure supporting versioning is a common request, so appreciate if you can provide some advices..
>> thanks,
>> canal

That approach seems reasonable.  In any case content versioning is an
application level concern.  CouchDB's multiple revision feature is
used to detect update conflicts and to prevent race conditions.

From CouchDB: The Definitive Guide
<http://books.couchdb.org/relax/example-app/storing-documents>:

> We touched on this in the Eventual Consistency chapter. The revision id acts as a
> gatekeeper for writes to a document in CouchDB’s MVCC system. A document is a shared
> resource, many clients can read and write them at the same time. To make sure two
> writing clients don’t step on each others feet, each client must provide what it believes is
> the latest revision id of a document along with the proposed changes. If the on-disk
> revision id matches the provided _rev, CouchDB will accept the change. If it doesn’t, the
> update will be rejected. The client should read the latest version, integrate his changes and
> try saving again.

From the CouchDB Wiki <http://wiki.apache.org/couchdb/HTTP_view_API>:

> The include_docs option will include the associated document. Although, the user should
> keep in mind that there is a race condition when using this option. It is possible that
> between reading the view data and fetching the corresponding document that the document
> has changed. If you want to alleviate such concerns you should emit an object with a _rev
> attribute as in emit(key, {"_rev": doc._rev}). This alleviates the race condition but leaves
> the possibility that the returned document has been deleted (in which case, it includes the
> "_deleted": true attribute).

> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>



      

Re: couchdb: suitable for this type of applications ?

Posted by Jesse Hallett <ha...@gmail.com>.
On Sat, Sep 26, 2009 at 12:22 AM, Chris Anderson <jc...@apache.org> wrote:
> On Fri, Sep 25, 2009 at 8:34 PM, go canal <go...@yahoo.com> wrote:
>>>>When you compact the database it removes all but the most recent version of each document.
>>
>>
>> Does this mean that if I want to support document version feature, I can not compact the database ?
>>
>
> Applications should not depend on controlling compaction schedules.
> Only the most recent version is replicated, so if you run in a cluster
> it's like it's compacting all the time.
>
>> Let's say I am working on a Word document, I upload it as an attachment; I need to have a list of all versions for this file and can download any of previous version.
>>
>> Is this how I should model it in CouchDB:
>>  - have a version field, not using the CouchDB _rev. update the version in my application.
>>  - when uploading the modified Word document, a doc with a different ID is created. So compacting a DB will not affect them.
>>    but how to get a list of all versions of this document ? I need to use another field to identify them, for example, another application generated ID.
>>
>> Is this the general practice ? I am sure supporting versioning is a common request, so appreciate if you can provide some advices..
>> thanks,
>> canal

That approach seems reasonable.  In any case content versioning is an
application level concern.  CouchDB's multiple revision feature is
used to detect update conflicts and to prevent race conditions.
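
A minimal sketch of the approach discussed above, assuming each uploaded version is stored as its own document and a view keyed on [file_id, version] lists them. The field names "file_id" and "version" and the sample _id values are hypothetical, and the view is only simulated in memory here; in CouchDB itself the map step would be a JavaScript view function.

```python
# Each uploaded version is its own document with its own _id.
# "file_id" and "version" are hypothetical application-managed fields.
docs = [
    {"_id": "a1", "file_id": "spec.doc", "version": 1},
    {"_id": "b7", "file_id": "spec.doc", "version": 2},
    {"_id": "c3", "file_id": "plan.pdf", "version": 1},
    {"_id": "d9", "file_id": "spec.doc", "version": 3},
]


def map_doc(doc):
    """Mimic a view's map step: emit ((file_id, version), _id) per document."""
    if "file_id" in doc and "version" in doc:
        yield (doc["file_id"], doc["version"]), doc["_id"]


def build_index(all_docs):
    """Collect the emitted rows and sort them by key, as a view index is."""
    rows = [row for doc in all_docs for row in map_doc(doc)]
    return sorted(rows)


def versions_of(index, file_id):
    """Return (version, _id) pairs for one file, oldest version first."""
    return [(key[1], doc_id) for key, doc_id in index if key[0] == file_id]


index = build_index(docs)
print(versions_of(index, "spec.doc"))
```

Because every version lives under a different _id, compaction leaves all of them in place; only superseded revisions of the same _id are dropped.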

From CouchDB: The Definitive Guide
<http://books.couchdb.org/relax/example-app/storing-documents>:

> We touched on this in the Eventual Consistency chapter. The revision id acts as a
> gatekeeper for writes to a document in CouchDB’s MVCC system. A document is a shared
> resource, many clients can read and write them at the same time. To make sure two
> writing clients don’t step on each others feet, each client must provide what it believes is
> the latest revision id of a document along with the proposed changes. If the on-disk
> revision id matches the provided _rev, CouchDB will accept the change. If it doesn’t, the
> update will be rejected. The client should read the latest version, integrate his changes and
> try saving again.
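
The read, integrate, retry cycle described in that passage can be sketched as follows. This is an in-memory stand-in and not CouchDB's API: TinyStore, Conflict and update_with_retry are hypothetical names, and revisions are plain integers rather than real CouchDB revision ids.

```python
class Conflict(Exception):
    """Raised when the supplied _rev is not the current revision."""


class TinyStore:
    """In-memory stand-in for a database (hypothetical, not CouchDB's API)."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Return a copy so callers cannot mutate the stored document.
        return dict(self._docs[doc_id])

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        current_rev = current["_rev"] if current else None
        # Gatekeeper: the write is accepted only if the client's _rev
        # matches the revision currently stored.
        if doc.get("_rev") != current_rev:
            raise Conflict(doc["_id"])
        new_rev = (current_rev or 0) + 1
        self._docs[doc["_id"]] = {**doc, "_rev": new_rev}
        return new_rev


def update_with_retry(store, doc_id, change, attempts=3):
    """Read the latest version, apply the change, and retry on conflict."""
    for _ in range(attempts):
        doc = store.get(doc_id)
        change(doc)
        try:
            return store.put(doc)
        except Conflict:
            continue
    raise Conflict(doc_id)


store = TinyStore()
store.put({"_id": "report", "title": "draft"})         # stored as revision 1
stale = store.get("report")                            # client A reads revision 1
store.put({**store.get("report"), "title": "review"})  # client B writes revision 2

try:
    store.put({**stale, "title": "final"})             # client A uses a stale _rev
    stale_write_accepted = True
except Conflict:
    stale_write_accepted = False

new_rev = update_with_retry(store, "report", lambda d: d.update(title="final"))
```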

From the CouchDB Wiki <http://wiki.apache.org/couchdb/HTTP_view_API>:

> The include_docs option will include the associated document. Although, the user should
> keep in mind that there is a race condition when using this option. It is possible that
> between reading the view data and fetching the corresponding document that the document
> has changed. If you want to alleviate such concerns you should emit an object with a _rev
> attribute as in emit(key, {"_rev": doc._rev}). This alleviates the race condition but leaves
> the possibility that the returned document has been deleted (in which case, it includes the
> "_deleted": true attribute).
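
A small simulation of the race described in that passage, assuming the old revision is still on disk (that is, it has not been compacted away). The dictionaries and the fetch helper are hypothetical and only model the idea of pinning a view row to the revision that was indexed; they are not CouchDB calls.

```python
# revisions[doc_id][rev] holds the body of each stored revision;
# latest[doc_id] is the current revision number (integers for simplicity).
revisions = {"report": {1: {"title": "draft"}}}
latest = {"report": 1}

# Row as it would look if the map function emitted {"_rev": doc._rev}.
view_row = {"id": "report", "key": "draft", "value": {"_rev": latest["report"]}}

# The document changes after the view was read but before the doc is fetched.
revisions["report"][2] = {"title": "final"}
latest["report"] = 2


def fetch(doc_id, rev=None):
    """Fetch a specific revision, or the latest one when rev is None."""
    rev = latest[doc_id] if rev is None else rev
    return revisions[doc_id][rev]


unpinned = fetch(view_row["id"])                         # sees the newer doc
pinned = fetch(view_row["id"], view_row["value"]["_rev"])  # sees the indexed doc
print(unpinned["title"], pinned["title"])
```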

> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>

Re: couchdb: suitable for this type of applications ?

Posted by Chris Anderson <jc...@apache.org>.
On Fri, Sep 25, 2009 at 8:34 PM, go canal <go...@yahoo.com> wrote:
>>>When you compact the database it removes all but the most recent version of each document.
>
>
> Does this mean that if I want to support document version feature, I can not compact the database ?
>

Applications should not depend on controlling compaction schedules.
Only the most recent version is replicated, so if you run in a cluster
it's like it's compacting all the time.

> Let's say I am working on a Word document, I upload it as an attachment; I need to have a list of all versions for this file and can download any of previous version.
>
> Is this how I should model it in CouchDB:
>  - have a version field, not using the CouchDB _rev. update the version in my application.
>  - when uploading the modified Word document, a doc with a different ID is created. So compacting a DB will not affect them.
>    but how to get a list of all versions of this document ? I need to use another field to identify them, for example, another application generated ID.
>
> Is this the general practice ? I am sure supporting versioning is a common request, so appreciate if you can provide some advices..
> thanks,
> canal
>
>
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: couchdb: suitable for this type of applications ?

Posted by go canal <go...@yahoo.com>.
>>When you compact the database it removes all but the most recent version of each document. 


Does this mean that if I want to support a document versioning feature, I cannot compact the database?

Let's say I am working on a Word document and upload it as an attachment. I need a list of all versions of this file and must be able to download any previous version.

Is this how I should model it in CouchDB?
 - Have a version field that is separate from the CouchDB _rev, and update it in my application.
 - When the modified Word document is uploaded, create a doc with a different ID, so that compacting the DB will not affect earlier versions.
   But how do I get a list of all versions of this document? I would need another field to identify them, for example another application-generated ID.

Is this the general practice? I am sure versioning support is a common request, so I would appreciate any advice.
thanks,
canal


      

Re: couchdb: suitable for this type of applications ?

Posted by Jesse Hallett <ha...@gmail.com>.
I am not sure how this applies to attachments, but with basic docs CouchDB
does store a new version whenever you update.  When you compact the database
it removes all but the most recent version of each document.  So compact the
database once in a while and your disk usage should be reasonable.

You can update whatever document properties you like and attachments will
attach to the new version of the document.  The attachment is not copied - I
believe attachments are stored separately and each document version gets a
reference to the attachment file.
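
As a rough worked version of the numbers in the question: the sketch below assumes every update uploads a fresh copy of the file, that compaction keeps only the most recent version of each document, and that metadata and index overhead can be ignored. The function name and the decimal MB-to-GB conversion are choices made for this illustration.

```python
MB_PER_GB = 1000  # decimal units, matching the 150GB figure in the question


def storage_estimate_gb(files, avg_size_mb, updates_per_file):
    """Return (before_compaction_gb, after_compaction_gb)."""
    # Upper bound: every update stores a fresh copy of the file and
    # nothing has been compacted away yet.
    before = files * avg_size_mb * updates_per_file / MB_PER_GB
    # After compaction, only the most recent version of each document remains.
    after = files * avg_size_mb / MB_PER_GB
    return before, after


before_gb, after_gb = storage_estimate_gb(
    files=500, avg_size_mb=20, updates_per_file=15
)
print(f"before compaction: {before_gb:.0f} GB, after compaction: {after_gb:.0f} GB")
```

Updates that touch only document fields add far less than this upper bound, since the attachment itself is not copied.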

I don't know enough off-hand to address the frequent small updates question.
