Posted to user@couchdb.apache.org by Sebastien PASTOR <se...@gmx.com> on 2010/05/13 18:01:36 UTC

doc creation consistency ?

Hello, 

This might be a silly question, but it is bugging me right now, so
here it is: what happens if a PUT request does not get its response?
Imagine we have a network issue just AFTER CouchDB has created a doc
but BEFORE it could reply to the client. Does CouchDB detect that and
delete the created doc, or is the doc definitely created? In the latter
case, when the client times out it believes the doc creation has
failed, whereas it has not.

This sounds tricky, but those things could happen, right?


Thanks

Sebastien.

Re: doc creation consistency ?

Posted by J Chris Anderson <jc...@gmail.com>.

On May 13, 2010, at 9:01 AM, Sebastien PASTOR wrote:

> Hello, 
> 
> This might be a silly question but this is just bugging me right now...
> so here it is : what is happening if a PUT request does not get its
> response? Imagine we have a network issue just AFTER couchDB has
> created a doc but BEFORE it could reply back to the client. Does couchDB
> detect that and kinda delete the created doc  or the doc is definitively
> created ? In that case when the client is timing out it believes the
> doc creation  has failed  whereas it has not.
> 
> Sounds tricky but those things could happen right ?
> 

PUT is idempotent, so the client should just retry (usually this happens without, e.g., the XHR layer even noticing).

I don't off the top of my head recall what the HTTP response to the 2nd PUT will be.
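To illustrate the retry case: as far as I know, CouchDB answers a PUT to an existing document that does not carry the current `_rev` with 409 Conflict, so a retried creation whose first attempt actually succeeded would see a conflict rather than a second success. The sketch below uses a toy in-memory store (the `ToyStore` class and `create_with_retry` helper are made up for illustration, not CouchDB APIs) to show how a client could treat that conflict as "already created" when the stored body matches what it sent.

```python
import hashlib
import json


class ToyStore:
    """In-memory stand-in for a store that checks revisions on write.

    This is NOT CouchDB. It only mimics the rule that writing to an existing
    document requires its current revision, otherwise the write conflicts.
    """

    def __init__(self):
        self.docs = {}  # doc_id -> (rev, body)

    def put(self, doc_id, body, rev=None):
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            return 409, None  # conflict: caller's revision is missing or stale
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()[:8]
        new_rev = f"{generation}-{digest}"
        self.docs[doc_id] = (new_rev, body)
        return 201, new_rev

    def get(self, doc_id):
        return self.docs.get(doc_id)


def create_with_retry(store, doc_id, body, lose_first_response=True):
    """Create a document, retrying once if the first response is lost."""
    status, rev = store.put(doc_id, body)
    if not lose_first_response:
        return status, rev
    # The response never arrived, so the client cannot know the outcome
    # and simply sends the same request again.
    status, rev = store.put(doc_id, body)
    if status == 409:
        # The document already exists. If it holds exactly what we sent,
        # the first attempt went through and nothing is left to do.
        stored = store.get(doc_id)
        if stored is not None and stored[1] == body:
            return 201, stored[0]
    return status, rev
```

If the stored body differs from what the client sent, the conflict is genuine (someone else wrote the document) and has to be resolved by the application.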

The most reliable way for the client to see when updates are committed is to watch the _changes feed.
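A rough sketch of what watching the feed could look like on the client side: with `feed=continuous`, `_changes` sends one JSON object per line, each carrying `seq`, `id` and a `changes` list of revisions. The helper name `revisions_seen` is made up, and the lines are passed in as plain strings so the parsing can be tried without a server.

```python
import json


def revisions_seen(feed_lines, doc_id):
    """Collect the revisions reported for doc_id in continuous-feed lines."""
    revs = []
    for line in feed_lines:
        line = line.strip()
        if not line:
            continue  # skip empty heartbeat lines
        row = json.loads(line)
        if row.get("id") == doc_id:
            revs.extend(change["rev"] for change in row.get("changes", []))
    return revs


sample_feed = [
    '{"seq": 1, "id": "task1", "changes": [{"rev": "1-abc"}]}',
    "",
    '{"seq": 2, "id": "task2", "changes": [{"rev": "1-def"}]}',
]
print(revisions_seen(sample_feed, "task1"))
```

A client that timed out on a PUT could check this list for its document before deciding whether to retry.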

Chris

> 
> Thanks
> 
> Sebastien.
>

Re: arguments for couchdb ?!

Posted by Dave Cottlehuber <da...@muse.net.nz>.

On 14 May 2010 08:37, Alexander Uvarov <al...@gmail.com> wrote:
>
> On 14.05.2010, at 2:16, c.Kleinhuis wrote:
>
>>
>> what about server configuration ? compared to mysql/hibernate/spring/java ?
>> as far as i can tell for windows, it worked nicely with the installer, any known problems
>> when setting up a server farm, or a single system ? ! i mean, how hard would it be for
>> an advanced admin to set up a system with couchdb ?
>>
>
> It depends, but a more experienced member should answer this kind of question. First of all, I don't see any reason to run couchdb on windows in production. It's pretty simple if you are going to set up N nodes replicating to each other and run nginx on top of them. Packages are available for Linux (both couch and erlang), and it's also dead simple to build couchdb from scratch.
>
>

IIRC the main issue with windows is no support for online compaction
at the moment due to some different file APIs between windows & linux.
Is there anybody running large farms of couch on windows?

dch

Re: large attachments/huge databases ?`

Posted by "c.Kleinhuis" <ck...@digitalgott.de>.

 
>> i need ALL versions :D from the beginning to current version, what about saving
>> previous versions as an array field containing everything but the array field for saving the versions ?
>>
>>     
>
> One nice thing about attachments is that history doesn't bloat the view server memory footprint (attachments aren't available in views).
>
> It also takes history out of the application space, so you can pretty much "add" history to normal applications with the attachment history method.
>
> Chris 
problematic ... i think i wont use it this way, i rather have to include 
whole attachments in old revisions,
i think i use a parent node, containing an array of task nodes....

thx

Re: arguments for couchdb - binary data output ?

Posted by Sebastian Cohnen <se...@googlemail.com>.

please start new threads when posting to the ML

but back to topic: I'm not sure what you want to know/hear. In terms of efficiency, sure, binary is much more "transport"-efficient than json/text. But json, on the other hand, is a perfect fit for web-centric applications (e.g. json is a subset of javascript)...

On 14.05.2010, at 11:11, c.Kleinhuis wrote:

> Hello,
> 
> one more question is, how is json output comparable to direct binary output, in my eyes
> the json is the compactest human readable transfer format, and the difference json<->binary
> is far less than compared to xml<->binary
> 
> what do you think ?
> 
> 
> thx
> ck
>

arguments for couchdb - binary data output ?

Posted by "c.Kleinhuis" <ck...@digitalgott.de>.

 Hello,

one more question is, how is json output comparable to direct binary 
output, in my eyes
the json is the compactest human readable transfer format, and the 
difference json<->binary
is far less than compared to xml<->binary

what do you think ?


thx
ck

Re: arguments for couchdb ?!

Posted by Alexander Uvarov <al...@gmail.com>.

On 14.05.2010, at 2:16, c.Kleinhuis wrote:

> Alexander Uvarov wrote:
>> 
>> Good:
>> 
>> 1. Schema-less is awesome for a lot of applications
>> 2. RESTful JSON store
>> 3. Document attachments
>> 4. Views are fast
>> 5. Full text search "out of the box" with couchdb-lucene
>> 
>> Bad:
>> 
>> 1. Does not support transactions and AFAIK there are no plans to directly support transactional bulk document API
>>  
> this is handled via eventual consistency as far as i know, and will never be addressed ;)
> with the benefit of fast responses

Too brave. Things are not as simple as they seem.

> 
>> 2. It's impossible to just say "update/delete all people where some field LIKE something", but actually not a problem
> ok
> 
> what about server configuration ? compared to mysql/hibernate/spring/java ?
> as far as i can tell for windows, it worked nicely with the installer, any known problems
> when setting up a server farm, or a single system ? ! i mean, how hard would it be for
> an advanced admin to set up a system with couchdb ?
> 

It depends, but a more experienced member should answer this kind of question. First of all, I don't see any reason to run couchdb on windows in production. It's pretty simple if you are going to set up N nodes replicating to each other and run nginx on top of them. Packages are available for Linux (both couch and erlang), and it's also dead simple to build couchdb from scratch.

Re: arguments for couchdb ?!

Posted by "c.Kleinhuis" <ck...@digitalgott.de>.

Alexander Uvarov wrote:
> On 14.05.2010, at 1:44, c.Kleinhuis wrote:
>
>   
>> hi there,
>>
>> is there a comparison between hibernate/spring and couchdb ?
>>
>> to convince my project manager
>>
>> -in my eyes the main benefit of couchdb is that it removes a complete application tier,
>> the java/server side, which maps objects to tables. the relaxing effect :D
>>
>> - here is a performance comparison:
>>
>> http://metalelf0dev.blogspot.com/2008/09/mysql-couchdb-performance-comparison.html
>> it is rather old (2008), and just displays that indexing is a bit slower, which shouldnt
>> be a problem 
>>
>> if there are any more arguments, please let me know, also i would like
>> to have some examples of HUGE Sites using couchdb, 
>> thx
>> ck
>>
>>     
>
> Good:
>
> 1. Schema-less is awesome for a lot of applications
> 2. RESTful JSON store
> 3. Document attachments
> 4. Views are fast
> 5. Full text search "out of the box" with couchdb-lucene
>
> Bad:
>
> 1. Does not support transactions and AFAIK there are no plans to directly support transactional bulk document API
>   
this is handled via eventual consistency as far as i know, and will 
never be addressed ;)
with the benefit of fast responses

> 2. It's impossible to just say "update/delete all people where some field LIKE something", but actually not a problem
ok

what about server configuration ? compared to mysql/hibernate/spring/java ?
as far as i can tell for windows, it worked nicely with the installer, 
any known problems
when setting up a server farm, or a single system ? ! i mean, how hard 
would it be for
an advanced admin to set up a system with couchdb ?

Re: arguments for couchdb ?!

Posted by Alexander Uvarov <al...@gmail.com>.

On 14.05.2010, at 1:44, c.Kleinhuis wrote:

> hi there,
> 
> is there a comparison between hibernate/spring and couchdb ?
> 
> to convince my project manager
> 
> -in my eyes the main benefit of couchdb is that it removes a complete application tier,
> the java/server side, which maps objects to tables. the relaxing effect :D
> 
> - here is a performance comparison:
> 
> http://metalelf0dev.blogspot.com/2008/09/mysql-couchdb-performance-comparison.html
> it is rather old (2008), and just displays that indexing is a bit slower, which shouldnt
> be a problem 
> 
> if there are any more arguments, please let me know, also i would like
> to have some examples of HUGE Sites using couchdb, 
> thx
> ck
> 

Good:

1. Schema-less is awesome for a lot of applications
2. RESTful JSON store
3. Document attachments
4. Views are fast
5. Full text search "out of the box" with couchdb-lucene

Bad:

1. Does not support transactions and AFAIK there are no plans to directly support transactional bulk document API
2. It's impossible to just say "update/delete all people where some field LIKE something", but actually not a problem

arguments for couchdb ?!

Posted by "c.Kleinhuis" <ck...@digitalgott.de>.

hi there,

is there a comparison between hibernate/spring and couchdb ?

to convince my project manager

-in my eyes the main benefit of couchdb is that it removes a complete 
application tier,
the java/server side, which maps objects to tables. the relaxing effect :D

- here is a performance comparison:

http://metalelf0dev.blogspot.com/2008/09/mysql-couchdb-performance-comparison.html
it is rather old (2008), and just displays that indexing is a bit slower, which shouldnt
be a problem 


if there are any more arguments, please let me know, also i would like
to have some examples of HUGE Sites using couchdb, 

thx
ck

Re: large attachments/huge databases ?`

Posted by J Chris Anderson <jc...@gmail.com>.

On May 13, 2010, at 12:12 PM, c.Kleinhuis wrote:

> J Chris Anderson wrote:
>> On May 13, 2010, at 11:35 AM, Sebastian Cohnen wrote:
>> 
>>  
>>> when you need versioning, you need to implement it explicitly.
>>>    
>> 
>> 
>> The simplest versioning scheme is for the client to store the string representation of a document as served by CouchDB. While updating the document contents, the original string representation is sent back as a new attachment.
>> 
>> This has the advantage that versions will replicate together, and they can be manually pruned by deleting individual attachments.
>> 
>> This should be a trivial addition to jquery.couch.js or couch.js, if anyone's up for hacking.
>>  
> i need ALL versions :D from the beginning to current version, what about saving
> previous versions as an array field containing everything but the array field for saving the versions ?
> 

One nice thing about attachments is that history doesn't bloat the view server memory footprint (attachments aren't available in views).

It also takes history out of the application space, so you can pretty much "add" history to normal applications with the attachment history method.

Chris

> thx
>

Re: large attachments/huge databases ?`

Posted by "c.Kleinhuis" <ck...@digitalgott.de>.

J Chris Anderson wrote:
> On May 13, 2010, at 11:35 AM, Sebastian Cohnen wrote:
>
>   
>> when you need versioning, you need to implement it explicitly.
>>     
>
>
> The simplest versioning scheme is for the client to store the string representation of a document as served by CouchDB. While updating the document contents, the original string representation is sent back as a new attachment.
>
> This has the advantage that versions will replicate together, and they can be manually pruned by deleting individual attachments.
>
> This should be a trivial addition to jquery.couch.js or couch.js, if anyone's up for hacking.
>   
 i need ALL versions :D from the beginning to current version, what 
about saving
previous versions as an array field containing everything but the array 
field for saving the versions ?
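For reference, the array-field idea could be sketched like this. The function name `update_keeping_versions` and the field name `versions` are placeholders, and the code only builds the new document body; it does not talk to CouchDB.

```python
import copy


def update_keeping_versions(doc, changes, field="versions"):
    """Return an updated copy of doc with the previous state appended to an array.

    Each snapshot contains everything except the history array itself,
    so the history does not nest inside its own entries.
    """
    snapshot = {k: copy.deepcopy(v) for k, v in doc.items() if k != field}
    updated = copy.deepcopy(doc)
    updated.update(changes)
    updated[field] = updated.get(field, []) + [snapshot]
    return updated


doc = {"_id": "task1", "status": "open"}
v2 = update_keeping_versions(doc, {"status": "running"})
v3 = update_keeping_versions(v2, {"status": "done"})
print([v["status"] for v in v3["versions"]])
```

The document grows with every update, since the full history travels with each read and write.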

thx

Re: large attachments/huge databases ?`

Posted by J Chris Anderson <jc...@gmail.com>.

On May 13, 2010, at 11:35 AM, Sebastian Cohnen wrote:

> 
> when you need versioning, you need to implement it explicitly.

The simplest versioning scheme is for the client to store the string representation of a document as served by CouchDB. While updating the document contents, the original string representation is sent back as a new attachment.

This has the advantage that versions will replicate together, and they can be manually pruned by deleting individual attachments.

This should be a trivial addition to jquery.couch.js or couch.js, if anyone's up for hacking.
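A minimal sketch of that scheme, building only the document body to send back. It assumes the document was fetched as JSON text and uses CouchDB's inline attachment form (`_attachments` entries with `content_type` and base64 `data`); the function name and the `history-<rev>.json` naming are made up for illustration.

```python
import base64
import json


def update_with_history(served_text, changes):
    """Apply changes and attach the previous serialized document as history."""
    previous = json.loads(served_text)
    updated = dict(previous)
    updated.update(changes)
    # Keep whatever attachment stubs the served document already listed,
    # so earlier history entries are preserved on update.
    attachments = dict(previous.get("_attachments", {}))
    name = "history-" + previous["_rev"] + ".json"  # hypothetical naming scheme
    attachments[name] = {
        "content_type": "application/json",
        "data": base64.b64encode(served_text.encode("utf-8")).decode("ascii"),
    }
    updated["_attachments"] = attachments
    return updated


served = '{"_id": "task1", "_rev": "1-abc", "status": "open"}'
new_doc = update_with_history(served, {"status": "done"})
print(sorted(new_doc["_attachments"]))
```

The returned body still carries the old `_rev`, which is what the update request needs; pruning a version means deleting the corresponding attachment.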

Chris

Re: large attachments/huge databases ?`

Posted by Sebastian Cohnen <se...@googlemail.com>.

>>> -another point is general performance of about e.g. 200.000 documents in a single
>>> database ... how is disk usage when maintaining versioning of each document ?
>>> -can the versioning be deactivated or deleted ?!
>>>    
>> 
>> Again, there is no "versioning" of documents - at least not what most people expect from document versioning. There are only two reasons why CouchDB keeps "versions" of documents: MVCC and the append-only principle.
>> 
>> You need to compact your database on a regular basis (depending on your updates to documents) and no, there is no way to completely disable "versioning" (and it wouldn't make any sense to do that). The append-only approach automatically leads to higher disk usage compared to in-place updates, thus you need to clean up your database (run compaction), but you win robustness, and no fixup phase is needed in case of, e.g., hardware failure.
>> 
>>  
> in fact i would like to use the versioning extensively, because each version of a task status should be
> accessible, can i access older versions of a document easily ?!
> 
> the versioning feature is quite cool, because i would have to implement it using hibernate/spring
> otherwise ...

there is no document versioning in the conventional sense! old revisions are only kept for conflict resolution and due to the append-only approach. you should never rely on old versions being around. e.g. when you compact your database, old revisions are removed.

when you need versioning, you need to implement it explicitly.

Re: large attachments/huge databases ?`

Posted by "c.Kleinhuis" <ck...@digitalgott.de>.

>> -he read that indexing is significantly higher than e.g. mysql -
>> my answer was that indexing is not affecting performance because it is a
>> one time action ....
>>     
>
> Right, once indices are generated, they are updated incrementally and very fast on access.
>
>   
ok, sounds good

>> -another point is general performance of about e.g. 200.000 documents in a single
>> database ... how is disk usage when maintaining versioning of each document ?
>> -can the versioning be deactivated or deleted ?!
>>     
>
> Again, there is no "versioning" of documents - at least not what most people expect from document versioning. There are only two reasons why CouchDB keeps "versions" of documents: MVCC and the append-only principle.
>
> You need to compact your database on a regular basis (depending on your updates to documents) and no, there is no way to completely disable "versioning" (and it wouldn't make any sense to do that). The append-only approach automatically leads to higher disk usage compared to in-place updates, thus you need to clean up your database (run compaction), but you win robustness, and no fixup phase is needed in case of, e.g., hardware failure.
>
>   
in fact i would like to use the versioning extensively, because each 
version of a task status should be
accessible, can i access older versions of a document easily ?!

the versioning feature is quite cool, because i would have to implement 
it using hibernate/spring
otherwise ...

> you're welcome :)

thx
ck

Re: large attachments/huge databases ?`

Posted by Sebastian Cohnen <se...@googlemail.com>.

Hey,

On 13.05.2010, at 18:35, c.Kleinhuis wrote:

> i need to convince my project manager ;)
> 
> -he read that indexing is significantly higher than e.g. mysql -
> my answer was that indexing is not affecting performance because it is a
> one time action ....

Right, once indices are generated, they are updated incrementally and very fast on access.

> -another point is general performance of about e.g. 200.000 documents in a single
> database ... how is disk usage when maintaining versioning of each document ?
> -can the versioning be deactivated or deleted ?!

Again, there is no "versioning" of documents - at least not what most people expect from document versioning. There are only two reasons why CouchDB keeps "versions" of documents: MVCC and the append-only principle.

You need to compact your database on a regular basis (depending on your updates to documents) and no, there is no way to completely disable "versioning" (and it wouldn't make any sense to do that). The append-only approach automatically leads to higher disk usage compared to in-place updates, thus you need to clean up your database (run compaction), but you win robustness, and no fixup phase is needed in case of, e.g., hardware failure.
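As a small illustration, compaction is triggered over HTTP with a POST to the database's `_compact` resource, sent with a JSON content type. The helper below only builds the request object so it can be inspected without a running server; the base URL and database name are placeholders, and sending it (for example with `urllib.request.urlopen`) may additionally require admin credentials depending on the setup.

```python
import urllib.parse
import urllib.request


def compaction_request(base_url, db_name):
    """Build (but do not send) the POST request that starts compaction."""
    quoted_db = urllib.parse.quote(db_name, safe="")
    url = f"{base_url.rstrip('/')}/{quoted_db}/_compact"
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = compaction_request("http://localhost:5984/", "tasks")
print(req.get_method(), req.full_url)
```

How often to run it depends on the update rate, so a scheduled job that sends this request is one option.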

> and finally it is about handling huge movie files ( production 720p files ) which
> must be handled somehow, what happens if an upload fails, or should a proxy like
> php be used to receive those files, and just store a reference in the couchdb ?

You can upload files of any size directly to couchdb. I personally would store bigger files externally, because compaction is going to be a pain when you have many updates to documents. I would say it depends on your use case, and maybe you have to build some throwaway prototypes and play with the different approaches.

> thx in advance
> ck

you're welcome :)

Re: large attachments/huge databases ?`

Posted by Elf <el...@gmail.com>.

My 5 cents too.
Rsync is much more efficient than any ftp client.

On May 14, 2010 12:10 PM, "c.Kleinhuis" <ck...@digitalgott.de> wrote:

hi, thank you for your description, and the project is only storing metadata
:D
the files will be updated via ftp synchronize :D

i will post another question right now ...

> > CK, > > My $0.02 on the storage of the videos is not to use CouchDB for
that.  Use Couch to sto...

Re: large attachments/huge databases ?`

Posted by "c.Kleinhuis" <ck...@digitalgott.de>.

hi, thank you for your description, and the project is only storing 
metadata :D
the files will be updated via ftp synchronize :D

i will post another question right now ...
> CK,
>
> My $0.02 on the storage of the videos is not to use CouchDB for that.  Use Couch to store metadata on the files.  Stuff like filesystem path, server holding the file, date, time-stamp, video info, etc.  The actual files are better stored in a filesystem somewhere else.  It's kind of the idea of Facebook's Haystack project (http://www.facebook.com/note.php?note_id=76191543919).  That's Facebook's system to store photos.
>
> The problem with storing large attachments in any database is you're creating a filesystem on a filesystem. You ask Couch for a file, Couch has to go to the filesystem to get the file, and then you go back and forth between Couch and the filesystem getting information that then gets sent back to you.  Removing the middle-man, Couch in this case, should give you better performance and a smaller DB.  Ask Couch, "where should I find the file?" and then your application just goes out and gets the file directly.
>
> There used to be a great article explaining why putting large files in a database (the article was about MySQL) was a bad idea but I can't seem to find it anymore.
>
> Anyway, good luck!
>
> -Cesar  
>
>
>
> ________________________________
> From: c.Kleinhuis <ck...@digitalgott.de>
> To: user@couchdb.apache.org
> Sent: Thu, May 13, 2010 9:35:24 AM
> Subject: large attachments/huge databases ?`
>
> i need to convince my project manager ;)
>
> -he read that indexing is significantly higher than e.g. mysql -
> my answer was that indexing is not affecting performance because it is a
> one time action ....
>
> -another point is general performance of about e.g. 200.000 documents in a single
> database ... how is disk usage when maintaining versioning of each document ?
> -can the versioning be deactivated or deleted ?!
>
> and finally it is about handling huge movie files ( production 720p files ) which
> must be handled somehow, what happens if an upload fails, or should a proxy like
> php be used to receive those files, and just store a reference in the couchdb ?
>
> thx in advance
> ck
>

Re: large attachments/huge databases ?`

Posted by James Marca <jm...@translab.its.uci.edu>.

On Thu, May 13, 2010 at 09:10:29PM +0100, Randall Leeds wrote:
> If you want to be super couch-y and keep it HTTP based and keep couch
> at the top of your server stack you could write your own http handler
> that just streams files off the disk.
> 
> http://mycouch:5984/_files/path_to_file
> 
> Look at the other handlers in the couch.ini files and then look at the
> corresponding .erl files in src/couchdb to get a feel for how to plug
> your own in.
> 
> This might help you too:
> http://vmx.cx/couchdb/tutorial/indexer.html
> 
> -Randall
That's a very cool idea.

James


Re: large attachments/huge databases ?`

Posted by Randall Leeds <ra...@gmail.com>.

If you want to be super couch-y and keep it HTTP based and keep couch
at the top of your server stack you could write your own http handler
that just streams files off the disk.

http://mycouch:5984/_files/path_to_file

Look at the other handlers in the couch.ini files and then look at the
corresponding .erl files in src/couchdb to get a feel for how to plug
your own in.

This might help you too:
http://vmx.cx/couchdb/tutorial/indexer.html

-Randall

On Thu, May 13, 2010 at 20:52, Cesar Delgado <be...@ymail.com> wrote:
> CK,
>
> My $0.02 on the storage of the videos is not to use CouchDB for that.  Use Couch to store metadata on the files.  Stuff like filesystem path, server holding the file, date, time-stamp, video info, etc.  The actual files are better stored in a filesystem somewhere else.  It's kind of the idea of Facebook's Haystack project (http://www.facebook.com/note.php?note_id=76191543919).  That's Facebook's system to store photos.
>
> The problem with storing large attachments in any database is you're creating a filesystem on a filesystem. You ask Couch for a file, Couch has to go to the filesystem to get the file, and then you go back and forth between Couch and the filesystem getting information that then gets sent back to you.  Removing the middle-man, Couch in this case, should give you better performance and a smaller DB.  Ask Couch, "where should I find the file?" and then your application just goes out and gets the file directly.
>
> There used to be a great article explaining why putting large files in a database (the article was about MySQL) was a bad idea but I can't seem to find it anymore.
>
> Anyway, good luck!
>
> -Cesar
>
>
>
> ________________________________
> From: c.Kleinhuis <ck...@digitalgott.de>
> To: user@couchdb.apache.org
> Sent: Thu, May 13, 2010 9:35:24 AM
> Subject: large attachments/huge databases ?`
>
> i need to convince my project manager ;)
>
> -he read that indexing is significantly higher than e.g. mysql -
> my answer was that indexing is not affecting performance because it is a
> one time action ....
>
> -another point is general performance of about e.g. 200.000 documents in a single
> database ... how is disk usage when maintaining versioning of each document ?
> -can the versioning be deactivated or deleted ?!
>
> and finally it is about handling huge movie files ( production 720p files ) which
> must be handled somehow, what happens if an upload fails, or should a proxy like
> php be used to receive those files, and just store a reference in the couchdb ?
>
> thx in advance
> ck

Re: large attachments/huge databases ?`

Posted by Cesar Delgado <be...@ymail.com>.

CK,

My $0.02 on the storage of the videos is not to use CouchDB for that.  Use Couch to store metadata on the files.  Stuff like filesystem path, server holding the file, date, time-stamp, video info, etc.  The actual files are better stored in a filesystem somewhere else.  It's kind of the idea of Facebook's Haystack project (http://www.facebook.com/note.php?note_id=76191543919).  That's Facebook's system to store photos.

The problem with storing large attachments in any database is you're creating a filesystem on a filesystem. You ask Couch for a file, Couch has to go to the filesystem to get the file, and then you go back and forth between Couch and the filesystem getting information that then gets sent back to you.  Removing the middle-man, Couch in this case, should give you better performance and a smaller DB.  Ask Couch, "where should I find the file?" and then your application just goes out and gets the file directly.
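To make the metadata idea concrete, here is a minimal sketch of a document that points at a file kept on a regular filesystem. The field names, the `video:` ID prefix and the function name are all made up for illustration; only standard-library calls are used, and nothing is sent to Couch.

```python
import datetime
import os
import socket


def file_metadata_doc(path, server=None):
    """Build a metadata document describing a file stored outside the database."""
    info = os.stat(path)
    modified = datetime.datetime.fromtimestamp(
        info.st_mtime, tz=datetime.timezone.utc
    )
    return {
        "_id": "video:" + os.path.basename(path),  # hypothetical ID scheme
        "type": "video",
        "path": os.path.abspath(path),  # where the application fetches the file
        "server": server or socket.gethostname(),  # host holding the file
        "size_bytes": info.st_size,
        "modified": modified.isoformat(),
    }
```

The application would store this document in Couch, then read `server` and `path` from it and fetch the file directly.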

There used to be a great article explaining why putting large files in a database (the article was about MySQL) was a bad idea but I can't seem to find it anymore.

Anyway, good luck!

-Cesar  



________________________________
From: c.Kleinhuis <ck...@digitalgott.de>
To: user@couchdb.apache.org
Sent: Thu, May 13, 2010 9:35:24 AM
Subject: large attachments/huge databases ?`

i need to convince my project manager ;)

-he read that indexing is significantly higher than e.g. mysql -
my answer was that indexing is not affecting performance because it is a
one time action ....

-another point is general performance of about e.g. 200.000 documents in a single
database ... how is disk usage when maintaining versioning of each document ?
-can the versioning be deactivated or deleted ?!

and finally it is about handling huge movie files ( production 720p files ) which
must be handled somehow, what happens if an upload fails, or should a proxy like
php be used to receive those files, and just store a reference in the couchdb ?

thx in advance
ck

large attachments/huge databases ?`

Posted by "c.Kleinhuis" <ck...@digitalgott.de>.

i need to convince my project manager ;)

-he read that indexing is significantly higher than e.g. mysql -
my answer was that indexing is not affecting performance because it is a
one time action ....

-another point is general performance of about e.g. 200.000 documents in 
a single
database ... how is disk usage when maintaining versioning of each 
document ?
-can the versioning be deactivated or deleted ?!

and finally it is about handling huge movie files ( production 720p 
files ) which
must be handled somehow, what happens if an upload fails, or should a 
proxy like
php be used to receive those files, and just store a reference in the 
couchdb ?

thx in advance
ck