Posted to commits@couchdb.apache.org by Apache Wiki <wi...@apache.org> on 2011/12/23 12:22:49 UTC

[Couchdb Wiki] Update of "FUQ" by MarcelloNuccio

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The "FUQ" page has been changed by MarcelloNuccio:
http://wiki.apache.org/couchdb/FUQ?action=diff&rev1=6&rev2=7

Comment:
Added note on deleted documents

  <<Include(EditTheWiki)>>
  
  = Frequently Unasked Questions =
- 
  On IRC and the Mailing List, these are the Questions People should have asked to help them stay Relaxed.
  
  == Documents ==
+  1. Why should I generate my own UUIDs?
+   . While CouchDB will generate a unique identifier for the _id field of any doc that you create, there are three reasons why you are in most cases better off generating them yourself.
  
-  1. Why should I generate my own UUIDs?
-     While CouchDB will generate a unique identifier for the _id field of any doc that you create, there are three reasons why you are in most cases better off generating them yourself.
+   * If for any reason you miss CouchDB's success reply (201 Created for a new document) and the store is attempted again, you end up with the same document content stored twice under different _ids. This can easily happen with intermediary proxies and cache systems that retry a failed request without informing the developer.
+   * _ids are the only values that CouchDB enforces as unique, so you might as well make use of this.
  
+   * CouchDB stores its documents in a B+ tree. Each additional or updated document is stored as a leaf node, and may require re-writing intermediary and parent nodes. You may be able to take advantage of sequencing your own ids more effectively than the automatically generated ids if you can arrange them to be sequential yourself.
-    * If for any reason you miss CouchDB's success reply (201 Created for a new document) and the store is attempted again, you end up with the same document content stored twice under different _ids. This can easily happen with intermediary proxies and cache systems that retry a failed request without informing the developer.
-    * _ids are the only values that CouchDB enforces as unique, so you might as well make use of this.
  
-    * CouchDB stores its documents in a B+ tree. Each additional or updated document is stored as a leaf node, and may require re-writing intermediary and parent nodes. You may be able to take advantage of sequencing your own ids more effectively than the automatically generated ids if you can arrange them to be sequential yourself.
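
A minimal sketch of the retry argument above, assuming a hypothetical database at http://localhost:5984/mydb; the helper name build_create_request is made up for illustration and nothing is sent over the network. The id is generated once on the client, so a retry after a lost reply targets the same URL. CouchDB rejects a second PUT to an existing _id that carries no _rev with 409 Conflict, so no duplicate is created. Note that uuid4 ids are random, so this illustrates only the retry point and not the sequential-id advantage.

```python
import json
import uuid
import urllib.request


def build_create_request(db_url, doc, doc_id=None):
    """Build (but do not send) a PUT that creates `doc` under a client-chosen _id."""
    # Generate the id once, on the client, so that every retry reuses it.
    doc_id = doc_id or uuid.uuid4().hex
    body = json.dumps(doc).encode("utf-8")
    request = urllib.request.Request(
        url=f"{db_url.rstrip('/')}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    return doc_id, request


# Hypothetical database URL; the requests are only constructed, never sent.
doc_id, first_try = build_create_request("http://localhost:5984/mydb", {"type": "note"})
# A retry after a lost reply reuses the id, so it addresses the same document.
_, retry = build_create_request("http://localhost:5984/mydb", {"type": "note"}, doc_id)
```

With a server-generated id (a POST to the database URL) the retry would instead create a second document, which is the duplicate case described above.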
  
   1. What is the benefit of using the _bulk_docs API instead of PUTting single documents to CouchDB?
+   . Aside from the HTTP overhead and round trips you save, the main advantage is that CouchDB can handle the B+ tree updates more efficiently, rewriting fewer intermediary and parent nodes, which both improves speed and saves disk space.
  
-     Aside from the HTTP overhead and round trips you save, the main advantage is that CouchDB can handle the B+ tree updates more efficiently, rewriting fewer intermediary and parent nodes, which both improves speed and saves disk space.
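
As a rough illustration of the answer above, the following sketch builds a single _bulk_docs request carrying several documents instead of one PUT per document. The database URL, the helper name build_bulk_request and the document fields are assumptions for the example; the request is constructed but not sent.

```python
import json
import urllib.request


def build_bulk_request(db_url, docs):
    """Build (but do not send) one POST to _bulk_docs carrying every document."""
    # _bulk_docs expects a JSON object with a "docs" array.
    payload = json.dumps({"docs": list(docs)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{db_url.rstrip('/')}/_bulk_docs",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Three made-up documents with client-chosen, sequential ids.
docs = [{"_id": f"reading-{n:04d}", "value": n} for n in range(3)]
bulk_request = build_bulk_request("http://localhost:5984/mydb", docs)
```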
  
   1. Why can't I use MVCC in CouchDB as a revision control system for my docs?
  
+  1. Does compaction remove deleted documents’ contents?
+   . We keep the latest revision of every document ever seen, even if that revision has '"_deleted":true' in it. This is so that replication can ensure eventual consistency between replicas. Not only will all replicas agree on which documents are present and which are not, but also on the contents of both.
+ 
+   . Deleted documents specifically allow for a body to be set in the deleted revision. The intention is to allow "who deleted this" type metadata for the doc. Some client libraries delete docs by grabbing the current object blob, adding a '"_deleted":true' member, and then sending it back, which (in most cases inadvertently) keeps the last doc body around after compaction.
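
The difference between the two ways of deleting can be sketched as follows. The helper make_tombstone, the deleted_by field and the sample document (including its revision string) are hypothetical; the point is only that a deleted revision needs _id, _rev and "_deleted": true, and anything else sent along is kept.

```python
def make_tombstone(doc, note=None):
    """Return a deleted revision that keeps only _id, _rev and an optional note."""
    tombstone = {"_id": doc["_id"], "_rev": doc["_rev"], "_deleted": True}
    if note is not None:
        tombstone["deleted_by"] = note  # hypothetical metadata field name
    return tombstone


# Made-up document; the _rev value is illustrative only.
current = {"_id": "invoice-17", "_rev": "1-abc", "customer": "ACME", "total": 120}

# Slim deleted revision: only the identifying fields and the note survive.
slim = make_tombstone(current, note="alice")

# What some client libraries do: copy the whole body and add _deleted,
# so "customer" and "total" stay in the database after compaction.
bloated = dict(current, _deleted=True)
```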
+ 
  == Replication ==
- 
   1. What is the difference between PULL and PUSH replication?
   1. Why do I need to permit deleted docs in validation functions?
   1. How do compaction and purging impact replication?
  
  == Views ==
+  1.
+  In a view, why should I not {{{emit(key,doc)}}} ?
  
-  1. In a view, why should I not {{{emit(key,doc)}}} ?
+   .
    The key point here is that by emitting {{{,doc}}} you are duplicating the document which is already present in the database (a .couch file), and including it in the results of the view (a different .couch file, with similar structure). This is the same as having a SQL Index that includes the original table, instead of using a foreign key.
  
    The same effect can be achieved by using {{{emit(key,null)}}} and ?include_docs=true with the view request. This approach has the benefit of not duplicating the document data in the view index, which reduces the disk space consumed by the view. On the other hand, the file access pattern is slightly more expensive for CouchDB. It is usually a premature optimization to include the document in the view. If you think you may need to emit the document, it is best to test.
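
A small sketch of the emit(key,null) plus include_docs=true combination, under assumed names: the design document _design/example, the view by_type, the database URL and the helper view_url are all made up for illustration. The code only builds the design document and the query URL; it does not contact a server.

```python
import json
from urllib.parse import urlencode

# Design document whose map function emits null instead of the whole doc.
design_doc = {
    "_id": "_design/example",
    "views": {
        "by_type": {
            "map": "function(doc) { emit(doc.type, null); }"
        }
    },
}


def view_url(db_url, ddoc, view, **params):
    """Build a view query URL; values such as key and include_docs are JSON-encoded."""
    query = urlencode({name: json.dumps(value) for name, value in params.items()})
    return f"{db_url.rstrip('/')}/_design/{ddoc}/_view/{view}?{query}"


# Ask CouchDB to attach each row's document at query time.
url = view_url("http://localhost:5984/mydb", "example", "by_type",
               include_docs=True, key="note")
```

Each returned row then carries the document in a doc member, fetched from the database file at query time rather than stored a second time in the view index.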
+ 
+ 
  
   1. What happens if I don't ducktype the variables I am using in my view?
   1. Does it matter if my map function is complex, or takes a long time to run?
  
  == Tools ==
+  1.
+  I decided to roll my own !CouchApp tool or CouchDB client in <myfavouritelanguage>. How cool is that?
  
-  1. I decided to roll my own !CouchApp tool or CouchDB client in <myfavouritelanguage>. How cool is that?
-    Pretty cool! In fact, it's a great way to get familiar with the API. However, wrappers around the HTTP API are not necessarily of great use, as CouchDB already makes this very easy. Mapping CouchDB semantics onto your language's native data structures is much more useful to people. Many languages are already covered, and we'd really like to see your ideas and enhancements incorporated into the existing tools if possible, along with your help in keeping them up to date. Ask on the mailing list about contributing!
+   . Pretty cool! In fact, it's a great way to get familiar with the API. However, wrappers around the HTTP API are not necessarily of great use, as CouchDB already makes this very easy. Mapping CouchDB semantics onto your language's native data structures is much more useful to people. Many languages are already covered, and we'd really like to see your ideas and enhancements incorporated into the existing tools if possible, along with your help in keeping them up to date. Ask on the mailing list about contributing!
+ 
  
  == Log Files ==
   1. Those Erlang messages in the log are pretty confusing. What gives?
-    While the Erlang messages in the log can be confusing to someone unfamiliar with Erlang, with practice they become very helpful. The CouchDB developers do try to catch and log messages that might be useful to a system administrator in a friendly format, but occasionally a bug or otherwise unexpected behavior manifests itself in more verbose dumps of Erlang server state. These messages can be very useful to CouchDB developers. If you find many confusing messages in your log, feel free to inquire about them. If they are expected, devs can work to ensure that the message is more cleanly formatted. Otherwise, the messages may indicate a bug in the code.
+   . While the Erlang messages in the log can be confusing to someone unfamiliar with Erlang, with practice they become very helpful. The CouchDB developers do try to catch and log messages that might be useful to a system administrator in a friendly format, but occasionally a bug or otherwise unexpected behavior manifests itself in more verbose dumps of Erlang server state. These messages can be very useful to CouchDB developers. If you find many confusing messages in your log, feel free to inquire about them. If they are expected, devs can work to ensure that the message is more cleanly formatted. Otherwise, the messages may indicate a bug in the code.
-    In many cases, this is enough to identify the problem. For example, OS errors are reported as tagged tuples such as {{{{error,enospc}}}} or {{{{error,eacces}}}}, which mean "You ran out of disk space" and "CouchDB doesn't have permission to access that resource" respectively. Most of these errors derive from the C code used to build the Erlang VM and are documented in {{{errno.h}}} and related header files. [[http://www.ibm.com/developerworks/aix/library/au-errnovariable/|IBM]] provides a good introduction to these, and the relevant [[http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html|POSIX]], [[http://www.gnu.org/s/hello/manual/libc/Error-Codes.html|GNU]] and [[http://msdn.microsoft.com/en-us/library/5814770t.aspx|Microsoft Windows]] references will cover most cases.
+   In many cases, this is enough to identify the problem. For example, OS errors are reported as tagged tuples such as {{{{error,enospc}}}} or {{{{error,eacces}}}}, which mean "You ran out of disk space" and "CouchDB doesn't have permission to access that resource" respectively. Most of these errors derive from the C code used to build the Erlang VM and are documented in {{{errno.h}}} and related header files. [[http://www.ibm.com/developerworks/aix/library/au-errnovariable/|IBM]] provides a good introduction to these, and the relevant [[http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html|POSIX]], [[http://www.gnu.org/s/hello/manual/libc/Error-Codes.html|GNU]] and [[http://msdn.microsoft.com/en-us/library/5814770t.aspx|Microsoft Windows]] references will cover most cases.
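
Because these atoms are lower-case versions of the errno.h names, one quick way to decode them is to look them up in the local platform's errno table. The sketch below uses Python's standard errno and os modules; the helper name explain_posix_atom is made up, and the message text returned by os.strerror varies by operating system. POSIX spells the permission error EACCES, so the corresponding atom is eacces.

```python
import errno
import os


def explain_posix_atom(atom):
    """Look up an Erlang POSIX error atom such as 'enospc' in this platform's errno table."""
    code = getattr(errno, atom.upper(), None)
    if code is None:
        return None  # not a POSIX error name known on this platform
    return code, os.strerror(code)


# Print the numeric code and the platform's own description for two common atoms.
for atom in ("enospc", "eacces"):
    print(atom, explain_posix_atom(atom))
```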