Posted to user@couchdb.apache.org by Brian Candler <B....@pobox.com> on 2009/01/25 13:09:50 UTC

Some CouchDB Qs & As

As a newbie, I had some questions which I still had after reading the wiki
and the book so far, so I thought I'd do some experiments to find the
answers.

I'm just posting them here in case it's useful to anyone else, or it could
provide some ideas when writing the book.

(1a) Does CouchDB store the raw JSON which it receives, character by
character, or does it convert to and from an internal representation?
(1b) Does CouchDB accept valid JavaScript which is not valid JSON, e.g.
{foo:"bar"} or {foo:/bar/} ?

Let's test 1b first:

$ cat test.dat
{"foo":/bar/}
$ curl -T test.dat http://127.0.0.1:5984/test_suite_db/test1
{"error":"case_clause","reason":"{\"foo\":/bar/}\n"}

So the answer is "no", only strict JSON is allowed. Now to try 1a:

$ cat test.dat
{"foo":
       "bar"}
$ curl -T test.dat http://127.0.0.1:5984/test_suite_db/test1
{"ok":true,"id":"test1","rev":"1878284436"}
$ curl http://127.0.0.1:5984/test_suite_db/test1
{"_id":"test1","_rev":"1878284436","foo":"bar"}

This suggests that the JSON is converted into some internal representation,
and then converted back to JSON.
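
To illustrate the idea (this is only a sketch of the general principle using
Python's standard json module, not CouchDB's actual Erlang code path), parsing
the text and serializing it again loses the original whitespace in the same
way. The function name round_trip is made up for this example.

```python
import json

def round_trip(raw_json: str) -> str:
    """Parse JSON text into Python objects, then serialize it again."""
    parsed = json.loads(raw_json)  # internal representation (a dict here)
    # Compact separators mimic the whitespace-free output seen above.
    return json.dumps(parsed, separators=(",", ":"))

original = '{"foo":\n       "bar"}'
print(round_trip(original))  # prints {"foo":"bar"}
```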

(2a) When PUTting an identical version of a document, does CouchDB still
allocate a new _rev?
(2b) What about when PUTting a document which is semantically identical
JSON, but differs in ordering of object members?

Let's try 2a first:

$ cat test2.dat
{
  "foo": "value1",
  "bar": "value2"
}
$ curl -T test2.dat http://127.0.0.1:5984/test_suite_db/test2
{"ok":true,"id":"test2","rev":"4264834066"}
$ curl http://127.0.0.1:5984/test_suite_db/test2 >test2a
$ cat test2a
{"_id":"test2","_rev":"4264834066","foo":"value1","bar":"value2"}
$ curl -T test2a http://127.0.0.1:5984/test_suite_db/test2
{"ok":true,"id":"test2","rev":"396680012"}
$ curl http://127.0.0.1:5984/test_suite_db/test2 >test2b
$ cat test2b
{"_id":"test2","_rev":"396680012","foo":"value1","bar":"value2"}

So it seems the answer is: a new _rev is allocated for any PUT, even if the
uploaded data is identical JSON. Hence 2b is irrelevant: if an identical
document gets a new _rev, a reordered one will too. It is entirely up to the
client to determine whether the data has changed before invoking a PUT
operation.
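
One way a client might do that check is sketched below, assuming the client
has the stored document and the candidate document as JSON text. The helper
name has_changed and the choice to ignore _id and _rev are assumptions made
for this example, not anything CouchDB provides.

```python
import json

# Assumption: these server-managed fields should not count as a change.
SERVER_FIELDS = ("_id", "_rev")

def has_changed(stored_json: str, candidate_json: str) -> bool:
    """Return True if the two documents differ semantically."""
    stored = json.loads(stored_json)
    candidate = json.loads(candidate_json)
    for doc in (stored, candidate):
        for field in SERVER_FIELDS:
            doc.pop(field, None)
    # Python dict comparison ignores member order, so reordered but
    # otherwise identical objects compare equal.
    return stored != candidate

stored = '{"_id":"test2","_rev":"4264834066","foo":"value1","bar":"value2"}'
reordered = '{"bar":"value2","foo":"value1"}'
print(has_changed(stored, reordered))  # prints False
```

A client would then skip the PUT whenever has_changed returns False.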

(3) It is documented (but not stressed) that a document is a JSON object, as
opposed to any JSON value, but I thought I'd check that too:

$ cat test3.dat
["wibble","bibble"]
$ curl -T test3.dat http://127.0.0.1:5984/test_suite_db/test3
{"error":"error","reason":"function_clause"}
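
A client can catch both this case and the non-JSON case from (1b) before
uploading. The following is a minimal sketch using Python's json module; the
function name is_valid_document_body is hypothetical, and Python's parser is
not guaranteed to accept exactly the same inputs as CouchDB's.

```python
import json

def is_valid_document_body(text: str) -> bool:
    """Return True only if text parses as JSON and the top level is an object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return False  # not JSON at all, e.g. {"foo":/bar/}
    return isinstance(value, dict)  # arrays, strings, numbers are rejected

print(is_valid_document_body('{"foo":"bar"}'))        # prints True
print(is_valid_document_body('["wibble","bibble"]'))  # prints False
print(is_valid_document_body('{"foo":/bar/}'))        # prints False
```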

(4) If a document has attachments, what happens if you upload a new version
without the _attachments member, or without listing a particular attachment?
Is it treated as an error, or ignored, or are the missing attachment(s) removed?

$ cat test4.dat
{
  "_attachments":
  {
    "foo.txt":
    {
      "content_type":"text\/plain",
      "data":"VGhpcyBpcyBmb28="
    },
    "bar.txt":
    {
      "content_type":"text\/plain",
      "data":"YW5kIHRoaXMgaXMgYmFy"
    }
  }
}
$ curl -T test4.dat http://127.0.0.1:5984/test_suite_db/test4
{"ok":true,"id":"test4","rev":"2042916403"}
$ curl http://127.0.0.1:5984/test_suite_db/test4/foo.txt
This is foo
$ curl http://127.0.0.1:5984/test_suite_db/test4/bar.txt
and this is bar
$ curl http://127.0.0.1:5984/test_suite_db/test4 >test4a
$ cat test4a
{"_id":"test4","_rev":"2042916403","_attachments":{"foo.txt":{"stub":true,"content_type":"text/plain","length":11},"bar.txt":{"stub":true,"content_type":"text/plain","length":15}}}
$ cp test4a test4b
$ vi test4b
... remove bar.txt attachment ...
$ cat test4b
{"_id":"test4","_rev":"2042916403","_attachments":{"foo.txt":{"stub":true,"content_type":"text/plain","length":11}}}
$ curl -T test4b http://127.0.0.1:5984/test_suite_db/test4
{"ok":true,"id":"test4","rev":"652385907"}
$ curl http://127.0.0.1:5984/test_suite_db/test4
{"_id":"test4","_rev":"652385907","_attachments":{"foo.txt":{"stub":true,"content_type":"text/plain","length":11}}}
$ curl http://127.0.0.1:5984/test_suite_db/test4/foo.txt
This is foo
$ curl http://127.0.0.1:5984/test_suite_db/test4/bar.txt
{"error":"not_found","reason":"Document is missing attachment"}

So this suggests that omitting an attachment's entry from _attachments when
PUTting will delete that attachment.
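
Given that behaviour, a client that only wants to change other fields has to
copy the existing attachment stubs into the document it uploads. A minimal
sketch, assuming both documents are already parsed into Python dicts (the
helper name carry_over_attachments is made up here, and setting the right _rev
is left to the caller):

```python
import copy

def carry_over_attachments(current: dict, updated: dict) -> dict:
    """Return a copy of `updated` that keeps every attachment stub from `current`."""
    merged = copy.deepcopy(updated)
    stubs = copy.deepcopy(current.get("_attachments", {}))
    # Entries listed explicitly in `updated` take precedence over old stubs.
    stubs.update(merged.get("_attachments", {}))
    if stubs:
        merged["_attachments"] = stubs
    return merged

current = {
    "_id": "test4",
    "_rev": "2042916403",
    "_attachments": {
        "foo.txt": {"stub": True, "content_type": "text/plain", "length": 11},
        "bar.txt": {"stub": True, "content_type": "text/plain", "length": 15},
    },
}
updated = {"_id": "test4", "_rev": "2042916403", "note": "new field"}
print(sorted(carry_over_attachments(current, updated)["_attachments"]))
# prints ['bar.txt', 'foo.txt']
```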

OK, what about if you submit with a new content_type and/or length?

$ cat test4d
{"_id":"test4","_rev":"652385907","_attachments":{"foo.txt":{"stub":true,"content_type":"text/html","length":4}}}
$ curl -T test4d http://127.0.0.1:5984/test_suite_db/test4
{"ok":true,"id":"test4","rev":"3267807076"}
$ curl http://127.0.0.1:5984/test_suite_db/test4
{"_id":"test4","_rev":"3267807076","_attachments":{"foo.txt":{"stub":true,"content_type":"text/plain","length":11}}}
$ curl http://127.0.0.1:5984/test_suite_db/test4/foo.txt
This is foo

So it looks like the attributes of the attachment are ignored. (This raises
the question: is it possible to change the content_type of an attachment
without re-uploading it? But that's probably not very useful anyway.)

(5) How do the document's _id attribute and the id given in the URL
interact? Specifically:
(5a) If I PUT a new document to /db/id1 but the document contains
"_id":"id2", which wins?

$ cat test5a.dat
{"_id":"abc123","foo":"bar"}
$ curl -T test5a.dat http://127.0.0.1:5984/test_suite_db/test5
{"ok":true,"id":"test5","rev":"1818553963"}

Answer: The _id attribute is ignored, and the URL wins.

(5b) What about for an existing document?

$ cat test5b
{"_id":"test1","_rev":"1818553963","foo":"bar"}
$ curl -T test5b http://127.0.0.1:5984/test_suite_db/test5
{"ok":true,"id":"test5","rev":"4125462791"}

Answer: The _id attribute is ignored, and the URL wins. (But clearly the
_rev is taken from the document itself)

(5c) If I POST to /db but the document contains "_id":"id3", is a random
document id still assigned?

$ curl -d '{"_id":"xxxyyy", "foo":"bar"}' http://127.0.0.1:5984/test_suite_db
{"ok":true,"id":"6aca52f5b234ff61bc32318cb0ea2f84","rev":"1457889014"}

Answer: Yes, the document's _id attribute is ignored.

(5d) What about _rev?

For PUT:
$ curl http://127.0.0.1:5984/test_suite_db/test5 >test5c
$ curl -T test5c http://127.0.0.1:5984/test_suite_db/test5copy
{"error":"conflict","reason":"Document update conflict."}

Answer: The presence of _rev indicates whether the document already exists
or not, so whilst _id is ignored, _rev must be removed if you are going to
make a copy of an existing document.
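
A small sketch of that preparation step, assuming the fetched document has
been parsed into a Python dict. The helper name prepare_copy is hypothetical;
dropping _id as well is optional, since the experiments above show it is
ignored, but it avoids leaving a misleading value in the body.

```python
def prepare_copy(doc: dict) -> dict:
    """Return a new dict without _id and _rev, suitable for PUTting to a new id."""
    return {key: value for key, value in doc.items() if key not in ("_id", "_rev")}

fetched = {"_id": "test5", "_rev": "4125462791", "foo": "bar"}
print(prepare_copy(fetched))  # prints {'foo': 'bar'}
```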

For POST:
$ curl http://127.0.0.1:5984/test_suite_db/test5 |
  curl -X POST -T - http://127.0.0.1:5984/test_suite_db
{"ok":true,"id":"c043ec883ee0926cc344b38e9cf00db9","rev":"1677811569"}

Answer: for POST, both _id and _rev are ignored.

Aside: I see at http://wiki.apache.org/couchdb/HTTP_Document_API

  "A CouchDB document is simply a JSON object ... The document can be an
  arbitrary JSON object"

which technically answers (1b) and (3). However, I suggest it may be worth
discussing in the book exactly what is, and what is not, a valid CouchDB
document.

Regards,

Brian.

Re: Some CouchDB Qs & As

Posted by Chris Anderson <jc...@apache.org>.
On Sun, Jan 25, 2009 at 4:09 AM, Brian Candler <B....@pobox.com> wrote:
> As a newbie, I had some questions which I still had after reading the wiki
> and the book so far, so I thought I'd do some experiments to find the
> answers.
>
> I'm just posting them here in case it's useful to anyone else, or it could
> provide some ideas when writing the book.

Brian,

Awesome work digging into details here. You asked and answered a lot
of questions I've occasionally wondered about. Most of the behavior
you uncovered seems correct, but some of it makes me curious about
other questions.

>
> (1a) Does CouchDB store the raw JSON which it receives, character by
> character, or does it convert to and from an internal representation?

We can also check to see if the various JSON representations of
unicode characters are preserved:

$ cat unicodes.json
{"slashed":"\u0444", "raw":"ф"}

$ curl -T unicodes.json http://127.0.0.1:5984/test_suite_db/unicodes
{"ok":true,"id":"unicodes","rev":"1228573312"}

$ curl http://127.0.0.1:5984/test_suite_db/unicodes
{"_id":"unicodes","_rev":"1228573312","slashed":"\u0444","raw":"\u0444"}

Looks like we canonicalize them to their escaped encodings.
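
The same effect can be reproduced with any JSON library that parses to native
strings and escapes non-ASCII characters on output. Here is a sketch with
Python's json module (an analogy only, not CouchDB's encoder; the function
name canonicalize is made up for this example):

```python
import json

def canonicalize(raw_json: str) -> str:
    """Parse JSON, then re-serialize with non-ASCII characters escaped."""
    return json.dumps(json.loads(raw_json), ensure_ascii=True, separators=(",", ":"))

doc = '{"slashed":"\\u0444", "raw":"ф"}'
parsed = json.loads(doc)
# Once parsed, the escaped and raw forms are the same one-character string.
print(parsed["slashed"] == parsed["raw"])  # prints True
print(canonicalize(doc))  # prints {"slashed":"\u0444","raw":"\u0444"}
```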

> This suggests that the JSON is converted into some internal representation,
> and then converted back to JSON.
>


> (3) It is documented (but not stressed) that a document is a JSON object, as
> opposed to any JSON value, but I thought I'd check that too:
>
> $ cat test3.dat
> ["wibble","bibble"]
> $ curl -T test3.dat http://127.0.0.1:5984/test_suite_db/test3
> {"error":"error","reason":"function_clause"}
>

That error message is horrendous. I just committed a change; now you'll get:

{"error":"invalid_json_object","reason":"Document must be a JSON object"}


> OK, what about if you submit with a new content_type and/or length?
>
> So it looks like the attributes of the attachment are ignored. (This raises
> the question: is it possible to change the content_type of an attachment
> without re-uploading it? But that's probably not very useful anyway.)
>

There was also talk about adding the COPY and MOVE verbs for
attachment management. I'd have to dig into the code to see if
changing the content-type will require rewriting the attachment, or if
it only requires changing the doc meta.


>
> Answer: The presence of _rev indicates whether the document already exists
> or not, so whilst _id is ignored, _rev must be removed if you are going to
> make a copy of an existing document.
>

One could probably do a whole series of experiments to see how _rev is
treated under the COPY and MOVE verbs (which are implemented for
documents). Also, some of these experiments are likely codified as
part of the test suite.

Chris

-- 
Chris Anderson
http://jchris.mfdz.com