Posted to user@couchdb.apache.org by Brad Schick <sc...@gmail.com> on 2008/06/12 20:27:39 UTC

Modifying fields

I've just started evaluating CouchDB and so far I'm very impressed. I've
been comparing it to Amazon's SimpleDB in particular, and couchdb looks
like a great alternative.

One thing that I haven't found in the couchdb API that surprises me,
however, is a way to directly add or modify individual fields within a
document. For example, how would I efficiently update just one field in
a few thousand large-ish documents?

Looking at the current API, it seems like I would have to retrieve each
document in its entirety, update that one field, then write back the
entire document. I see there is a way to bulk write many documents at
once, but that's only a small improvement. Have I missed something? If
not, is this ability planned for the future?
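
The fetch, modify, write-back cycle described above can be sketched as follows. This is a minimal sketch, not an official client: the server URL, the database name, and the helper names (`set_field`, `bulk_payload`, `post_json`, `bulk_update`) are assumptions, and the `_all_docs` and `_bulk_docs` endpoints are used as they exist in current CouchDB releases, which may differ from the 2008 API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984/mydb"  # assumed server and database name


def set_field(docs, field, value):
    """Return copies of docs with one field changed; _id and _rev are kept."""
    return [{**doc, field: value} for doc in docs]


def bulk_payload(docs):
    """Build the JSON body for a POST to _bulk_docs."""
    return json.dumps({"docs": docs}).encode("utf-8")


def post_json(url, body):
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def bulk_update(doc_ids, field, value):
    """Fetch whole documents, change one field, write whole documents back."""
    # One request fetches the full documents for the given ids.
    keys = json.dumps({"keys": doc_ids}).encode("utf-8")
    rows = post_json(BASE_URL + "/_all_docs?include_docs=true", keys)["rows"]
    docs = [row["doc"] for row in rows if row.get("doc")]
    # One request writes every modified document back in full.
    return post_json(BASE_URL + "/_bulk_docs", bulk_payload(set_field(docs, field, value)))
```

Even with both steps batched, every document still crosses the wire twice in its entirety, which is the cost the question is about.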

My first thought was that this could be implemented with
server-side script functions related to views and/or with HTTP methods
on field URIs.

Thanks,
-Brad

Re: Modifying fields

Posted by Jan Lehnardt <ja...@apache.org>.
On Jun 12, 2008, at 20:27, Brad Schick wrote:

> I've just started evaluating CouchDB and so far I'm very impressed. I've
> been comparing it to Amazon's SimpleDB in particular, and couchdb looks
> like a great alternative.
>
> One thing that I haven't found in the couchdb API that surprises me,
> however, is a way to directly add or modify individual fields within a
> document. For example, how would I efficiently update just one field in
> a few thousand large-ish documents?
>
> Looking at the current API, it seems like I would have to retrieve each
> document in its entirety, update that one field, then write back the
> entire document. I see there is a way to bulk write many documents at
> once, but that's only a small improvement. Have I missed something? If
> not, is this ability planned for the future?

Yes, that is correct. There have been discussions about allowing single
field updates or "delta updates", but none of them came up with
satisfying solutions and I personally think that this feature is not
worth the hassle. What I have found in general is that this request comes
from too RDBMS-centric thinking and that it might be solvable in more
CouchDB-ish ways.

What exactly are you trying to do?

In any case, such an API would only be 'syntactic sugar' and would
create the same kind of load on the server. It only saves a bit of
bandwidth and client work, but CouchDB deliberately puts more
work into the client. And as long as this is not the final bottleneck in
a perfectly designed app, there's little need to optimise this.
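
To illustrate the "syntactic sugar" point, here is a toy in-memory model. It is not CouchDB's storage code: the class and method names are hypothetical and the revision scheme is simplified to a counter. The point it shows is that a field-level update still reads the whole document and stores a whole new revision.

```python
class ToyStore:
    """Toy revisioned document store; a simplification, not CouchDB internals."""

    def __init__(self):
        self._docs = {}       # doc id -> (revision number, full document body)
        self.full_writes = 0  # how many complete documents were stored

    def put(self, doc_id, body, rev=None):
        """Store a complete document; rev must match the current revision."""
        current_rev = self._docs.get(doc_id, (None, None))[0]
        if rev != current_rev:
            raise ValueError("conflict: stale revision")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        self.full_writes += 1
        return new_rev

    def get(self, doc_id):
        """Return (revision, copy of the full document body)."""
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def patch(self, doc_id, field, value):
        """Hypothetical field update: sugar over get + put of the full body."""
        rev, body = self.get(doc_id)
        body[field] = value
        return self.put(doc_id, body, rev)
```

In this model `patch` saves the client a round trip, but the store performs exactly the same full write as an ordinary `put`.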

Cheers
Jan

> My first thought was that this could be implemented with
> server-side script functions related to views and/or with HTTP methods
> on field URIs.
>
> Thanks,
> -Brad
>


Re: Modifying fields

Posted by Jan Lehnardt <ja...@apache.org>.
On Jun 14, 2008, at 14:51, Benoit Chesneau wrote:

> On Fri, Jun 13, 2008 at 8:24 PM, Jan Lehnardt <ja...@prima.de> wrote:
>> If we were to implement this, CouchDB would still do a full document
>> update to comply with the revision system.
>>
> yes but it will use less bandwidth.

As pointed out earlier in this thread ;-)

Cheers
Jan
--

Re: Modifying fields

Posted by Benoit Chesneau <bc...@gmail.com>.
On Fri, Jun 13, 2008 at 8:24 PM, Jan Lehnardt <ja...@prima.de> wrote:
> If we were to implement this, CouchDB would still do a full document update
> to comply with the revision system.
>
yes but it will use less bandwidth.


- benoît

Re: Modifying fields

Posted by Jan Lehnardt <ja...@prima.de>.
If we were to implement this, CouchDB would still do a full document  
update to comply with the revision system.

Cheers
Jan
--


On 13 Jun 2008, at 19:57, Brad Schick <sc...@gmail.com> wrote:

>
> On 06/13/2008 01:53 AM, Jan Lehnardt wrote:
>>>>> Follow up questions on this: Does CouchDB internally track and
>>>>> reference individual fields? Or is the json for each document
>>>>> basically a blob to everything except View code?
>>>>
>>>> Documents are stored into native Erlang types representing each
>>>> document. Except for the view server, no-one cares about what
>>>> a document looks like.
>>> So the DB just treats each document like a string I assume? I was hoping
>>> it actually understood the fields. If it doesn't know about fields, then
>>> I understand that it might not be that much more efficient doing things
>>> on the server.
>>>
>>> But I'm curious; if the Erlang code doesn't look inside documents why do
>>> I get errors if I pass just a json array as the body of a document? It
>>> seems to require a json object with named pairs.
>>
>> No no, CouchDB definitely looks at the JSON structure.
>>
>
> Thanks for the continued feedback. So is CouchDB internally able to
> modify individual fields within a document? Or even if it can not do so
> yet, would it be practical to add later?
>
> It makes sense for this to not be a priority now, I'm just wondering if
> this somehow fundamentally contradicts CouchDB's design.
>
> -Brad
>

Re: Modifying fields

Posted by Brad Schick <sc...@gmail.com>.
On 06/13/2008 01:53 AM, Jan Lehnardt wrote:
>>>> Follow up questions on this: Does CouchDB internally track and
>>>> reference individual fields? Or is the json for each document
>>>> basically a blob to everything except View code?
>>>
>>> Documents are stored into native Erlang types representing each
>>> document. Except for the view server, no-one cares about what
>>> a document looks like.
>> So the DB just treats each document like a string I assume? I was hoping
>> it actually understood the fields. If it doesn't know about fields, then
>> I understand that it might not be that much more efficient doing things
>> on the server.
>>
>> But I'm curious; if the Erlang code doesn't look inside documents why do
>> I get errors if I pass just a json array as the body of a document? It
>> seems to require a json object with named pairs.
>
> No no, CouchDB definitely looks at the JSON structure.
>

Thanks for the continued feedback. So is CouchDB internally able to
modify individual fields within a document? Or even if it can not do so
yet, would it be practical to add later?

It makes sense for this to not be a priority now, I'm just wondering if
this somehow fundamentally contradicts CouchDB's design.

-Brad

Re: Modifying fields

Posted by Jan Lehnardt <ja...@apache.org>.
On Jun 13, 2008, at 11:16, Chris Anderson wrote:

> On Fri, Jun 13, 2008 at 1:53 AM, Jan Lehnardt <ja...@apache.org> wrote:
>>
>> On Jun 13, 2008, at 01:40, Brad Schick wrote:
>>
>>> It would be interesting to know the load on the DB of doing something
>>> like that inside the server versus sending and receiving all million
>>> documents to the client.
>>
>> You'd save all HTTP handling. So things would be faster. We are at a
>> point with CouchDB where we are working on getting it right and not
>> getting it fast or adding features for all edge cases. I'm not saying that
>> CouchDB will never get a feature that helps you, but it is not yet a
>> priority.
>>
>
> That might be something doable via the view server / search query /
> line-based json API. A job runner that you can submit jobs to wouldn't
> be a bad way to obviate http connection issues. Send the job to the
> data, they say. And then you could write job functions in any language
> you want.
>
> Building it as a plugin would solve the immediate issue, and allow
> CouchDB's core to concentrate on getting it right. Plus job runners
> just sound useful anyway, especially for things like functions that
> clone a group-reduce result to a new database for further manipulation
> and querying.

That sounds much more like it :)

Cheers
Jan
--


Re: Modifying fields

Posted by Chris Anderson <jc...@grabb.it>.
On Fri, Jun 13, 2008 at 1:53 AM, Jan Lehnardt <ja...@apache.org> wrote:
>
> On Jun 13, 2008, at 01:40, Brad Schick wrote:
>
>> It would be interesting to know the load on the DB of doing something
>> like that inside the server versus sending and receiving all million
>> documents to the client.
>
> You'd save all HTTP handling. So things would be faster. We are at a
> point with CouchDB where we are working on getting it right and not
> getting it fast or adding features for all edge cases. I'm not saying that
> CouchDB will never get a feature that helps you, but it is not yet a
> priority.
>

That might be something doable via the view server / search query /
line-based json API. A job runner that you can submit jobs to wouldn't
be a bad way to obviate http connection issues. Send the job to the
data, they say. And then you could write job functions in any language
you want.

Building it as a plugin would solve the immediate issue, and allow
CouchDB's core to concentrate on getting it right. Plus job runners
just sound useful anyway, especially for things like functions that
clone a group-reduce result to a new database for further manipulation
and querying.
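
A job runner in the line-based JSON style mentioned above might look roughly like this. It is a hypothetical sketch: the `["job", name, doc]` message shape, the `JOBS` registry, and the `escape_title` job are invented for illustration and are not part of CouchDB's view server protocol.

```python
import html
import json


def escape_title(doc):
    """Example job: HTML-escape the (assumed) 'title' field."""
    return {**doc, "title": html.escape(doc.get("title", ""))}


JOBS = {"escape_title": escape_title}  # hypothetical job registry


def handle_line(line):
    """Process one JSON line of the form ["job", name, doc]; return a JSON line."""
    try:
        command, name, doc = json.loads(line)
        if command != "job" or name not in JOBS:
            raise ValueError("unknown command or job")
        return json.dumps({"ok": JOBS[name](doc)})
    except (ValueError, TypeError, AttributeError) as exc:
        # Malformed input is reported on the same channel instead of crashing.
        return json.dumps({"error": str(exc)})


def serve(lines, out):
    """Read one request per line and write one response per line."""
    for line in lines:
        out.write(handle_line(line) + "\n")
        out.flush()
```

Calling `serve(sys.stdin, sys.stdout)` would let a parent process drive the runner over a pipe, and because the contract is only "one JSON value per line", the job functions could be written in any language.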

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Modifying fields

Posted by Jan Lehnardt <ja...@apache.org>.
On Jun 13, 2008, at 01:40, Brad Schick wrote:

> Thanks for the feedback.
>
>
> On 06/12/2008 03:21 PM, Jan Lehnardt wrote:
>>> Follow up questions on this: Does CouchDB internally track and reference
>>> individual fields? Or is the json for each document basically a blob to
>>> everything except View code?
>>
>> Documents are stored into native Erlang types representing each
>> document. Except for the view server, no-one cares about what
>> a document looks like.
> So the DB just treats each document like a string I assume? I was hoping
> it actually understood the fields. If it doesn't know about fields, then
> I understand that it might not be that much more efficient doing things
> on the server.
>
> But I'm curious; if the Erlang code doesn't look inside documents why do
> I get errors if I pass just a json array as the body of a document? It
> seems to require a json object with named pairs.

No no, CouchDB definitely looks at the JSON structure.

> My data-model is still a work in progress, so perhaps I won't really
> need to update lots of documents in sequence. Mostly I've been thinking of
> maintenance examples. Like, every value for some textfield needs to be
> escaped and written back to the DB after there are already a million
> documents containing that textfield in the DB.

Yeah, that's a sort of valid use-case and CouchDB is not a very good
fit for that. But if it is maintenance, it might be okay to be slower :)


> It would be interesting to know the load on the DB of doing something
> like that inside the server versus sending and receiving all million
> documents to the client.

You'd save all HTTP handling. So things would be faster. We are at a
point with CouchDB where we are working on getting it right and not
getting it fast or adding features for all edge cases. I'm not saying that
CouchDB will never get a feature that helps you, but it is not yet a
priority.

Cheers
Jan
--

Re: Modifying fields

Posted by Brad Schick <sc...@gmail.com>.
Thanks for the feedback.


On 06/12/2008 03:21 PM, Jan Lehnardt wrote:
>> Follow up questions on this: Does CouchDB internally track and reference
>> individual fields? Or is the json for each document basically a blob to
>> everything except View code?
>
> Documents are stored into native Erlang types representing each
> document. Except for the view server, no-one cares about what
> a document looks like.
So the DB just treats each document like a string I assume? I was hoping
it actually understood the fields. If it doesn't know about fields, then
I understand that it might not be that much more efficient doing things
on the server.

But I'm curious; if the Erlang code doesn't look inside documents why do
I get errors if I pass just a json array as the body of a document? It
seems to require a json object with named pairs.
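
A client can catch this before sending anything. The check below is a minimal sketch based only on the behaviour reported here (an array body is rejected, an object with named pairs is accepted); the function name is hypothetical.

```python
import json


def is_valid_document_body(text):
    """Return True if text parses as a JSON object (name/value pairs)."""
    try:
        return isinstance(json.loads(text), dict)
    except ValueError:
        # Not valid JSON at all.
        return False
```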

>> (caveat: I know little about CouchDB internals, so the following is
>> based on how I assume it might work)
>>
>> To complement Views, how about a concept of Modifier scripts? These
>> would work in two separate stages. First, a map stage would build an
>> index similar to Views. If CouchDB is able to reference individual
>> fields, the map would emit a key and field names for each document. If
>> CouchDB is only able to reference documents, the map would emit just a
>> key for each document. Then there would be a 'modify' stage that was run
>> when the modifier's URI was POSTed to. The modify function would accept
>> arbitrary JSON from the PUT, key(s), and either individual fields (if
>> possible) or  entire document(s) (if not). I'd assume the modify
>> function would have to be called either once per key or with blocks of
>> keys to avoid holding everything in memory at once.
>
> Why not just do the post-modify on the client with caching? I don't really
> see the need to add that to the DB server. Note also, that the map output
> can optionally be reduced (and rereduced) which allows further
> computations.
I can't cache all of the documents on all of the front-end servers (aka
clients). At any given time most of them won't be in a cache. In my idea
above, I don't think reduce would apply since the goal is just finding
the docs to update rather than computing a result.

If the documents are opaque to Erlang I understand the savings aren't
that great. But along with bandwidth it would also save the CPU overhead
of sending and receiving data (I believe I read somewhere that HTTP
processing in the DB is a significant component of total CPU use).

My data-model is still a work in progress, so perhaps I won't really
need to update lots of documents in sequence. Mostly I've been thinking of
maintenance examples. Like, every value for some textfield needs to be
escaped and written back to the DB after there are already a million
documents containing that textfield in the DB.
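
Such a maintenance pass can be written as a client-side batch job. The sketch below is one way to do it, under assumptions: the field name `textfield` is taken literally from the example, the batch size and helper names are illustrative, the field is assumed to hold a string, and each yielded batch would still have to be written back in full.

```python
import html
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items so memory use stays bounded."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def escape_field(doc, field="textfield"):
    """Return a copy of doc with one text field HTML-escaped."""
    return {**doc, field: html.escape(doc[field])}


def escaped_batches(docs, field="textfield", size=1000):
    """Yield batches of changed documents, skipping those that need no change."""
    for chunk in batches(docs, size):
        changed = [
            escape_field(doc, field)
            for doc in chunk
            if field in doc and html.escape(doc[field]) != doc[field]
        ]
        if changed:
            yield changed
```

Note that `html.escape` is not idempotent, so rerunning the job would escape already escaped text again; a real job would need some way to record which documents it has processed.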

It would be interesting to know the load on the DB of doing something
like that inside the server versus sending and receiving all million
documents to the client.


-Brad

Re: Modifying fields

Posted by Jan Lehnardt <ja...@apache.org>.
On Jun 13, 2008, at 00:07, Brad Schick wrote:

> On 06/12/2008 11:27 AM, Brad Schick wrote:
>> One thing that I haven't found in the couchdb API that surprises me,
>> however, is a way to directly add or modify individual fields within a
>> document. For example, how would I efficiently update just one field in
>> a few thousand large-ish documents?
>>
>> ...
>>
>> My first thought was that this could be implemented with
>> server-side script functions related to views and/or with HTTP methods
>> on field URIs.
>>
>>
> Follow up questions on this: Does CouchDB internally track and reference
> individual fields? Or is the json for each document basically a blob to
> everything except View code?

Documents are stored into native Erlang types representing each
document. Except for the view server, no-one cares about what
a document looks like.

> (caveat: I know little about CouchDB internals, so the following is
> based on how I assume it might work)
>
> To complement Views, how about a concept of Modifier scripts? These
> would work in two separate stages. First, a map stage would build an
> index similar to Views. If CouchDB is able to reference individual
> fields, the map would emit a key and field names for each document. If
> CouchDB is only able to reference documents, the map would emit just a
> key for each document. Then there would be a 'modify' stage that was run
> when the modifier's URI was POSTed to. The modify function would accept
> arbitrary JSON from the PUT, key(s), and either individual fields (if
> possible) or  entire document(s) (if not). I'd assume the modify
> function would have to be called either once per key or with blocks of
> keys to avoid holding everything in memory at once.

Why not just do the post-modify on the client with caching? I don't really
see the need to add that to the DB server. Note also, that the map output
can optionally be reduced (and rereduced) which allows further
computations.


> Would that be practical? Should I start learning Erlang ;)

You should start learning Erlang in any case :)

Cheers
Jan
--

Re: Modifying fields

Posted by Brad Schick <sc...@gmail.com>.
On 06/12/2008 11:27 AM, Brad Schick wrote:
> One thing that I haven't found in the couchdb API that surprises me,
> however, is a way to directly add or modify individual fields within a
> document. For example, how would I efficiently update just one field in
> a few thousand large-ish documents?
>
> ...
>
> My first thought was that this could be implemented with
> server-side script functions related to views and/or with HTTP methods
> on field URIs.
>
>   
Follow up questions on this: Does CouchDB internally track and reference
individual fields? Or is the json for each document basically a blob to
everything except View code?

(caveat: I know little about CouchDB internals, so the following is
based on how I assume it might work)

To complement Views, how about a concept of Modifier scripts? These
would work in two separate stages. First, a map stage would build an
index similar to Views. If CouchDB is able to reference individual
fields, the map would emit a key and field names for each document. If
CouchDB is only able to reference documents, the map would emit just a
key for each document. Then there would be a 'modify' stage that was run
when the modifier's URI was POSTed to. The modify function would accept
arbitrary JSON from the PUT, key(s), and either individual fields (if
possible) or  entire document(s) (if not). I'd assume the modify
function would have to be called either once per key or with blocks of
keys to avoid holding everything in memory at once.
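
The two-stage idea can be sketched in a few lines. This is purely hypothetical: nothing in this thread says CouchDB has such a Modifier feature, and the names `build_index`, `run_modifier`, `map_fn`, `modify_fn`, and the `block_size` parameter are invented to illustrate the proposal using whole documents rather than individual fields.

```python
def build_index(docs, map_fn):
    """Map stage: collect (key, doc id) pairs, sorted by key like a view index."""
    index = []
    for doc in docs.values():
        for key in map_fn(doc):
            index.append((key, doc["_id"]))
    return sorted(index)


def run_modifier(docs, index, modify_fn, params, block_size=100):
    """Modify stage: walk the index in blocks and rewrite each indexed document."""
    for start in range(0, len(index), block_size):
        # Only one block of index entries is handled at a time.
        block = index[start:start + block_size]
        for key, doc_id in block:
            docs[doc_id] = modify_fn(key, docs[doc_id], params)


def map_fn(doc):
    """Hypothetical map: select documents that have a 'status' field."""
    if "status" in doc:
        yield doc["status"]


def modify_fn(key, doc, params):
    """Hypothetical modify: set the status from the request body."""
    return {**doc, "status": params["status"]}
```

Here `docs` is a plain dict standing in for the database and `params` stands in for the JSON body sent to the modifier's URI.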

Would that be practical? Should I start learning Erlang ;)

-Brad