Posted to user@couchdb.apache.org by Mark Wakabayashi <ma...@moshymoshy.com> on 2013/01/23 00:14:36 UTC

Bulk insert-or-update

Hi,

I need a way to automatically ensure that certain documents exist with
pre-specified content as part of a script. That is, I have a file with a
set of documents, and I want to run a script that will either insert them
if they don't exist, or else ensure that the documents in the database
exactly match what I have in the file. Any documents not specified in the
file should remain untouched.

The best way I've found so far to do this has been to:
* have the documents in a file in the format described in
  http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
* delete each document. Something like:
for ID in $(grep '"_id"' "$FILE" | sed 's/.*"_id":"\([^"]*\)".*/\1/'); do
  # Fetch the current revision, which DELETE requires
  REV=$(curl --silent "http://${COUCH_HOST}/foo/${ID}" | sed 's/.*"_rev":"\([^"]*\)".*/\1/')
  curl -X DELETE "http://${COUCH_HOST}/foo/${ID}?rev=${REV}"
done
* insert the documents as they are in the file using _bulk_docs

Is there a better way to do this?


This seems to work most of the time, but I'm having intermittent failures
where the _bulk_docs reports success but doesn't actually insert the
documents. If I run the script repeatedly, the bulk insert will
occasionally report that the documents have been inserted with revision 1,
and the documents are then reported as 'deleted'. That is, the bulk insert
returns something like:
[{"ok":true,"id":"bar","rev":"1-243f8f87ed4b0abe0ef00c725d346e07"},...]
and document "bar" remains deleted, where normally it would return much
higher revisions, like
[{"ok":true,"id":"bar","rev":"229-98ff2bb4ad8754cb254ef2e0392d6ab0"},...]
and "bar" would be available.

Running the script again fixes the problem. Is this a bug in bulk
insertion? Are there any relevant limitations of bulk insertion that I
should be aware of?

Thanks in advance,
Mark

Re: Bulk insert-or-update

Posted by Dan Everton <da...@iocaine.org>.
You appear to be hitting this bug:

https://issues.apache.org/jira/browse/COUCHDB-1415

I'm not aware of any workaround other than adding a salt to the
document or waiting for it to be fixed in CouchDB.

Cheers,
Dan
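
The salt workaround mentioned above could look roughly like the sketch below. It assumes the file holds a _bulk_docs payload of the form {"docs": [...]}; the function name add_salt and the field name "salt" are made up for illustration and do not come from CouchDB. The idea, as far as I understand the issue, is that a random field makes each re-created document differ in content from the revision that was deleted, so it should not collide with a revision the database already knows about.

```python
import copy
import json
import uuid


def add_salt(bulk_payload, field="salt"):
    """Return a copy of a _bulk_docs payload with a random value in every doc.

    `field` is a hypothetical field name, not something CouchDB defines.
    """
    salted = copy.deepcopy(bulk_payload)  # leave the caller's payload untouched
    for doc in salted["docs"]:
        doc[field] = uuid.uuid4().hex  # 32 random hex characters per document
    return salted


if __name__ == "__main__":
    payload = {"docs": [{"_id": "bar", "value": 1}]}
    # The printed JSON can be POSTed to /foo/_bulk_docs in place of the raw file.
    print(json.dumps(add_salt(payload)))
```

The trade-off is that the stored documents then no longer match the file exactly, since each one carries the extra field.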


Re: Bulk insert-or-update

Posted by Jim Klo <ji...@sri.com>.
Wouldn't an update handler designed to do something like an UPSERT work?

http://wiki.apache.org/couchdb/Document_Update_Handlers

Not exactly a bulk API... but I think it would do what you are asking...


Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International
t.	@nsomnac
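
An update handler works on one document per request. For a bulk variant that avoids the delete step altogether, one possible approach (not proposed in the thread, so treat it as an assumption) is to look up the current revisions with a POST to /foo/_all_docs carrying {"keys": [...]}, copy each live revision into the matching document as _rev, and send the result to _bulk_docs. The sketch below covers only the merge step; attach_revs is a hypothetical helper name, and rows for missing or deleted documents are skipped so those documents are submitted without a _rev.

```python
def attach_revs(docs, all_docs_rows):
    """Copy current revisions from _all_docs rows into the documents to upload.

    `docs` is the "docs" list of a _bulk_docs payload; `all_docs_rows` is the
    "rows" list returned by POST /db/_all_docs with {"keys": [...]}.
    """
    revs = {}
    for row in all_docs_rows:
        value = row.get("value")
        # Missing ids come back as {"key": ..., "error": "not_found"};
        # deleted ones carry "deleted": true in the value. Skip both.
        if "error" in row or not value or value.get("deleted"):
            continue
        revs[row["key"]] = value["rev"]
    merged = []
    for doc in docs:
        updated = dict(doc)  # shallow copy so the input list is not modified
        if doc["_id"] in revs:
            updated["_rev"] = revs[doc["_id"]]
        merged.append(updated)
    return merged
```

Documents that already exist are then overwritten in place as a new revision, and documents not listed in the file are never touched.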
