Posted to user@couchdb.apache.org by Jean-Pierre Fiset <jp...@fiset.ca> on 2013/10/03 20:35:47 UTC

Contribution: CouchDb dump and reload

I am working on the Nunaliit project (http://nunaliit.org). As part of the project, we have
developed tools that allow a user to dump an instance of CouchDB to disk, as well as tools to
reload a database from disk.

The database documents are stored to disk in text files. The document content is formatted in
JSON. Attachments are also saved to disk in their native format.

This set of tools is written in Java and built using Maven. It provides a number of interfaces for
accessing CouchDB directly from Java. It also provides a command-line interface to perform dumps
and restores.

Currently, these tools are part of a larger project. I intend to separate the dump and
restore components, with all their dependencies, from the larger project to make them more
accessible to the community.

Is there a location or a project where these sorts of tools belong? If feasible, I'd like to
contribute the code where it will serve best.

JP

Re: Contribution: CouchDb dump and reload

Posted by Vivek Pathak <vp...@orgmeta.com>.
I use couchdb-python and zip the output.  The dump is useful because it 
backs up everything including design docs.

I was wondering whether a link to this tool should be placed here:
http://wiki.apache.org/couchdb/CouchDB_tools

Thanks

On 10/3/13 5:51 PM, Alexander Shorin wrote:
> On Fri, Oct 4, 2013 at 1:28 AM, Jens Alfke <je...@couchbase.com> wrote:
>> On Oct 3, 2013, at 11:43 AM, Vivek Pathak <vp...@orgmeta.com> wrote:
>>
>>> Just fyi,  there is couchdb-dump available in
>>> http://code.google.com/p/couchdb-python/
>> Looks like these two tools use entirely different data formats. Has anyone thought of defining a common format for database dumps?
> I think it will be hard to define one.
>
> Dumping CouchDB data as JSON looks intuitive and requires fewer
> additional steps for import/export. The couchdb-python approach,
> with its multipart format, has a smaller footprint but requires
> trickier processing (boundaries, headers). Both solutions can use the
> CouchDB API without any additional data conversion. And both require
> a lot of disk space, much more than if you just copy the database file or
> make a replica of it, unless you compress the output with xz.
>
> --
> ,,,^..^,,,


Re: Contribution: CouchDb dump and reload

Posted by Jean-Pierre Fiset <jp...@fiset.ca>.
The tool we have developed uses a strategy similar to _all_docs?include_docs=true to dump. The
on-disk format splits each document's top-level attributes into separate files to make it
easy for humans to understand and edit. The format is similar to that used by the Python tool
"couchapp". Furthermore, the attachments are saved in their own files, along with some supporting
information to keep track of details such as content type.
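
To make the on-disk layout more concrete, here is a minimal Python sketch of the idea. It is not
the Nunaliit code: the directory layout (one directory per document, one "<attribute>.json" file
per top-level attribute), the percent-escaping of names and the function names are all assumptions
made for illustration. Attachments are left out. The only CouchDB detail relied on is that an
_all_docs?include_docs=true response carries each document under rows[i].doc.

```python
import json
from pathlib import Path
from urllib.parse import quote, unquote


def dump_document(doc: dict, dump_dir: Path) -> Path:
    """Write each top-level attribute of `doc` to its own JSON file.

    Hypothetical layout: <dump_dir>/<escaped _id>/<escaped attribute>.json
    """
    # Escape the id so that characters such as "/" (as in "_design/app")
    # are safe to use in a directory name.
    doc_dir = dump_dir / quote(doc["_id"], safe="")
    doc_dir.mkdir(parents=True, exist_ok=True)
    for name, value in doc.items():
        path = doc_dir / (quote(name, safe="") + ".json")
        path.write_text(json.dumps(value, indent=2, sort_keys=True), encoding="utf-8")
    return doc_dir


def load_document(doc_dir: Path) -> dict:
    """Rebuild a document from the per-attribute files written above."""
    doc = {}
    for path in sorted(doc_dir.glob("*.json")):
        doc[unquote(path.stem)] = json.loads(path.read_text(encoding="utf-8"))
    return doc


def dump_all_docs(response: dict, dump_dir: Path) -> list:
    """Dump every document found in an _all_docs?include_docs=true response."""
    return [dump_document(row["doc"], dump_dir) for row in response["rows"]]
```

With this layout, a document {"_id": "a", "title": "x"} ends up as a/_id.json and a/title.json,
and each attribute can be edited by hand without touching the rest of the document.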

The restore process is more elaborate. Here are some features:

- When restoring, the restore tool ensures that a document modified since the last dump is not
replaced by the document on disk. This ensures that the documents located in the database
remain the authoritative ones. An operator can force the restoration, but by default the tool
attempts to protect the database.

- It is possible to restore only specific documents. This is currently done by specifying the
identifiers of the documents to be restored.

- When uploading a document, all the changes are applied at once, increasing the document
revision by only one.

- Disk documents that are equivalent to the ones found in the database are detected and the
upload is skipped.
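
The restore rules listed above can be sketched as a small planning function. This is only an
illustration in Python, not the actual tool: the names (Action, plan_restore, modified_in_db), the
choice to ignore "_rev" when comparing documents, and the idea of passing the "was it modified in
the database?" check in as a function are all assumptions.

```python
from enum import Enum


class Action(Enum):
    UPLOAD = "upload"
    SKIP_EQUIVALENT = "skip: disk and database copies are equivalent"
    SKIP_PROTECTED = "skip: database copy was modified since the last upload"


def strip_rev(doc: dict) -> dict:
    """Compare documents without their revision (an assumption of this sketch)."""
    return {k: v for k, v in doc.items() if k != "_rev"}


def plan_restore(disk_docs, db_docs, modified_in_db, force=False, only_ids=None):
    """Decide, per document id, what a restore should do.

    disk_docs / db_docs: dicts mapping document id -> document.
    modified_in_db: hypothetical callable telling whether the database
        copy was changed by someone since the tool last uploaded it.
    force: the operator override; uploads even over modified documents.
    only_ids: optional set of ids to restrict the restore to.
    """
    plan = {}
    for doc_id, disk_doc in disk_docs.items():
        if only_ids is not None and doc_id not in only_ids:
            continue
        db_doc = db_docs.get(doc_id)
        if db_doc is None:
            plan[doc_id] = Action.UPLOAD  # not in the database yet
        elif strip_rev(db_doc) == strip_rev(disk_doc):
            plan[doc_id] = Action.SKIP_EQUIVALENT  # nothing to upload
        elif modified_in_db(db_doc) and not force:
            plan[doc_id] = Action.SKIP_PROTECTED  # database stays authoritative
        else:
            plan[doc_id] = Action.UPLOAD
    return plan
```

Each document marked UPLOAD would then be written in a single request carrying the whole
document, which is what keeps the revision increase to one.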

The only "proprietary" facets of this process are the digest stored on the database document to
keep track of the document's content, the name of the attribute that holds the digest, and the
method by which the digest is computed. Every time a document is uploaded using the restore tool, a
digest of the document is calculated and added to the document. This allows the restore tool to
find out whether a database document has been manually modified since the last time it was
uploaded. Since it would be difficult for someone to compute the digest, or to craft a new version
of the document that collides with the current digest, this ensures that the restore tool does not
inadvertently overwrite changes to the database performed by a human.

The method to compute the digest is straightforward and could be standardized to allow various
tools to interact with a single database.
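
As a rough sketch of what such a digest scheme could look like (in Python, and not the method the
tool actually uses): the attribute name "restore_digest", the use of SHA-256, the canonical JSON
form, and the decision to leave "_rev" out of the digest are all assumptions made for this example.

```python
import hashlib
import json

DIGEST_KEY = "restore_digest"  # hypothetical attribute name


def compute_digest(doc: dict) -> str:
    """Digest of the document content, ignoring the revision and the digest itself."""
    content = {k: v for k, v in doc.items() if k not in ("_rev", DIGEST_KEY)}
    # Sorted keys and fixed separators give the same text for the same content,
    # regardless of attribute order.
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stamp(doc: dict) -> dict:
    """Return a copy of the document carrying its digest, ready to upload."""
    stamped = dict(doc)
    stamped[DIGEST_KEY] = compute_digest(doc)
    return stamped


def modified_since_upload(db_doc: dict) -> bool:
    """True if the database copy no longer matches the digest it carries.

    A document without a digest is treated as modified, so it is protected
    by default (a choice made for this sketch).
    """
    return db_doc.get(DIGEST_KEY) != compute_digest(db_doc)
```

A document edited by hand keeps its old digest while its content changes, so
modified_since_upload() returns True for it and a restore can leave it alone.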

As far as standardizing how documents should be stored to disk, it would probably be a
worthwhile endeavour.

JP

On 2013-10-03 20:28, Filippo Fadda wrote:
> Is it basically an all_docs + include_docs?
> 
> -Filippo


Re: Contribution: CouchDb dump and reload

Posted by Filippo Fadda <fi...@programmazione.it>.
Is it basically an all_docs + include_docs?

-Filippo



Re: Contribution: CouchDb dump and reload

Posted by Alexander Shorin <kx...@gmail.com>.
On Fri, Oct 4, 2013 at 1:28 AM, Jens Alfke <je...@couchbase.com> wrote:
> On Oct 3, 2013, at 11:43 AM, Vivek Pathak <vp...@orgmeta.com> wrote:
>
>> Just fyi,  there is couchdb-dump available in
>> http://code.google.com/p/couchdb-python/
>
> Looks like these two tools use entirely different data formats. Has anyone thought of defining a common format for database dumps?

I think it will be hard to define one.

Dumping CouchDB data as JSON looks intuitive and requires fewer
additional steps for import/export. The couchdb-python approach,
with its multipart format, has a smaller footprint but requires
trickier processing (boundaries, headers). Both solutions can use the
CouchDB API without any additional data conversion. And both require
a lot of disk space, much more than if you just copy the database file or
make a replica of it, unless you compress the output with xz.
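
As a rough illustration of the last point, a JSON dump can be written straight into an
xz-compressed file with Python's standard lzma module. The one-document-per-line layout and the
function names are assumptions made for this sketch; neither tool discussed here is claimed to
work this way.

```python
import json
import lzma


def write_compressed_dump(docs, path):
    """Write one JSON document per line into an xz-compressed file."""
    with lzma.open(path, "wt", encoding="utf-8") as out:
        for doc in docs:
            out.write(json.dumps(doc, sort_keys=True) + "\n")


def read_compressed_dump(path):
    """Read the documents back from a file written by write_compressed_dump."""
    with lzma.open(path, "rt", encoding="utf-8") as src:
        return [json.loads(line) for line in src]
```

How much space this saves depends on the data; repetitive documents compress well, while
attachments that are already compressed would gain little.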

--
,,,^..^,,,

Re: Contribution: CouchDb dump and reload

Posted by Jens Alfke <je...@couchbase.com>.
On Oct 3, 2013, at 11:43 AM, Vivek Pathak <vp...@orgmeta.com> wrote:

> Just fyi,  there is couchdb-dump available in 
> http://code.google.com/p/couchdb-python/

Looks like these two tools use entirely different data formats. Has anyone thought of defining a common format for database dumps?

(I have an interest in this because some iOS/Android developers want to be able to preload a canned Couchbase Lite database into their app when it’s first launched.  Currently they’d do that by taking the SQLite database file from a database, packaging that with the app, and copying it into place. But it’d be cleaner if that canned database were in a format that’s easier to inspect and less likely to have versioning problems.)

—Jens

Re: Contribution: CouchDb dump and reload

Posted by Albin Stigö <al...@gmail.com>.
Do you think you can find "I'm tinkering a bit with my chemicals" ("jag pysslar lite med mina kemikaler"), Donald Duck (Kalle Anka)?

On Thu, Oct 3, 2013 at 8:48 PM, Albin Stigö <al...@gmail.com> wrote:
> Thanks, thanks!!!

Re: Contribution: CouchDb dump and reload

Posted by Albin Stigö <al...@gmail.com>.
Thanks, thanks!!!

On Thu, Oct 3, 2013 at 8:43 PM, Vivek Pathak <vp...@orgmeta.com> wrote:
> Just fyi,  there is couchdb-dump available in
> http://code.google.com/p/couchdb-python/

Re: Contribution: CouchDb dump and reload

Posted by Vivek Pathak <vp...@orgmeta.com>.
Just fyi,  there is couchdb-dump available in 
http://code.google.com/p/couchdb-python/
