Posted to user@couchdb.apache.org by Маркевич Александр <ra...@gmail.com> on 2011/11/08 15:42:57 UTC

CouchDB - Open Document Format mega storage

Hi CouchDB Team!

Can I store *.odt/*.ods files in CouchDB? They have an XML-based
format and a stable specification (maybe in the future ;).

If I can't store ODT at this stage of CouchDB's evolution, should I write
or adapt an XML->JSON->CouchDB encoder?

What do you think about it?

P.S. Sorry for my English.
-- 
Best regards Alexander Markevych.

Re: CouchDB - Open Document Format mega storage

Posted by Robert Newson <rn...@apache.org>.
If you can convert it to useful JSON, then I don't see a problem
(though I've no idea if this is possible).

Previous revisions of documents are removed during database
compaction, so you shouldn't think of revisions as a versioning system
for your data.

B.

2011/11/8 Маркевич Александр <ra...@gmail.com>:
> Because after conversion we would have many document fields and properties,
> which can be saved in CouchDB with a history of revisions. And if I want to
> retrieve a doc from the DB, I can access any previous revision of it.

Re: CouchDB - Open Document Format mega storage

Posted by Robert Newson <rn...@apache.org>.
couchdb-lucene can index ODF attachments because it embeds Tika.

http://tika.apache.org/1.0/formats.html#OpenDocument_Format

B.

2011/11/9 Маркевич Александр <ra...@gmail.com>:
> So there are several possible solutions:
>
> 1. Store the doc as an attachment, as is. Can I have full-text search on
> the content of the docs?
> 2. Unzip the odt, extract all the XML files it consists of, convert them to
> JSON and save all items of the docs in the DB, and also set up an update
> handler that is triggered on updates and saves them to any VCS.
> 3. Use Sedna as an XML DB (I want to use a DB built on Erlang, although
> this variant is good too).
>
> Hmm... =)
> I'll have to try all the solutions.
>

Re: CouchDB - Open Document Format mega storage

Posted by Cory Zue <cz...@dimagi.com>.
I should have also linked to the actual parser, which is here:
https://github.com/dimagi/couchforms/blob/master/couchforms/_design/util/jsone4xml.js



Re: CouchDB - Open Document Format mega storage

Posted by Cory Zue <cz...@dimagi.com>.
2011/11/9 Маркевич Александр <ra...@gmail.com>

> So there are several possible solutions:
>
> 1. Store the doc as an attachment, as is. Can I have full-text search on
> the content of the docs?
> 2. Unzip the odt, extract all the XML files it consists of, convert them to
> JSON and save all items of the docs in the DB, and also set up an update
> handler that is triggered on updates and saves them to any VCS.
>

If you go this route, you may be interested in couchforms[1], a library my
company developed to store XForms in couchdb. We put a fair amount of
processing/error handling code around it, but at its core it's just an
update handler that parses XML to JSON and sticks it in couch[2]. We've
been relying on it in multiple production systems for over a year now and
it's proven quite reliable.

Cory

[1] https://github.com/dimagi/couchforms
[2]
https://github.com/dimagi/couchforms/blob/master/couchforms/_design/updates/xform.js


> 3. Use Sedna as an XML DB (I want to use a DB built on Erlang, although
> this variant is good too).
>
> Hmm... =)
> I'll have to try all the solutions.
>

Re: CouchDB - Open Document Format mega storage

Posted by Маркевич Александр <ra...@gmail.com>.
So there are several possible solutions:

1. Store the doc as an attachment, as is. Can I have full-text search on
the content of the docs?
2. Unzip the odt, extract all the XML files it consists of, convert them to
JSON and save all items of the docs in the DB, and also set up an update
handler that is triggered on updates and saves them to any VCS.
3. Use Sedna as an XML DB (I want to use a DB built on Erlang, although
this variant is good too).

Hmm... =)
I'll have to try all the solutions.
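
For option 2, here is a minimal sketch of the unzip-and-convert step in
Python. An ODT file is a zip archive whose XML members include content.xml.
The JSON layout used here (tag/attrs/text/tail/children) and the function
names are illustrative assumptions, not an established XML-to-JSON mapping.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert one XML element into a JSON-friendly dict (hypothetical layout)."""
    node = {"tag": elem.tag, "attrs": dict(elem.attrib)}
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    # Text that follows the element inside its parent (mixed content).
    tail = (elem.tail or "").strip()
    if tail:
        node["tail"] = tail
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node


def odt_to_json_docs(odt_bytes):
    """Return {member_name: dict} for every XML member of the ODT zip archive."""
    docs = {}
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                root = ET.fromstring(archive.read(name))
                docs[name] = element_to_dict(root)
    return docs
```

Each resulting dict can be serialized with json.dumps and stored as the body
of a document. Namespaced tags come out in ElementTree's "{uri}name" form.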

Re: CouchDB - Open Document Format mega storage

Posted by Robert Newson <rn...@apache.org>.
Assuming you mean db_update_notification (and not _update handlers,
which are only called by clients) then, yes, they are triggered
whenever a db is updated. This sequence of events can lead you to miss an
update, though:

1) Db updated
2) update notification fires
3) compaction occurs
4) processing update notification fetches doc

At step 3, all old revisions are removed. Since the triggering of the
update notification and the subsequent exporting of the item are not
instantaneous, a window of data loss opens.

Storing old versions as attachments is a solid solution.
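
The four-step sequence above can be played out in a toy model. This is only
a sketch: ToyDb is a hypothetical in-memory stand-in, not CouchDB's API, and
the notification payload (doc id plus revision number) is an assumption made
to keep the example short. A second update is assumed to arrive before the
exporter gets around to processing the first notification.

```python
class ToyDb:
    """In-memory stand-in that keeps old revisions only until compaction."""

    def __init__(self):
        self.revisions = {}  # doc_id -> list of (rev_number, body)
        self.pending = []    # queued notifications as (doc_id, rev_number)

    def update(self, doc_id, body):
        """Steps 1 and 2: store a new revision and queue a notification."""
        revs = self.revisions.setdefault(doc_id, [])
        next_rev = revs[-1][0] + 1 if revs else 1
        revs.append((next_rev, body))
        self.pending.append((doc_id, next_rev))

    def compact(self):
        """Step 3: drop every revision except the latest one."""
        for doc_id in list(self.revisions):
            self.revisions[doc_id] = self.revisions[doc_id][-1:]

    def fetch(self, doc_id, rev):
        """Return the body of a given revision, or None if it is gone."""
        for number, body in self.revisions.get(doc_id, []):
            if number == rev:
                return body
        return None


def export_pending(db):
    """Step 4: process queued notifications and return what could be exported."""
    exported = []
    while db.pending:
        doc_id, rev = db.pending.pop(0)
        body = db.fetch(doc_id, rev)
        if body is not None:
            exported.append((doc_id, rev, body))
    return exported


db = ToyDb()
db.update("report", {"v": 1})
db.update("report", {"v": 2})
db.compact()                   # runs before the exporter does
exported = export_pending(db)  # revision 1 can no longer be fetched
```

In this run the exporter only ever sees revision 2, so the VCS copy would
silently skip a version.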

B.

Re: CouchDB - Open Document Format mega storage

Posted by Alexander Shorin <kx...@gmail.com>.
On Wed, Nov 9, 2011 at 3:01 PM, Robert Newson <rn...@apache.org> wrote:
> The suggestion to use an update notification handler to trigger
> copying into a VCS is dangerous, as it is not guaranteed to get all
> your changes (consider compaction running while you're doing this, you
> could easily miss a revision).

So update handlers aren't triggered on each document update? Or does
compaction prevent update notifications?
Could you explain in detail why it could be dangerous?

The idea of storing previous revisions within the same document looks good
as long as changes are few and the document is small. However, storing
only JSON patches might dramatically reduce the total document size.
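
A rough sketch of the patch idea, in Python. The operations are loosely
modeled on JSON Patch (add/replace/remove with a path), but this version
only compares top-level keys, and the function names are made up for the
example. It is not a full JSON Patch implementation.

```python
def _pointer(key):
    """Escape a key the way JSON Pointer does (~ -> ~0, / -> ~1)."""
    return "/" + key.replace("~", "~0").replace("/", "~1")


def _key(path):
    """Inverse of _pointer for a single top-level path segment."""
    return path[1:].replace("~1", "/").replace("~0", "~")


def shallow_diff(old, new):
    """Return the operations that turn dict `old` into dict `new`."""
    ops = []
    for key in sorted(old.keys() - new.keys()):
        ops.append({"op": "remove", "path": _pointer(key)})
    for key in sorted(new.keys()):
        if key not in old:
            ops.append({"op": "add", "path": _pointer(key), "value": new[key]})
        elif old[key] != new[key]:
            ops.append({"op": "replace", "path": _pointer(key), "value": new[key]})
    return ops


def apply_shallow(doc, ops):
    """Apply operations produced by shallow_diff to a copy of `doc`."""
    result = dict(doc)
    for op in ops:
        key = _key(op["path"])
        if op["op"] == "remove":
            del result[key]
        else:
            result[key] = op["value"]
    return result
```

To rebuild older versions from the current document, one would store the
reverse patches, i.e. shallow_diff(new, old), alongside the current fields.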

--
,,,^..^,,,

Re: CouchDB - Open Document Format mega storage

Posted by Robert Newson <rn...@apache.org>.
The suggestion to use an update notification handler to trigger
copying into a VCS is dangerous, as it is not guaranteed to get all
your changes (consider compaction running while you're doing this, you
could easily miss a revision).

It's better to store all your versions in couchdb itself *in the
current revision*. Either as JSON:

{
  "_id": "mydoc",
  "_rev": "3-foo",
  "old_versions": {
    "1": {},
    "2": {}
  }
}

or by storing old revisions as attachments. The code doing the update
would post a new revision which contains all your versions, and thus
you cannot lose it, no matter when compaction runs.
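
A minimal sketch of what "the code doing the update" might look like, in
Python. The helper name next_revision and the numbering of old_versions
keys are assumptions for illustration; fields starting with "_" are left
out of the snapshots, so attachments are not handled here.

```python
import copy


def next_revision(current, new_fields):
    """Build the body for the next update of `current`.

    The current user fields are archived under "old_versions" and replaced
    by `new_fields`. The body keeps the current "_rev", which the server
    needs in order to accept the update.
    """
    history = copy.deepcopy(current.get("old_versions", {}))
    snapshot = {
        key: copy.deepcopy(value)
        for key, value in current.items()
        if not key.startswith("_") and key != "old_versions"
    }
    history[str(len(history) + 1)] = snapshot

    body = {"_id": current["_id"], "_rev": current["_rev"]}
    body.update(copy.deepcopy(new_fields))
    body["old_versions"] = history
    return body
```

Because every update carries the full history, the latest revision always
contains all earlier versions, whenever compaction happens to run.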

B.

Re: CouchDB - Open Document Format mega storage

Posted by Martin Hewitt <ma...@thenoi.se>.
You might want to look at couchdb-lucene. It indexes attachments using Apache Tika, so if Tika can index .odt files, you'll be able to search them through CouchDB.

Martin

Sent from my iPhone

Re: CouchDB - Open Document Format mega storage

Posted by Маркевич Александр <ra...@gmail.com>.
>
> You can set up an update notification handler in the CouchDB config and let
> it store any document updates in a VCS (I've done the same task with
> Mercurial). That way you'll reach two goals: use the full power of CouchDB
> to search data within documents or transform it into various forms, and
> track all changes at the same time using a dedicated tool that does it right.
>

This is a better solution than I thought at first!
Because this scheme is planned as part of a corporate reporting system,
we need to store several hundred or thousand reports in *.odt files. We need
to save all changes and have the additional functionality that CouchDB can
provide, including fast text search, storing other properties of a report,
and many others.

-- 
Markevych Alexander.
Best regards, Alexander.

Re: CouchDB - Open Document Format mega storage

Posted by Alexander Shorin <kx...@gmail.com>.
2011/11/9 Маркевич Александр <ra...@gmail.com>:
> Thanks to all, I'll find another way. Maybe a traditional version control
> system is better for this task (git, mercurial, SVN).
>
> Best regards team!
> --
> Markevych Alexander.
>

You can set up an update notification handler in the CouchDB config and let
it store any document updates in a VCS (I've done the same task with
Mercurial). That way you'll reach two goals: use the full power of CouchDB
to search data within documents or transform it into various forms, and
track all changes at the same time using a dedicated tool that does it right.

--
,,,^..^,,,

Re: CouchDB - Open Document Format mega storage

Posted by Маркевич Александр <ra...@gmail.com>.
Thanks to all, I'll find another way. Maybe a traditional version control
system is better for this task (git, mercurial, SVN).

Best regards team!
--
Markevych Alexander.

Re: CouchDB - Open Document Format mega storage

Posted by sftf <sf...@mail.ru>.
МА> Because after conversion we would have many document fields and properties,
МА> which can be saved in CouchDB with a history of revisions. And if I want to
МА> retrieve a doc from the DB, I can access any previous revision of it.

Please read this post:
http://mail-archives.apache.org/mod_mbox/couchdb-user/200911.mbox/%3Ce8d26ac40911190823ybf3f88fu63c7b4137089540@mail.gmail.com%3E

In other words, you can't use CouchDB's revisions for any purpose, except optimistic locking.


Re: CouchDB - Open Document Format mega storage

Posted by Маркевич Александр <ra...@gmail.com>.
Because after conversion we would have many document fields and properties,
which can be saved in CouchDB with a history of revisions. And if I want to
retrieve a doc from the DB, I can access any previous revision of it.

-- 
Best regards, Alexander.

Re: CouchDB - Open Document Format mega storage

Posted by Alexander Shorin <kx...@gmail.com>.
2011/11/8 Маркевич Александр <ra...@gmail.com>:
> Hi CouchDB Team!
>
> Can I store *.odt/*.ods files in CouchDB? They have an XML-based
> format and a stable specification (maybe in the future ;).
>
> If I can't store ODT at this stage of CouchDB's evolution, should I write
> or adapt an XML->JSON->CouchDB encoder?
>
> What do you think about it?
>
> P.S. Sorry for my English.
> --
> Best regards Alexander Markevych.
>

Hi Alexander,

Why not just store them (*.odt files) as document attachments without
any format conversions?
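
As a rough illustration, an attachment upload is a single HTTP PUT to
/db/docid/filename. The sketch below only builds the request with Python's
standard library and does not send it. The helper name, the server address
and the database/document names are placeholders, and it assumes ordinary
document ids (not design documents).

```python
import urllib.parse
import urllib.request

# Registered media type for OpenDocument text files.
ODT_MIME = "application/vnd.oasis.opendocument.text"


def build_attachment_request(base_url, db, doc_id, filename, data, rev=None):
    """Build (but do not send) a PUT request that uploads `data` as an attachment.

    `rev` is the current revision of an existing document; it can be left
    out when the document does not exist yet.
    """
    path = "/".join(
        urllib.parse.quote(part, safe="") for part in (db, doc_id, filename)
    )
    url = base_url.rstrip("/") + "/" + path
    if rev is not None:
        url += "?" + urllib.parse.urlencode({"rev": rev})
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"Content-Type": ODT_MIME}
    )
```

Passing the returned object to urllib.request.urlopen would perform the
upload against a running server.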

--
,,,^..^,,,