Posted to user@couchdb.apache.org by Nicolas Peeters <ni...@gmail.com> on 2011/02/23 18:14:30 UTC

Large document design question

Hi CouchDB community,

I have a design "best practices" question. We are using CouchDB to
store crawled web content. The document is fairly self-explanatory: the id
is the URL of the site, and there is a "pages" array that contains all the
text from its web pages.
This document can quickly grow to a large size (> 20 MB).
It seems that we run into issues (
https://issues.apache.org/jira/browse/COUCHDB-893) when creating a view with
objects that are larger than

{
   "_id": "http://www.website.com/",
   "_rev": "1-33c75795126ff81b0125156b88593cc0",

   "pages": [
       {
           "description": "",
           "text": "A lot of text comes here....:",
           "url": "http://www.website.com/",
           "title": "The title of this website /",
           "keywords": ""
       },
       {
           "description": "",
           "text": "A lot of text comes here....:",
           "url": "http://www.website.com/contact/",
           "title": "Contact Page",
           "keywords": ""
       }

       // MANY other pages here
   ],
   "crawlDate": "2011-02-10T12:30:07.416+01:00"
}

This model is not working very well for us. We are thinking about the
following alternatives, and we would really appreciate it if you could give
us some expert modelling advice.
- Alternative 1)
Create a "page" document
- Alternative 2)

- Alternative 3)

Subquestion: is there a particular design reason why this issue is occurring?
Is there a good workaround (apart from recompilation!)? Any ETA for when this
will be fixed in a release version?
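
As a rough illustration of Alternative 1, the large site document could be
split into one small document per page. The sketch below is only that, a
sketch under assumptions: the "type" and "site" fields and the function name
are hypothetical, and it assumes each page URL is unique and that the page
documents replace the large site document (otherwise the home page id would
clash with the site id).

```python
from typing import Any, Dict, List


def split_site_document(site_doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one large site document into one small document per page."""
    page_docs = []
    for page in site_doc.get("pages", []):
        page_doc = dict(page)                # copy the page fields unchanged
        page_doc["_id"] = page["url"]        # assumption: page URLs are unique
        page_doc["type"] = "page"            # hypothetical marker field for views
        page_doc["site"] = site_doc["_id"]   # hypothetical back-reference to the site
        page_doc["crawlDate"] = site_doc.get("crawlDate")
        page_docs.append(page_doc)
    return page_docs


if __name__ == "__main__":
    site = {
        "_id": "http://www.website.com/",
        "crawlDate": "2011-02-10T12:30:07.416+01:00",
        "pages": [
            {"url": "http://www.website.com/", "title": "Home", "text": "..."},
            {"url": "http://www.website.com/contact/", "title": "Contact Page", "text": "..."},
        ],
    }
    for doc in split_site_document(site):
        print(doc["_id"], doc["site"])
```

The resulting list could then be sent in one request as the "docs" member of
a _bulk_docs body, and a view keyed on the "site" field would bring the pages
of one site back together.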

Thank you,

Nicolas