Posted to user@couchdb.apache.org by Nicolas Peeters <ni...@gmail.com> on 2011/02/23 18:14:30 UTC
Large document design question
Hi CouchDB community,
I have a design "best practices" question. We are using CouchDB to store
crawled web content. The document is fairly self-explanatory: the id is the
URL and there's a "pages" array that contains all the text from the web
pages.
Potentially, this document can grow very quickly to a large size (> 20 MB).
It seems that we run into issues (
https://issues.apache.org/jira/browse/COUCHDB-893) when creating a view with
objects that are larger than
{
  "_id": "http://www.website.com/",
  "_rev": "1-33c75795126ff81b0125156b88593cc0",
  "pages": [
    {
      "description": "",
      "text": "A lot of text comes here....:",
      "url": "http://www.website.com/",
      "title": "The title of this website /",
      "keywords": ""
    },
    {
      "description": "",
      "text": "A lot of text comes here....:",
      "url": "http://www.website.com/contact/",
      "title": "Contact Page",
      "keywords": ""
    }
    // MANY other pages here
  ],
  "crawlDate": "2011-02-10T12:30:07.416+01:00"
}
This model is not working very well for us. We are thinking about the
following alternatives. We would really appreciate it if you could give
expert modelling advice.
- Alternative 1)
Create a "page" document
- Alternative 2)
- Alternative 3)
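To make Alternative 1 more concrete, here is a minimal sketch in Python of
how the current document could be split into one small document per page.
Using the page URL as the "_id", and the "type" and "site" fields, are
assumptions for illustration only, not something we have settled on. The
resulting list has the {"docs": [...]} shape that CouchDB's _bulk_docs
endpoint accepts.

```python
import json


def split_into_page_docs(site_doc):
    """Turn one large site document into one small document per page."""
    page_docs = []
    for page in site_doc["pages"]:
        page_doc = dict(page)               # copy the page fields unchanged
        page_doc["_id"] = page["url"]       # assumption: page URL as document id
        page_doc["type"] = "page"           # hypothetical marker field for views
        page_doc["site"] = site_doc["_id"]  # hypothetical back-reference to the site
        page_doc["crawlDate"] = site_doc["crawlDate"]
        page_docs.append(page_doc)
    return page_docs


# Shortened version of the document shown above.
site_doc = {
    "_id": "http://www.website.com/",
    "pages": [
        {
            "description": "",
            "text": "A lot of text comes here....:",
            "url": "http://www.website.com/",
            "title": "The title of this website /",
            "keywords": "",
        },
        {
            "description": "",
            "text": "A lot of text comes here....:",
            "url": "http://www.website.com/contact/",
            "title": "Contact Page",
            "keywords": "",
        },
    ],
    "crawlDate": "2011-02-10T12:30:07.416+01:00",
}

page_docs = split_into_page_docs(site_doc)

# Request body for a single POST to /<db>/_bulk_docs.
bulk_body = json.dumps({"docs": page_docs})
```

With this layout each document stays roughly the size of a single page, and
a view could group pages by the "site" field instead of loading one very
large object.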
Subquestion: is there a particular design reason why this issue is
occurring? Is there a good workaround (apart from recompilation)? Is there
any ETA for when this will be fixed in a release version?
Thank you,
Nicolas