Posted to dev@couchdb.apache.org by Robert Kowalski <ro...@kowalski.gd> on 2016/07/07 21:44:17 UTC

[DISCUSSION] Limiting the allowed size for documents

Hello list,

Couch 1.x and Couch 2.x will choke as soon as the indexer tries to
process a document that is too large. Indexing stops and you have to
remove the doc manually. In the best case you have built an automated
process around this, so the document gets removed by a script instead
of by a human.

In any case you can't, or at least shouldn't, submit a similarly sized
document again, because you will just hit the limit once more. The user
can't do what they want (put a big JSON document into Couch), and on top
of that a lot of work is created to fix the issue.
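
To make that automated cleanup concrete: what I mean is a job roughly
like the sketch below. This is only an illustration; the threshold,
database name and URL are placeholders, and re-serialising the JSON is
just an approximation of the stored document size.

    #!/usr/bin/env python
    # Sketch of an automated cleanup job that deletes oversized documents
    # so the indexer can make progress again. Purely illustrative.
    import json
    import requests

    COUCH = "http://localhost:5984"   # placeholder URL
    DB = "mydb"                       # placeholder database name
    MAX_DOC_BYTES = 8 * 1024 * 1024   # example threshold, 8 MiB

    # Fetch all docs with their bodies. On a database that really contains
    # huge documents this is itself expensive; a real job would paginate.
    rows = requests.get(
        "%s/%s/_all_docs" % (COUCH, DB),
        params={"include_docs": "true"},
    ).json()["rows"]

    for row in rows:
        doc = row["doc"]
        size = len(json.dumps(doc))   # rough approximation of the doc size
        if size > MAX_DOC_BYTES:
            # Delete the offending document instead of a human doing it.
            requests.delete(
                "%s/%s/%s" % (COUCH, DB, doc["_id"]),
                params={"rev": doc["_rev"]},
            )
            print("deleted %s (%d bytes)" % (doc["_id"], size))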

I was wondering if we could tackle the root cause instead. That would
make CouchDB easier to maintain in production systems.

I don't want to push this into 2.0; I want to spark a discussion about
how we can get rid of small day-to-day operational issues. The goal is
to make CouchDB easier to run and to provide a more pleasant experience
for everyone.

Some limits from other databases:

Postgres: 1 GB -
https://www.postgresql.org/docs/current/static/datatype-character.html
DynamoDB: 400 KB -
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#limits-data-types
MongoDB (BSON document): 16 MB -
https://docs.mongodb.com/manual/reference/limits/
Couchbase: 20 MB -
http://developer.couchbase.com/documentation/server/current/clustersetup/server-setup.html

Re: [DISCUSSION] Limiting the allowed size for documents

Posted by Joan Touzet <wo...@apache.org>.
----- Original Message -----
> Couch 1.x and Couch 2.x will choke as soon as the indexer tries to
> process a document that is too large. Indexing stops and you have to
> remove the doc manually. In the best case you have built an automated
> process around this, so the document gets removed by a script instead
> of by a human.

I've known about this problem for years, but never did anything.
Thank you for taking the initiative here, Robert.

I think flat out rejecting documents that are too big, with that value set
in the ini file, is the right move if we can't fix the underlying couchjs
issue. 
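
Concretely, I'm picturing a knob along these lines (the section, option
name and default here are only illustrative, not a finished proposal):

    [couchdb]
    ; reject writes whose JSON body is larger than this many bytes
    max_document_size = 16777216 ; 16 MiB

Writes over the limit could then fail fast with something like a 413
instead of poisoning the view index later.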

As for what is "too big," do you have empirical data from Cloudant as to a
recommendation? I've seen documents at Cloudant as small as 24MB cause issues
with couchjs views, personally.

-Joan

Re: [DISCUSSION] Limiting the allowed size for documents

Posted by Garren Smith <ga...@apache.org>.
Great discussion, Robert. I agree that setting a hard limit is a good idea.

In terms of fixing the indexer to support larger documents, I would rather
see us set a maximum size limit and make sure the indexer can always handle
that size. Then we can make incremental improvements to the indexers to
support larger sizes with each release.

I would think an approach like that would be more helpful to the user.

Re: [DISCUSSION] Limiting the allowed size for documents

Posted by Alexander Shorin <kx...@gmail.com>.
On Fri, Jul 8, 2016 at 12:44 AM, Robert Kowalski <ro...@kowalski.gd> wrote:
> Couch 1.x and Couch 2.x will choke as soon as the indexer tries to
> process a document that is too large. Indexing stops and you have to
> remove the doc manually. In the best case you have built an automated
> process around this, so the document gets removed by a script instead
> of by a human.

An automatic process that removes stored data in production? You must be kidding (:

Limiting the document size here sounds like the wrong way to solve the
indexer issue of not being able to handle such documents. Two solutions
come to mind:

- The indexer ignores big documents, making enough noise that the user
notices the problem (roughly like the sketch below);
- The indexer is fixed to handle big documents.
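
For reference, the first option would amount to behaviour roughly like
this; it is only an illustration of the idea, not actual CouchDB indexer
code, and the threshold and logging are made up:

    # Rough illustration of "skip oversized documents and warn loudly",
    # so the view index keeps building. Not real CouchDB code.
    import json
    import logging

    MAX_INDEXABLE_BYTES = 8 * 1024 * 1024   # example threshold only

    def map_documents(docs, map_fun):
        """Run a map function, skipping documents the indexer cannot handle."""
        for doc in docs:
            if len(json.dumps(doc)) > MAX_INDEXABLE_BYTES:
                logging.warning("skipping %s: too large to index", doc["_id"])
                continue   # the doc is simply missing from the view index
            for key, value in map_fun(doc):
                yield doc["_id"], key, value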

From the user's side the second option is the only right one: it's my
data, I put it into the database, I trust the database to be able to
process it, and it shouldn't fail me.

What should users do when they hit the limit and cannot store the
document, because the indexer is buggy, but they need this data to be
processed? They get very annoyed, because they need that data as it is,
and any attempt to split it into multiple documents may be impossible
(since we don't have cross-document links or transactions). What's
their next step? Changing databases, for sure.

I think that the indexer argument is quite weak and strange. A stronger
one is about cutting off the possibility of uploading bloated data
when, by design, there are sane boundaries for the stored data. If
all your documents average 1 MiB and your database receives data from
the outside world, you would like to explicitly drop anomalies of dozens
or hundreds of MiB, because that's not the data you're working with.

See also: https://github.com/apache/couchdb-chttpd/pull/114 - Tony Sun
made some attempts to add such a limit to CouchDB.

There are a couple of problems with implementing such a limit in a
predictable and lightweight way, because we have our awesome _update
functions, which can turn a small request into a much bigger document,
so checking the incoming request size alone isn't enough (; But I
believe that all of them can be overcome.

--
,,,^..^,,,