Posted to user@couchdb.apache.org by Travis Downs <tr...@gmail.com> on 2015/06/30 02:19:09 UTC

Recover space imposed by 4K minimum document size?

I have an issue where I'm posting single smallish (~500 bytes)
documents to couchdb, yet the DB size is about 10x larger than
expected (i.e., 10x larger than the aggregate size of the documents).

Documents are not deleted or modified after posting.

It seems like what is happening is that every individual (unbatched)
write always takes 4K, because the append-only algorithm writes
2 x 2K blocks for each modification, as documented here:

http://guide.couchdb.org/draft/btree.html

OK, that's fine. What I don't understand is why the "compact"
operation doesn't recover this space?

I do recover the space if I replicate this DB somewhere else: the full
copy takes about 10x less space. I would expect to be able to get the
same result in place. Is there some option I'm missing?

Note that I cannot use bulk writes since the documents are posted one
by one by different clients.
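
For reference, here is roughly how I'm measuring the overhead (just a sketch; "docs" and the local URL are placeholders for my actual setup, and the numbers are illustrative):

    # Compare the physical file size with the logical size of the live data
    curl -s http://127.0.0.1:5984/docs
    # => {"db_name":"docs", ..., "disk_size":20971520, "data_size":2097152, ...}

disk_size is the size of the .couch file on disk, while data_size (reported by recent 1.x releases) is roughly the size of the live documents, and it's the ~10x gap between the two that I'd like compaction to close.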

Re: Recover space imposed by 4K minimum document size?

Posted by Adam Kocoloski <ko...@apache.org>.
Ah, this one I think I can explain. The compactor in CouchDB 1.x writes documents directly to the new file in batches. If the IDs of those documents are essentially random in nature, the compacted file can end up with a lot of wasted space. By contrast, if the document IDs in the _changes feed are roughly ordered, then the compactor will write a large block of IDs to the same node in the ID btree and then not touch that btree again, resulting in a more compact file. When you increased the buffer size you decreased the number of rewrites that any individual btree node goes through during compaction.

The new compactor in the upcoming 2.0 release eliminates this inefficiency; it generates an optimal file size regardless of ID selection. It does this by maintaining the updated ID tree in a separate .meta file during compaction and then streaming the btree from that .meta file in-order at the end of the compaction.

Travis, I guess it’s possible that you could be bumping into this as well, although 10x sounds extreme.
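
In the meantime, if you're letting CouchDB assign the document IDs (i.e., POSTing without an _id), one thing you could experiment with on 1.x is having the server generate roughly ordered IDs instead of fully random ones, so that consecutive inserts land in the same region of the ID btree. A sketch using the config API (host is a placeholder; you'll need admin rights if you have any set up):

    # Switch server-generated UUIDs to the "sequential" algorithm,
    # which produces monotonically increasing IDs
    curl -X PUT http://127.0.0.1:5984/_config/uuids/algorithm -d '"sequential"'

That won't change documents already in the file, but new writes (and subsequent compactions) should cluster better.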

Adam

> On Jun 30, 2015, at 2:19 AM, Alexander Shorin <kx...@gmail.com> wrote:
> 
> If documents are too small, compaction cannot reclaim all the disk
> space. See this thread with a similar question:
> http://qnalist.com/questions/5836043/couchdb-database-size
> 
> The question of why is still open for me, but at least there is a solution there.
> --
> ,,,^..^,,,


Re: Recover space imposed by 4K minimum document size?

Posted by Alexander Shorin <kx...@gmail.com>.
If documents are too small, compaction cannot reclaim all the disk
space. See this thread with a similar question:
http://qnalist.com/questions/5836043/couchdb-database-size

The question of why is still open for me, but at least there is a solution there.
--
,,,^..^,,,


On Tue, Jun 30, 2015 at 3:49 AM, Adam Kocoloski <ko...@apache.org> wrote:
> Database compaction should absolutely recover that space. Can you share a few more details? Are you sure the compaction completes successfully? Cheers,
>
> Adam

Re: Recover space imposed by 4K minimum document size?

Posted by Travis Downs <tr...@gmail.com>.
Well, that was it. I guess compaction via the _utils UI simply was never
working. After triggering it with curl, the DB size came down about 10x and
is now roughly equal to (actually slightly less than, due to compression,
I suppose) the aggregate document size.

Thanks!

On Tue, Jun 30, 2015 at 1:35 PM, Adam Kocoloski <ko...@apache.org> wrote:

> Perhaps try triggering the compaction directly from the API with curl?
>
> http://docs.couchdb.org/en/1.6.1/api/database/compact.html
>
> Adam

Re: Recover space imposed by 4K minimum document size?

Posted by Adam Kocoloski <ko...@apache.org>.
Perhaps try triggering the compaction directly from the API with curl?

http://docs.couchdb.org/en/1.6.1/api/database/compact.html
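
Something along these lines, substituting your database name for "docs" (note that _compact insists on a JSON Content-Type):

    # Kick off compaction of the database
    curl -X POST -H "Content-Type: application/json" \
         http://127.0.0.1:5984/docs/_compact
    # => {"ok":true}

    # It runs in the background; the db info shows compact_running until it finishes
    curl -s http://127.0.0.1:5984/docs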

Adam

> On Jun 30, 2015, at 3:45 AM, Travis Downs <tr...@gmail.com> wrote:
> 
> I ran compaction via the button in _utils. I did notice that when I
> clicked the button, the spinner in the UI never stops, but I did check
> that compact_running was "false" for the DB in question - so I assumed
> it finished. I suppose some issue with _utils could instead mean it
> never started? Is there some way to distinguish the two cases?


Re: Recover space imposed by 4K minimum document size?

Posted by Travis Downs <tr...@gmail.com>.
I ran compaction via the button in _utils. I did notice that when I
clicked the button, the spinner in the UI never stops, but I did check
that compact_running was "false" for the DB in question - so I assumed
it finished. I suppose some issue with _utils could instead mean it
never started? Is there some way to distinguish the two cases?
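
For the record, this is roughly what I checked ("docs" stands in for my database name):

    # The db info flag I mentioned
    curl -s http://127.0.0.1:5984/docs
    # => {..., "compact_running":false, ...}

    # A compaction that is actually running would also appear here
    curl -s http://127.0.0.1:5984/_active_tasks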

On Mon, Jun 29, 2015 at 5:49 PM, Adam Kocoloski <ko...@apache.org> wrote:
> Database compaction should absolutely recover that space. Can you share a few more details? Are you sure the compaction completes successfully? Cheers,
>
> Adam

Re: Recover space imposed by 4K minimum document size?

Posted by Adam Kocoloski <ko...@apache.org>.
Database compaction should absolutely recover that space. Can you share a few more details? Are you sure the compaction completes successfully? Cheers,

Adam

> On Jun 29, 2015, at 8:19 PM, Travis Downs <tr...@gmail.com> wrote:
> 
> I have an issue where I'm posting single smallish (~500 bytes)
> documents to couchdb, yet the DB size is about 10x larger than
> expected (i.e., 10x larger than the aggregate size of the documents).
> 
> Documents are not deleted or modified after posting.
> 
> It seems like what is happening is that every individual (unbatched)
> write always takes 4K, because the append-only algorithm writes
> 2 x 2K blocks for each modification, as documented here:
> 
> http://guide.couchdb.org/draft/btree.html
> 
> OK, that's fine. What I don't understand is why the "compact"
> operation doesn't recover this space?
> 
> I do recover the space if I replicate this DB somewhere else: the full
> copy takes about 10x less space. I would expect to be able to get the
> same result in place. Is there some option I'm missing?
> 
> Note that I cannot use bulk writes since the documents are posted one
> by one by different clients.
>