Posted to dev@couchdb.apache.org by "Robert Newson (JIRA)" <ji...@apache.org> on 2009/01/23 15:20:59 UTC

[jira] Created: (COUCHDB-220) Extreme sparseness in couch files

Extreme sparseness in couch files
---------------------------------

                 Key: COUCHDB-220
                 URL: https://issues.apache.org/jira/browse/COUCHDB-220
             Project: CouchDB
          Issue Type: Bug
          Components: Database Core
    Affects Versions: 0.9
         Environment: ubuntu 8.10 64-bit, ext3
            Reporter: Robert Newson



When adding ten thousand documents, each with a small attachment, the discrepancy between reported file size and actual file size becomes huge:

ls -lh shard0.couch
698M 2009-01-23 13:42 shard0.couch

du -sh shard0.couch
57M	shard0.couch

On filesystems that do not support write holes, this will cause an order of magnitude more I/O.

I think it was introduced by the streaming attachment patch, as each attachment is followed by huge swathes of zeroes when viewed with 'hd -v'.

Compacting this database reduced it to 7.8 MB, indicating other sparseness besides attachments.
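
The ls/du gap above can be reproduced outside CouchDB. The following sketch (not CouchDB code; a minimal Python illustration, with the payload and hole size chosen arbitrarily) writes a file with a 64 KiB hole after each small write, then compares the apparent size (st_size, what `ls -l` reports) with the allocated size (st_blocks * 512, what `du` reports):

```python
import os
import tempfile

# Hypothetical layout mimicking the report: a tiny payload followed by a
# 64 KiB hole, repeated 1000 times.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for _ in range(1000):
        f.write(b"small attachment data")  # 21 bytes of real data
        f.seek(64 * 1024, os.SEEK_CUR)     # skip forward, leaving a hole
    f.truncate()                           # extend the size to the final offset

st = os.stat(path)
apparent = st.st_size          # what `ls -l` shows
on_disk = st.st_blocks * 512   # what `du` shows (allocated blocks)
os.remove(path)
print(apparent, on_disk)
```

On a filesystem with sparse-file support (such as the reporter's ext3), on_disk comes out far smaller than apparent; on one without hole support, the zeroes must be written out for real, which is the extra I/O the report warns about.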


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695902#action_12695902 ] 

Paul Joseph Davis commented on COUCHDB-220:
-------------------------------------------

That's because the call to couch_file:expand/3 isn't happening until the call to couch_stream:write/2, which makes the patch slightly less trivial than I thought, but not much harder.



[jira] Updated: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Newson updated COUCHDB-220:
----------------------------------

    Attachment: 220.patch



[jira] Commented: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695890#action_12695890 ] 

Robert Newson commented on COUCHDB-220:
---------------------------------------

It appears that the .couch file is extended by 64k every time a document is added, even when the document is only a few hundred bytes.

Chatting with davisp; transcript below:

(18:27:14) davisp: got that test handy so you can run it after a slight tweak to couchdb?
(18:27:46) rnewson: the sparseness one? yep.
(18:27:53) davisp: rnewson: line 41 in couchdb_stream.erl
(18:28:11) davisp: Try changing that from 16#000010000 to 1
(18:28:39) rnewson: min_alloc, yes?
(18:28:39) davisp: Not sure if that'll break things or not
(18:28:44) rnewson: we'll soon know.
(18:28:49) davisp: But I ran across it when reading
(18:28:55) davisp: rnewson: yep on min alloc
(18:29:12) rnewson: yes, that did it.

...

(18:34:14) rnewson: davisp: I'm glad you did, the difference is dramatic, I'd say this is the cause of the behavior I see.
(18:34:36) davisp: It could be that couch_stream has a bug that's preventing it from using leftover space
(18:34:43) rnewson: davisp: As I said, I actually hit the ext3 max-file-size with this problem.
(18:35:10) davisp: I.e., the 65K is intended to be used by multiple documents, but bookkeeping is saying to constantly create new buffers

...

(19:01:00) davisp: rnewson: It just looks like the buffer state for.... oh dear god
(19:01:11) vmx: davisp: yes i get the idea, and the final output (e.g. in a browsers) seems to be right, but the internal representation seems a bit confusing
(19:01:20) rnewson: davisp: epiphany?
(19:01:49) davisp: rnewson: I wonder if it's only holding buffer state for the duration of a single request. Try adding two attachments with the same data

...
(19:10:23) davisp: It looks like a consequence of the necessary code for streaming files that didn't specify a content-length
(19:10:45) davisp: rnewson: Looks like ensure_buffer needs a flag
(19:13:01) davisp: rnewson: My guess is that you'd want to add a flag in the accumulator on the PreAllocSize fold function that says if you have touched the clause that has an unknown length
(19:13:21) davisp: then pass that flag to ensure_buffer and if the flag is true in ensure_buffer you allocate exactly the specified size.
(19:13:30) davisp: instead of the MinSize bit
(19:13:49) rnewson: makes sense.
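
The min_alloc figure from the transcript lines up with the reported file size. A quick back-of-the-envelope check (assuming, as the transcript suggests, one 64 KiB minimum allocation per attachment write; the exact per-document overhead is a guess):

```python
min_alloc = 0x10000          # Erlang's 16#000010000 = 65536 bytes = 64 KiB
docs = 10_000                # documents in the original report

inflated = docs * min_alloc  # space consumed if every attachment pays min_alloc
print(inflated, inflated / 2**20)   # 655360000 bytes, 625.0 MiB
```

That is ~625 MiB before any b-tree or header overhead, in the same ballpark as the 698M apparent size from `ls -lh`.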



[jira] Commented: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696888#action_12696888 ] 

Paul Joseph Davis commented on COUCHDB-220:
-------------------------------------------

Chris,

First, I'm pretty certain that this bug is only affecting document writes that include an attachment. 

You should check if your erlang loader is getting the proper attachment information all the way down into couch_db:doc_flush_binaries. My first haphazard guess is that it's not. My second random guess is that you could be seeing the same bug from a different code path. Also, there's another slight tweak to the patch to only go to the 65K allocation when there's a binary of unknown size.

Either way, I'm fairly certain that while changing the min_alloc to a single byte shows that there is a bug, it's not the proper fix for it.





[jira] Commented: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666548#action_12666548 ] 

Robert Newson commented on COUCHDB-220:
---------------------------------------

According to Wikipedia (http://en.wikipedia.org/wiki/Sparse_file), HFS+ does not support sparse files.



[jira] Commented: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695940#action_12695940 ] 

Paul Joseph Davis commented on COUCHDB-220:
-------------------------------------------

I think we need a slight change in case someone lies to us:

From:
+    case NextAlloc of
+	0 -> NewSize = lists:max([MinAlloc, size(Bin)]);
+	_ -> NewSize = NextAlloc
+    end,

To:
+    case NextAlloc of
+	0 -> NewSize = lists:max([MinAlloc, size(Bin)]);
+	_ -> NewSize = lists:max([NextAlloc, size(Bin)])
+    end,

Otherwise we could end up writing beyond the allocated space if something gets confused.
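
For illustration, here is the same clamp rendered as a standalone Python function (a sketch of the rule in the Erlang snippet above, not the actual couch_stream code; the names are mine):

```python
MIN_ALLOC = 0x10000  # 64 KiB, mirroring min_alloc in couch_stream

def new_size(next_alloc, bin_size, min_alloc=MIN_ALLOC):
    """Pick the buffer allocation for a write of bin_size bytes.

    When next_alloc is 0 (the caller gave no hint), fall back to the
    minimum allocation; otherwise honour the hint, but never allocate
    less than the data actually being written -- the clamp proposed above.
    """
    if next_alloc == 0:
        return max(min_alloc, bin_size)
    return max(next_alloc, bin_size)

# A "lying" hint smaller than the data no longer undershoots:
print(new_size(100, 5000))   # 5000, not 100
```

Without the second max(), a too-small next_alloc would let the write run past the allocated space, which is exactly the failure mode the comment describes.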



[jira] Updated: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Newson updated COUCHDB-220:
----------------------------------

    Attachment: attachment_sparseness.js
                220.patch


Don't force 64k inflation; add a test to demonstrate the lack of sparseness.



[jira] Commented: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695852#action_12695852 ] 

Robert Newson commented on COUCHDB-220:
---------------------------------------

// Licensed under the Apache License, Version 2.0 (the "License"); you may not
// use this file except in compliance with the License.  You may obtain a copy
// of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
// WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the
// License for the specific language governing permissions and limitations under
// the License.

couchTests.attachment_sparseness = function(debug) {
  var db = new CouchDB("test_suite_db");
  db.deleteDb();
  db.createDb();
  if (debug) debugger;

  for (var i = 0; i < 1000; i++) {
    var binAttDoc = {
      _id: (i).toString(),
      _attachments: {
        "foo.txt": {
          content_type: "text/plain",
          data: "VGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHRleHQ="
        }
      }
    };

    var save_response = db.save(binAttDoc);
    T(save_response.ok);
  }

  var before = db.info().disk_size;

  // Compact it.
  T(db.compact().ok);
  T(db.last_req.status == 202);
  // compaction isn't instantaneous, loop until done
  while (db.info().compact_running) {};

  var after = db.info().disk_size;

  // Compaction should reduce the database slightly, but not by
  // orders of magnitude (unless attachments introduce sparseness).
  T(after > before * 0.1, "database shrunk massively after compaction.");
};



[jira] Commented: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695898#action_12695898 ] 

Robert Newson commented on COUCHDB-220:
---------------------------------------

I wanted to capture the thread, so that others will have a fuller context.

I don't think we've found the source yet. For example, I commented out the ensure_buffer call and I still get the problem, so I don't think adding the flag you suggested will actually help. 



[jira] Commented: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695893#action_12695893 ] 

Paul Joseph Davis commented on COUCHDB-220:
-------------------------------------------

You forgot to mention the part where I said that I have absolutely no idea if MinSize is important for some other part of the code ;)

Also, it looks like if we add a check in couch_db:doc_flush_binaries/2 to see if we're not streaming an attachment of unknown length, and then pass that information to couch_stream:ensure_buffer/2 so that couch_stream can decide whether it wants to allocate exactly the requested amount or some extra, it'd solve the issue. The patch should be relatively trivial, but like I said, I have no idea if there is other important stuff going on there or not.



[jira] Updated: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Chris Anderson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Anderson updated COUCHDB-220:
-----------------------------------

    Attachment: stream.diff

I applied Robert's patch (benchmarking before and after with multiple methods) and saw very little change. I then also applied the IRC suggestion (setting min_alloc to 1 instead of 64 KB) and saw substantial speedups as well as much tighter file sizes.

Summary of my benchmarks:

Bulk_docs posts of 1000 docs (roughly 100 bytes each) did not seem to be affected by this patch at all.

However, when loading docs in using a custom erlang loader, where each doc has a 4kb attachment (100 concurrent writers, with docs committed in batches of 10), I saw big improvements. Here's what I saw after running my loader for 30 seconds:

Before the patch:

db-file-size: 364.8 MB
docs: 5690
bytes/doc = 67229.25 (that's more than 10x wasted space)
docs/sec = 190

After the patch:

db-file-size: 76.15 MB
docs: 13340 (wow, more than twice as many in the same amount of time!)
bytes/doc = 5985.45 (this is a better size for sure)
docs/sec = 445

I've attached the changes I made (stream.diff). I'm hoping Damien can look at it before we apply it to trunk. It seems strange that bypassing min-alloc would make such a big difference, maybe there's a better answer we don't see.
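
As a sanity check, the reported bytes-per-doc ratios can be recomputed from the raw figures (assuming MB means 2**20 bytes here; the original rounding is unknown, so small discrepancies are expected):

```python
# Before the patch: 364.8 MB across 5690 docs
before = 364.8 * 2**20 / 5690
# After the patch: 76.15 MB across 13340 docs
after = 76.15 * 2**20 / 13340
print(round(before), round(after))   # roughly 67227 and 5986
```

Both land within a few bytes of the quoted 67229.25 and 5985.45, consistent with the ~11x space reduction and the more-than-doubled throughput (190 to 445 docs/sec).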




[jira] Commented: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Enda Farrell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696006#action_12696006 ] 

Enda Farrell commented on COUCHDB-220:
--------------------------------------

I have been trying out the operational behaviour of the 0.8x release and noticed something similar to the original posting.

The filesystem type is ext3, but the scenario is different in that there were no attachments involved. When 1.5 million 9k docs are added *in a random fashion*, the .couch file ended up at 110 GB. After compaction, this reduced to a more expected 14 GB.

A similar test will be run again soon using the 0.9x release.

/e

* "In a random fashion" means that the key within the single database is a Perl random number. 4 writers were populating the DB, and some key collisions were expected.



[jira] Closed: (COUCHDB-220) Extreme sparseness in couch files

Posted by "Damien Katz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damien Katz closed COUCHDB-220.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 0.10
