Posted to user@couchdb.apache.org by Mike Kimber <mk...@kana.com> on 2012/04/10 10:19:34 UTC

BigCouch - Replication failing with Cannot Allocate memory

I'm not sure if this is the correct place to raise an issue I'm having replicating a standalone CouchDB 1.1.1 to a 3-node BigCouch cluster. If this is not the correct place, please point me in the right direction; if it is, does anyone have any ideas why I keep getting the following error message when I kick off a replication:

eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
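For scale, that failed allocation is a single Erlang process heap of roughly 1.4 GiB:

```shell
# Convert the requested allocation to GiB.
awk 'BEGIN { printf "%.2f GiB\n", 1459620480 / (1024 ^ 3) }'
```

So one process inside beam.smp tried to grow its heap past what the OS would give it.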

My set-up is:

Standalone CouchDB 1.1.1 running on CentOS 5.7

3-node BigCouch cluster running on CentOS 5.8, with the following local.ini overrides, pulling from the standalone CouchDB (78K documents):

[httpd]
bind_address = XXX.XX.X.XX

[cluster]
; number of shards for a new database
q = 9
; number of copies of each shard
n = 1

[couchdb]
database_dir = /other/bigcouch/database
view_index_dir = /other/bigcouch/view

The error is always generated on the third node in the cluster, and the server basically maxes out on memory beforehand. The other nodes seem to be doing very little, but they are getting data, i.e. the shard sizes are growing. I've set the copies per shard to 1 as I'm not currently interested in resilience.
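For reference, a pull replication like the one described is normally kicked off with a POST to _replicate on the cluster. The host names below are placeholders, not the actual setup:

```shell
# Write the replication document; "create_target" makes the target
# database (with the [cluster] q/n settings) if it does not exist yet.
cat > /tmp/repl.json <<'EOF'
{"source": "http://standalone-couch:5984/mydb",
 "target": "http://bigcouch-node:5984/mydb",
 "create_target": true}
EOF
# Then post it to the cluster (or its load balancer):
#   curl -X POST http://bigcouch-node:5984/_replicate \
#        -H 'Content-Type: application/json' -d @/tmp/repl.json
cat /tmp/repl.json
```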

Any help would be greatly appreciated.

Mike


RE: BigCouch - Replication failing with Cannot Allocate memory

Posted by Mike Kimber <mk...@kana.com>.
Ah, thought there had to be some reason; the response on the couch mailing list is always excellent.

Mike 

-----Original Message-----
From: Robert Newson [mailto:rnewson@apache.org] 
Sent: 12 April 2012 17:24
To: user@couchdb.apache.org
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory

Unfortunately your request for help coincided with the two day CouchDB
Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
ways to get bigcouch support, but we happily answer queries here too,
when not at the Model UN of CouchDB. :D

B.

On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
> Looks like this isn't the right place based on the responses so far. Shame, as I hoped this was going to help solve our index/view rebuild times etc.
>
> Mike
>

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by Robert Newson <ro...@gmail.com>.
How much ram did beam.smp have at the time it hit the eheap_alloc failure,
and how much did it try to acquire?

B.

On 13 April 2012 12:27, Mike Kimber <mk...@kana.com> wrote:
> I upped the memory to 6GB on each of the nodes and got exactly the same issue in the same time frame, i.e. the increased RAM did not seem to buy me any additional time.
>
> Mike
>
> -----Original Message-----
> From: Robert Newson [mailto:rnewson@apache.org]
> Sent: 12 April 2012 19:34
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> 2GB total ram does sound tight. I can only compare to high-volume
> production clusters, which have much more ram than this. Given that
> beam.smp wanted 1.4 GB and you have 2 GB total, do you know where the
> rest went? To couchjs processes, by chance? If so, you can reduce the
> maximum size of that pool in config; I think the default is 50.
>
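If couchjs is where the rest of the memory went, the pool cap can be lowered with a local.ini override in the same style as the overrides earlier in the thread. The section and key below follow CouchDB's query-server settings; treat this as a sketch to verify against your BigCouch version:

[query_server_config]
; maximum number of couchjs OS processes
os_process_limit = 10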
> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
>> Ok, I have 3 nodes all load balanced with HAproxy:
>>
>> Centos 5.8 (Virtualised)
>> 2 Cores
>> 2GB RAM
>>
>> I'm trying to replicate about 75K documents which total 6GB when compacted (on CouchDB 1.2, which has compression turned on). I'm told they are fairly large documents.
>>
>> When it goes pear-shaped, vmstat shows a lot of memory being used:
>>
>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1  6  2 91  0
>>  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1  5  9 85  0
>>  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1  7  1 91  0
>>  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1 10  4 85  0
>>  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13  7 33 47  0
>>  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17  8 49 26  0
>>  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25  9 61  4  0
>>  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8  4 49 40  0
>>  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4  2 50 44  0
>>  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9  2 50 40  0
>>  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22 20 36 23  0
>>  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3 22  0 75  0
>>  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
>>  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3 10 29 58  0
>>  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2  9 32 57  0
>>  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2  7 30 61  0
>>  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2  7  6 84  0
>>  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1  6 11 83  0
>>  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1  8 16 75  0
>>
>> It only ever takes out one node at a time and the other nodes seem to be doing very little while the one node is running out of memory.
>>
>> If I kick it off again, it processes some more, then spikes the memory and fails.
>>
>> Thanks
>>
>> Mike
>>
>> PS: hope you enjoyed your CouchDB get-together!
>>
>> -----Original Message-----
>> From: Robert Newson [mailto:rnewson@apache.org]
>> Sent: 12 April 2012 17:28
>> To: user@couchdb.apache.org
>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>
>> What kind of load were you putting the machine on?
>>
>> On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
>>> Could you show your vm.args file?
>>>
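For anyone following along, a BigCouch vm.args is typically only a few lines. The values below are illustrative placeholders (the node name and cookie must match the rest of the cluster), not the poster's actual file:

-name bigcouch@node1.example.com
-setcookie monster
+K true
+A 4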

RE: BigCouch - Replication failing with Cannot Allocate memory

Posted by Mike Kimber <mk...@kana.com>.
I figured that, but when you asked for the versions, it made me question myself!

So any ideas where I go from here?

Mike

-----Original Message-----
From: Robert Newson [mailto:rnewson@apache.org]
Sent: 16 April 2012 15:35
To: user@couchdb.apache.org
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory

Bigcouch is built as an erlang release, so it includes all the bits of
erlang needed to run. As part of the packaging, I also packaged
spidermonkey, which should have been pulled in automatically.

B.

On 16 April 2012 15:32, Mike Kimber <mk...@kana.com> wrote:
> I used the instructions on http://bigcouch.cloudant.com/use for RHEL/CentOS, so I used yum to install, which installed bigcouch-0.4.0-1.
>
> I did not install Erlang and spidermonkey, as the above seemed to do it for me (I hope, or I'm going to look very stupid and it would be a miracle it's running at all!)
>
> Mike
>
> -----Original Message-----
> From: Robert Newson [mailto:rnewson@apache.org]
> Sent: 14 April 2012 14:35
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> Mike,
>
> Thanks for the logs, they do look clean, as you said.
>
> It was remiss of me not to ask for version numbers. Can you tell me
> which bigcouch version, erlang version, spidermonkey version you have
> here?
>
> B.
>
> On 13 April 2012 21:18, Mike Kimber <mk...@kana.com> wrote:
>> A clean log file (i.e. stop bigcouch, delete log file, restart bigcouch, run replication, wait for failure, stop bigcouch) from the node that failed this time around can be found at:
>>
>> http://pastebin.com/embed_js.php?i=s52rYwwy
>>
>> Mike
>>
>> -----Original Message-----
>> From: Robert Newson [mailto:rnewson@apache.org]
>> Sent: 13 April 2012 19:28
>> To: user@couchdb.apache.org
>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>
>> Mike,
>>
>> Do you have couch.logs from around that time?
>>
>> B.
>>
>> On 13 April 2012 17:54, Mike Kimber <mk...@kana.com> wrote:
>>> Sorry, forgot to say that I have already upped it to N=3 and still get the same issue.
>>>
>>> I ran it again with the 6GB of RAM on each of the servers and ran vmstat and got the following:
>>>
>>> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>>  3  0      0 2067468  31816 302204    0    0     0     5 1820  360 63 32  5  0  0
>>>  2  0      0 2457728  31816 302212    0    0     0     2 2188  322 70 25  4  0  0
>>>  2  0      0 1936092  31816 302212    0    0     0     0 3020  200 73 24  3  0  0
>>>  2  0      0 687428  31816 302212    0    0     0     1 1958  368 56 42  2  0  0
>>>  2  0      0 2128192  31824 302212    0    0     0     2 2779  243 64 29  7  0  0
>>>  1  0      0 1829848  31824 302216    0    0     0     0 1734  280 68 29  3  0  0
>>>  1  0      0 1200300  31832 302216    0    0     0     8 1841  231 43 13 44  0  0
>>>  2  0      0 1638752  31840 302208    0    0     0     5 2625  350 71 20  8  0  0
>>>  3  0      0 1670856  31848 302216    0    0     0     3 2150  492 40 21 39  0  0
>>>  2  0      0 1020848  31848 302216    0    0     0     0 2307  644 67 22 11  0  0
>>>  1  0      0 271640  31848 302216    0    0     0     6 1995  280 54 42  4  0  0
>>>  1  0      0 455408  31848 302216    0    0     0     1 1879  238 64 33  3  0  0
>>>  2  0      0 1240616  25584 193044    0    0     0     2 2408  232 59 34  8  0  0
>>>  2  0      0 611280  25592 193036    0    0     0     3 2286  246 72 25  2  0  0
>>>  2  0      0 679548  25592 193044    0    0     0     2 3038  175 78 21  2  0  0
>>>  2  0      0 786360  25600 193044    0    0     0     3 1679  269 74 23  3  0  0
>>>  2  0      0 568632  25600 193044    0    0     0     0 2796  274 74 24  2  0  0
>>> eheap_alloc: Cannot allocate 1824525600 bytes of memory (of type "heap").
>>>  0  0      0 5749480  25600 193044    0    0     0     0 1389  160 33 15 52  0  0
>>>  0  0      0 5749956  25608 193044    0    0     0    10 1007   82  0  0 100  0  0
>>>  0  0      0 5749988  25616 193036    0    0     0     3 1016   85  0  0 100  0  0
>>>  0  0      0 5750020  25616 193044    0    0     0     0  998   79  0  0 100  0  0
>>>  0  0      0 5750168  25620 193040    0    0     0     1 1007   87  0  0 100  0  0
>>>  0  0      0 5750308  25620 193044    0    0     0     0 1008   82  0  0 100  0  0
>>>
>>> I really need to work out what each process is doing with respect to memory at the time of failure. I had top running, but not on the node that failed this time; sod's law :-)
>>>
>>> Mike
>>>
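One way to get the per-process picture Mike is after here: log the biggest memory consumers on every node until the failure, then compare beam.smp against the couchjs processes. A sketch (interval and sample count are arbitrary):

```shell
# Sample the ten largest processes by resident memory a few times;
# in practice, loop until the eheap_alloc failure and see which grew.
for i in 1 2 3; do
    date
    ps -eo pid,rss,comm --sort=-rss | head -n 10
    sleep 1
done > memlog.txt
wc -l memlog.txt
```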
>>> -----Original Message-----
>>> From: Robert Newson [mailto:rnewson@apache.org]
>>> Sent: 13 April 2012 17:31
>>> To: user@couchdb.apache.org
>>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>
>>> I should note that bigcouch is tested much more often with N=3.
>>> Perhaps there's something about N=1 that exacerbates the issue. For a
>>> test, could you try with N=3?
>>>
>>> B.
>>>
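Worth noting alongside the N=3 suggestion: q and n can also be overridden per database at creation time, instead of cluster-wide in local.ini. A sketch with a placeholder host:

```shell
# BigCouch accepts q and n as query parameters on database creation.
DB_URL='http://bigcouch-node:5984/mydb?q=9&n=3'   # placeholder host
# Create the database with:  curl -X PUT "$DB_URL"
echo "$DB_URL"
```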
>>> On 13 April 2012 16:24, Mike Kimber <mk...@kana.com> wrote:
>>>> "1. Try to replicate the database in another CouchDB."
>>>>
>>>> I have done this to a couchdb 1.2 database successfully. FYI The Source DB is a couchdb 1.1.1.
>>>>
>>>> I haven't done the other tests, but have tested replicating from the couchdb 1.2 database to the bigcouch install and got the same issue.
>>>>
>>>> Mike
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: CGS [mailto:cgsmcmlxxv@gmail.com]
>>>> Sent: 13 April 2012 15:01
>>>> To: user@couchdb.apache.org
>>>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>>
If you say so, Robert, I won't argue with you on that. I meant no offense,
so, please, accept my apologies if I crossed the line. It's all yours from
now on.
>>>>
>>>> Mike, please, ignore my suggestion. Sorry for interfering.
>>>>
>>>> Good luck!
>>>>
>>>> CGS
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Apr 13, 2012 at 3:19 PM, Robert Newson <rn...@apache.org> wrote:
>>>>
>>>>> I think you should point out that "My idea behind these tests is that
>>>>> it may be that your database may be
>>>>> corrupted (or seen as corrupted by BigCouch at the second test) and what
>>>>> you get is just garbage at a certain document. " is based on no
>>>>> evidence. Nor, if it were true, would it necessarily explain the
>>>>> observed behavior either.
>>>>>
>>>>> It would be useful if we could all stick to asserting only things we
>>>>> know to be true or have reasonable grounds to believe are true.
>>>>> Unfounded speculation, though offered sincerely, is not helpful on a
>>>>> mailing list intended to provide assistance.
>>>>>
>>>>> Thanks,
>>>>> B.
>>>>>
>>>>> On 13 April 2012 13:55, CGS <cg...@gmail.com> wrote:
>>>>> > Hi Mike,
>>>>> >
>>>>> > I haven't used BigCouch so far, which is why I haven't said anything until
>>>>> > now. Still, giving some thought to what may be occurring there, I propose a
>>>>> > few tests if you have time:
>>>>> > 1. Try to replicate the database in another CouchDB.
>>>>> > 2. If 1 passes, try to replicate to only one node at a time.
>>>>> > 3. If 2 passes, increase the pool of nodes by 1 and repeat the
>>>>> > replication (it will surely fail with all 3 nodes at once).
>>>>> >
>>>>> > My idea behind these tests is that it may be that your database may be
>>>>> > corrupted (or seen as corrupted by BigCouch at the second test) and what
>>>>> > you get is just garbage at a certain document. That's why I proposed the
>>>>> > first test. The second test is to see if any of the nodes has a problem in
>>>>> > configuration (or if there is any incompatibility in between your CouchDB
>>>>> > and BigCouch in manipulating your docs). Finally, the third test is to see
>>>>> > if server/node resources limit the number of replications (and at how many
>>>>> > it starts to fail).
>>>>> >
>>>>> > Can you also check the size of the shards at tests 2 and 3?
>>>>> >
>>>>> > If you consider that these tests are irrelevant, please, ignore my
>>>>> > suggestion.
>>>>> >
>>>>> > CGS

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by Robert Newson <rn...@apache.org>.
Bigcouch is built as an erlang release, so it includes all the bits of
erlang needed to run. As part of the packaging, I also packaged
spidermonkey, which should have been pulled in automatically.

B.

On 16 April 2012 15:32, Mike Kimber <mk...@kana.com> wrote:
> I used the instructions on http://bigcouch.cloudant.com/use  for RHEL/centos so used yum to install. Which installed bigcouch-0.4.0-1.
>
> I did not install Erlang and spidermonkey as the above seemed to do it for me (I hope or I'm going to look v stupid and it would be a miracle its running at all!)
>
> Mike
>
> -----Original Message-----
> From: Robert Newson [mailto:rnewson@apache.org]
> Sent: 14 April 2012 14:35
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> Mike,
>
> Thanks for the logs, they do look clean, as you said.
>
> It was remiss of me not to ask for version numbers. Can you tell me
> which bigcouch version, erlang version, spidermonkey version you have
> here?
>
> B.
>
> On 13 April 2012 21:18, Mike Kimber <mk...@kana.com> wrote:
>> A clean log file (i.e. stop bigcouch, delete log file, restart bigcouch, run replication, wait for failure, stop bigcouch) from the node that failed this time around can be found at:
>>
>> http://pastebin.com/embed_js.php?i=s52rYwwy
>>
>> Mike
>>
>> -----Original Message-----
>> From: Robert Newson [mailto:rnewson@apache.org]
>> Sent: 13 April 2012 19:28
>> To: user@couchdb.apache.org
>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>
>> Mike,
>>
>> Do you have couch.logs from around that time?
>>
>> B.
>>
>> On 13 April 2012 17:54, Mike Kimber <mk...@kana.com> wrote:
>>> Sorry forgot to say that I have already up'd it to N=3 and still get the same issue.
>>>
>>> I ran it again with the 6GB of RAM on each of the servers and ran vmstat and got the following:
>>>
>>> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>>  3  0      0 2067468  31816 302204    0    0     0     5 1820  360 63 32  5  0  0
>>>  2  0      0 2457728  31816 302212    0    0     0     2 2188  322 70 25  4  0  0
>>>  2  0      0 1936092  31816 302212    0    0     0     0 3020  200 73 24  3  0  0
>>>  2  0      0 687428  31816 302212    0    0     0     1 1958  368 56 42  2  0  0
>>>  2  0      0 2128192  31824 302212    0    0     0     2 2779  243 64 29  7  0  0
>>>  1  0      0 1829848  31824 302216    0    0     0     0 1734  280 68 29  3  0  0
>>>  1  0      0 1200300  31832 302216    0    0     0     8 1841  231 43 13 44  0  0
>>>  2  0      0 1638752  31840 302208    0    0     0     5 2625  350 71 20  8  0  0
>>>  3  0      0 1670856  31848 302216    0    0     0     3 2150  492 40 21 39  0  0
>>>  2  0      0 1020848  31848 302216    0    0     0     0 2307  644 67 22 11  0  0
>>>  1  0      0 271640  31848 302216    0    0     0     6 1995  280 54 42  4  0  0
>>>  1  0      0 455408  31848 302216    0    0     0     1 1879  238 64 33  3  0  0
>>>  2  0      0 1240616  25584 193044    0    0     0     2 2408  232 59 34  8  0  0
>>>  2  0      0 611280  25592 193036    0    0     0     3 2286  246 72 25  2  0  0
>>>  2  0      0 679548  25592 193044    0    0     0     2 3038  175 78 21  2  0  0
>>>  2  0      0 786360  25600 193044    0    0     0     3 1679  269 74 23  3  0  0
>>>  2  0      0 568632  25600 193044    0    0     0     0 2796  274 74 24  2  0  0
>>> eheap_alloc: Cannot allocate 1824525600 bytes of memory (of type "heap").
>>>  0  0      0 5749480  25600 193044    0    0     0     0 1389  160 33 15 52  0  0
>>>  0  0      0 5749956  25608 193044    0    0     0    10 1007   82  0  0 100  0  0
>>>  0  0      0 5749988  25616 193036    0    0     0     3 1016   85  0  0 100  0  0
>>>  0  0      0 5750020  25616 193044    0    0     0     0  998   79  0  0 100  0  0
>>>  0  0      0 5750168  25620 193040    0    0     0     1 1007   87  0  0 100  0  0
>>>  0  0      0 5750308  25620 193044    0    0     0     0 1008   82  0  0 100  0  0
>>>
>>> I really need to work out what each process is doing with respect to memory at the time of failure. I had top running, but not on the node that failed this time, sods law :-)
>>>
>>> Mike
>>>
>>> -----Original Message-----
>>> From: Robert Newson [mailto:rnewson@apache.org]
>>> Sent: 13 April 2012 17:31
>>> To: user@couchdb.apache.org
>>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>
>>> I should note that bigcouch is tested much more often with N=3.
>>> Perhaps there's something about N=1 that exasperates the issue. For a
>>> test, could you try with N=3?
>>>
>>> B.
>>>
>>> On 13 April 2012 16:24, Mike Kimber <mk...@kana.com> wrote:
>>>> "1. Try to replicate the database in another CouchDB."
>>>>
>>>> I have done this to a couchdb 1.2 database successfully. FYI The Source DB is a couchdb 1.1.1.
>>>>
>>>> I haven't done the other tests, but have tested replicating from the couchdb 1.2 database to the bigcouch install and got the same issue.
>>>>
>>>> Mike
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: CGS [mailto:cgsmcmlxxv@gmail.com]
>>>> Sent: 13 April 2012 15:01
>>>> To: user@couchdb.apache.org
>>>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>>
>>>> If you say so, Robert, I won't argue with you on that. I meant no offense,
>>>> so, please, accept my apologies if I crossed the line. It's all your's from
>>>> now on.
>>>>
>>>> Mike, please, ignore my suggestion. Sorry for interfering.
>>>>
>>>> Good luck!
>>>>
>>>> CGS
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Apr 13, 2012 at 3:19 PM, Robert Newson <rn...@apache.org> wrote:
>>>>
>>>>> I think you should point out that "My idea behind these tests is that
>>>>> it may be that your database may be
>>>>> corrupted (or seen as corrupted by BigCouch at the second test) and what
>>>>> you get is just garbage at a certain document. " is based on no
>>>>> evidence. Nor, if it were true, would it necessarily explain the
>>>>> observed behavior either.
>>>>>
>>>>> It would be useful if we could all stick to asserting only things we
>>>>> know to be true or have reasonable grounds to believe are true.
>>>>> Unfounded speculation, though offered sincerely, is not helpful on a
>>>>> mailing list intended to provide assistance.
>>>>>
>>>>> Thanks,
>>>>> B.
>>>>>
>>>>> On 13 April 2012 13:55, CGS <cg...@gmail.com> wrote:
>>>>> > Hi Mike,
>>>>> >
>>>>> > I haven't used BigCouch by now and that's why I haven't said anything by
>>>>> > now. Still, giving a thought of what may occur there, I propose few tests
>>>>> > if you have time:
>>>>> > 1. Try to replicate the database in another CouchDB.
>>>>> > 2. If 1 passes, try to replicate to only one node at a time.
>>>>> > 3. If 2 passes, increase the pool of nodes by 1 and repeat the
>>>>> > replication (for sure it will fail with all 3 nodes at once).
>>>>> >
>>>>> > My idea behind these tests is that it may be that your database may be
>>>>> > corrupted (or seen as corrupted by BigCouch at the second test) and what
>>>>> > you get is just garbage at a certain document. That's why I proposed the
>>>>> > first test. The second test is to see if any of the nodes has a problem
>>>>> > in configuration (or if there is any incompatibility between your CouchDB
>>>>> > and BigCouch in manipulating your docs). Finally, the third test is to
>>>>> > see if server/node resources limit the number of replications (and at how
>>>>> > many it starts to fail).
>>>>> >
>>>>> > Can you also check the size of the shards at tests 2 and 3?
>>>>> >
>>>>> > If you consider that these tests are irrelevant, please, ignore my
>>>>> > suggestion.
>>>>> >
>>>>> > CGS
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <mk...@kana.com> wrote:
>>>>> >
>>>>> >> I upped the memory to 6GB on each of the nodes and got exactly the same
>>>>> >> issue in the same time frame, i.e. the increased RAM did not seem to buy
>>>>> >> me any additional time.
>>>>> >>
>>>>> >> Mike
>>>>> >>
>>>>> >> -----Original Message-----
>>>>> >> From: Robert Newson [mailto:rnewson@apache.org]
>>>>> >> Sent: 12 April 2012 19:34
>>>>> >> To: user@couchdb.apache.org
>>>>> >> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>>> >>
>>>>> >> 2GB total ram does sound tight. I can only compare to high volume
>>>>> >> production clusters which have much more ram than this. Given that
>>>>> >> beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
>>>>> >> rest went? To couchjs processes, by chance? If so, you can reduce the
>>>>> >> maximum size of that pool in config, I think the default is 50.
>>>>> >>
>>>>> >> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
>>>>> >> > Ok, I have 3 nodes all load balanced with HAproxy:
>>>>> >> >
>>>>> >> > Centos 5.8 (Virtualised)
>>>>> >> > 2 Cores
>>>>> >> > 2GB RAM
>>>>> >> >
>>>>> >> > I'm trying to replicate about 75K documents which total 6GB when
>>>>> >> compacted (on CouchDB 1.2, which has compression turned on). I'm told
>>>>> >> they are fairly large documents.
>>>>> >> >
>>>>> >> > When it goes pear-shaped, vmstat shows memory being consumed rapidly:
>>>>> >> >
>>>>> >> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>>>>> >> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>>>> >> >  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1  6  2 91  0
>>>>> >> >  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1  5  9 85  0
>>>>> >> >  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1  7  1 91  0
>>>>> >> >  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1 10  4 85  0
>>>>> >> >  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13  7 33 47  0
>>>>> >> >  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17  8 49 26  0
>>>>> >> >  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25  9 61  4  0
>>>>> >> >  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8  4 49 40  0
>>>>> >> >  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4  2 50 44  0
>>>>> >> >  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9  2 50 40  0
>>>>> >> >  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22 20 36 23  0
>>>>> >> >  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3 22  0 75  0
>>>>> >> >  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
>>>>> >> >  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3 10 29 58  0
>>>>> >> >  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2  9 32 57  0
>>>>> >> >  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2  7 30 61  0
>>>>> >> >  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2  7  6 84  0
>>>>> >> >  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1  6 11 83  0
>>>>> >> >  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1  8 16 75  0
>>>>> >> >
>>>>> >> > It only ever takes out one node at a time and the other nodes seem to
>>>>> >> be doing very little while the one node is running out of memory.
>>>>> >> >
>>>>> >> > If I kick it off again it processes some more, then spikes the
>>>>> >> memory and fails.
>>>>> >> >
>>>>> >> > Thanks
>>>>> >> >
>>>>> >> > Mike
>>>>> >> >
>>>>> >> > PS: hope you enjoyed your CouchDB get-together!
>>>>> >> >
>>>>> >> > -----Original Message-----
>>>>> >> > From: Robert Newson [mailto:rnewson@apache.org]
>>>>> >> > Sent: 12 April 2012 17:28
>>>>> >> > To: user@couchdb.apache.org
>>>>> >> > Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>>> >> >
>>>>> >> > What kind of load were you putting on the machine?
>>>>> >> >
>>>>> >> > On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
>>>>> >> >> Could you show your vm.args file?
>>>>> >> >>
>>>>> >> >> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>>>>> >> >>> Unfortunately your request for help coincided with the two-day
>>>>> >> >>> CouchDB Summit. #cloudant and the Issues tab on cloudant/bigcouch are
>>>>> >> >>> other ways to get bigcouch support, but we happily answer queries here
>>>>> >> >>> too, when not at the Model UN of CouchDB. :D
>>>>> >> >>>
>>>>> >> >>> B.
>>>>> >> >>>
>>>>> >> >>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>>>>> >> >>>> Looks like this isn't the right place based on the responses so
>>>>> >> far. Shame; I hoped this was going to help solve our index/view rebuild
>>>>> >> times etc.
>>>>> >> >>>>
>>>>> >> >>>> Mike
>>>>> >> >>>>
>>>>> >> >>>> -----Original Message-----
>>>>> >> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
>>>>> >> >>>> Sent: 10 April 2012 09:20
>>>>> >> >>>> To: user@couchdb.apache.org
>>>>> >> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>>>> >> >>>>
>>>>> >> >>>> I'm not sure if this is the correct place to raise an issue I am
>>>>> >> having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch
>>>>> >> cluster. If this is not the correct place, please point me in the right
>>>>> >> direction; if it is, then does anyone have any ideas why I keep getting the
>>>>> >> following error message when I kick off a replication:
>>>>> >> >>>>
>>>>> >> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
>>>>> >> >>>>
>>>>> >> >>>> My set-up is:
>>>>> >> >>>>
>>>>> >> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
>>>>> >> >>>>
>>>>> >> >>>> 3 Node BigCouch cluster running on Centos 5.8 with the following
>>>>> >> local.ini overrides pulling from the Standalone couchdb (78K documents)
>>>>> >> >>>>
>>>>> >> >>>> [httpd]
>>>>> >> >>>> bind_address = XXX.XX.X.XX
>>>>> >> >>>>
>>>>> >> >>>> [cluster]
>>>>> >> >>>> ; number of shards for a new database
>>>>> >> >>>> q = 9
>>>>> >> >>>> ; number of copies of each shard
>>>>> >> >>>> n = 1
>>>>> >> >>>>
>>>>> >> >>>> [couchdb]
>>>>> >> >>>> database_dir = /other/bigcouch/database
>>>>> >> >>>> view_index_dir = /other/bigcouch/view
>>>>> >> >>>>
>>>>> >> >>>> The error is always generated on the third node in the cluster, and
>>>>> >> the server basically maxes out on memory beforehand. The other nodes seem
>>>>> >> to be doing very little, but are getting data, i.e. the shard sizes are
>>>>> >> growing. I've put the copies per shard down to 1 as currently I'm not
>>>>> >> interested in resilience.
>>>>> >> >>>>
>>>>> >> >>>> Any help would be greatly appreciated.
>>>>> >> >>>>
>>>>> >> >>>> Mike
>>>>> >> >>>>
>>>>> >>
>>>>>
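For context on the q = 9, n = 1 settings quoted in this thread: with 9 shard ranges and 1 copy of each, a 3-node cluster holds 3 shard files per node. The sketch below illustrates round-robin placement only; it is not BigCouch's actual shard-allocation code.

```python
# Rough sketch of spreading q shard ranges with n copies round-robin
# over a list of nodes. Illustration only -- not BigCouch's real
# placement algorithm.
def place_shards(q, n, nodes):
    placement = {node: [] for node in nodes}
    i = 0
    for shard_range in range(q):
        for _copy in range(n):
            placement[nodes[i % len(nodes)]].append(shard_range)
            i += 1
    return placement

placement = place_shards(q=9, n=1, nodes=["node1", "node2", "node3"])
print({node: len(ranges) for node, ranges in placement.items()})
# With q=9, n=1 each node hosts 3 of the 9 ranges; with n=3 every
# range would exist on all 3 nodes (27 shard files in total).
```

This is why, with n = 1, losing any single node makes a third of the database unreachable, which is part of what the N=3 suggestion later in the thread addresses.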

RE: BigCouch - Replication failing with Cannot Allocate memory

Posted by Mike Kimber <mk...@kana.com>.
I used the instructions on http://bigcouch.cloudant.com/use for RHEL/CentOS, so I used yum to install, which installed bigcouch-0.4.0-1.

I did not install Erlang and SpiderMonkey separately, as the above seemed to do it for me (I hope, or I'm going to look very stupid and it would be a miracle it's running at all!).

Mike
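A quick way to collect the versions asked about here on a yum-based install is a script like the following. The rpm package names ("bigcouch", "js" for SpiderMonkey) are guesses and may differ on a given system; check `rpm -qa` to find the real names.

```python
import shutil
import subprocess

# Query an external command's stdout, falling back to "unknown" when the
# binary is missing. Package names below are assumptions for a yum install.
def query(cmd):
    try:
        out = subprocess.run(cmd, capture_output=True, text=True)
        return out.stdout.strip() or "unknown"
    except FileNotFoundError:
        return "unknown"

print("bigcouch:", query(["rpm", "-q", "bigcouch"]))
print("spidermonkey:", query(["rpm", "-q", "js"]))
if shutil.which("erl"):
    # otp_release reports the Erlang/OTP major release, e.g. R14B04
    print("erlang:", query(["erl", "-noshell", "-eval",
        'io:format("~s", [erlang:system_info(otp_release)]), init:stop().']))
else:
    print("erlang: erl not on PATH")
```

Running this on each of the three nodes would also confirm the cluster is homogeneous.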

-----Original Message-----
From: Robert Newson [mailto:rnewson@apache.org]
Sent: 14 April 2012 14:35
To: user@couchdb.apache.org
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory

Mike,

Thanks for the logs, they do look clean, as you said.

It was remiss of me not to ask for version numbers. Can you tell me
which bigcouch version, erlang version, spidermonkey version you have
here?

B.

On 13 April 2012 21:18, Mike Kimber <mk...@kana.com> wrote:
> A clean log file (i.e. stop bigcouch, delete log file, restart bigcouch, run replication, wait for failure, stop bigcouch) from the node that failed this time around can be found at:
>
> http://pastebin.com/embed_js.php?i=s52rYwwy
>
> Mike
>
> -----Original Message-----
> From: Robert Newson [mailto:rnewson@apache.org]
> Sent: 13 April 2012 19:28
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> Mike,
>
> Do you have couch.logs from around that time?
>
> B.
>
> On 13 April 2012 17:54, Mike Kimber <mk...@kana.com> wrote:
>> Sorry, forgot to say that I have already upped it to N=3 and still get the same issue.
>>
>> I ran it again with 6GB of RAM on each of the servers, ran vmstat, and got the following:
>>
>> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>  3  0      0 2067468  31816 302204    0    0     0     5 1820  360 63 32  5  0  0
>>  2  0      0 2457728  31816 302212    0    0     0     2 2188  322 70 25  4  0  0
>>  2  0      0 1936092  31816 302212    0    0     0     0 3020  200 73 24  3  0  0
>>  2  0      0 687428  31816 302212    0    0     0     1 1958  368 56 42  2  0  0
>>  2  0      0 2128192  31824 302212    0    0     0     2 2779  243 64 29  7  0  0
>>  1  0      0 1829848  31824 302216    0    0     0     0 1734  280 68 29  3  0  0
>>  1  0      0 1200300  31832 302216    0    0     0     8 1841  231 43 13 44  0  0
>>  2  0      0 1638752  31840 302208    0    0     0     5 2625  350 71 20  8  0  0
>>  3  0      0 1670856  31848 302216    0    0     0     3 2150  492 40 21 39  0  0
>>  2  0      0 1020848  31848 302216    0    0     0     0 2307  644 67 22 11  0  0
>>  1  0      0 271640  31848 302216    0    0     0     6 1995  280 54 42  4  0  0
>>  1  0      0 455408  31848 302216    0    0     0     1 1879  238 64 33  3  0  0
>>  2  0      0 1240616  25584 193044    0    0     0     2 2408  232 59 34  8  0  0
>>  2  0      0 611280  25592 193036    0    0     0     3 2286  246 72 25  2  0  0
>>  2  0      0 679548  25592 193044    0    0     0     2 3038  175 78 21  2  0  0
>>  2  0      0 786360  25600 193044    0    0     0     3 1679  269 74 23  3  0  0
>>  2  0      0 568632  25600 193044    0    0     0     0 2796  274 74 24  2  0  0
>> eheap_alloc: Cannot allocate 1824525600 bytes of memory (of type "heap").
>>  0  0      0 5749480  25600 193044    0    0     0     0 1389  160 33 15 52  0  0
>>  0  0      0 5749956  25608 193044    0    0     0    10 1007   82  0  0 100  0  0
>>  0  0      0 5749988  25616 193036    0    0     0     3 1016   85  0  0 100  0  0
>>  0  0      0 5750020  25616 193044    0    0     0     0  998   79  0  0 100  0  0
>>  0  0      0 5750168  25620 193040    0    0     0     1 1007   87  0  0 100  0  0
>>  0  0      0 5750308  25620 193044    0    0     0     0 1008   82  0  0 100  0  0
>>
>> I really need to work out what each process is doing with respect to memory at the time of failure. I had top running, but not on the node that failed this time; sod's law :-)
>>
>> Mike
>>
>> -----Original Message-----
>> From: Robert Newson [mailto:rnewson@apache.org]
>> Sent: 13 April 2012 17:31
>> To: user@couchdb.apache.org
>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>
>> I should note that bigcouch is tested much more often with N=3.
>> Perhaps there's something about N=1 that exacerbates the issue. For a
>> test, could you try with N=3?
>>
>> B.
>>
>> On 13 April 2012 16:24, Mike Kimber <mk...@kana.com> wrote:
>>> "1. Try to replicate the database in another CouchDB."
>>>
>>> I have done this to a couchdb 1.2 database successfully. FYI, the source DB is a couchdb 1.1.1.
>>>
>>> I haven't done the other tests, but have tested replicating from the couchdb 1.2 database to the bigcouch install and got the same issue.
>>>
>>> Mike
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: CGS [mailto:cgsmcmlxxv@gmail.com]
>>> Sent: 13 April 2012 15:01
>>> To: user@couchdb.apache.org
>>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>
>>> If you say so, Robert, I won't argue with you on that. I meant no offense,
>>> so, please, accept my apologies if I crossed the line. It's all yours from
>>> now on.
>>>
>>> Mike, please, ignore my suggestion. Sorry for interfering.
>>>
>>> Good luck!
>>>
>>> CGS
>>>
>>>
>>>
>>>
>>> On Fri, Apr 13, 2012 at 3:19 PM, Robert Newson <rn...@apache.org> wrote:
>>>
>>>> I think you should point out that "My idea behind these tests is that
>>>> it may be that your database may be
>>>> corrupted (or seen as corrupted by BigCouch at the second test) and what
>>>> you get is just garbage at a certain document. " is based on no
>>>> evidence. Nor, if it were true, would it necessarily explain the
>>>> observed behavior either.
>>>>
>>>> It would be useful if we could all stick to asserting only things we
>>>> know to be true or have reasonable grounds to believe are true.
>>>> Unfounded speculation, though offered sincerely, is not helpful on a
>>>> mailing list intended to provide assistance.
>>>>
>>>> Thanks,
>>>> B.
>>>>
>>>> On 13 April 2012 13:55, CGS <cg...@gmail.com> wrote:
>>>> > Hi Mike,
>>>> >
>>>> > I haven't used BigCouch yet, which is why I haven't said anything until
>>>> > now. Still, giving some thought to what may occur there, I propose a few
>>>> > tests if you have time:
>>>> > 1. Try to replicate the database in another CouchDB.
>>>> > 2. If 1 passes, try to replicate to only one node at a time.
>>>> > 3. If 2 passes, increase the pool of nodes by 1 and repeat the
>>>> > replication (for sure it will fail with all 3 nodes at once).
>>>> >
>>>> > My idea behind these tests is that it may be that your database may be
>>>> > corrupted (or seen as corrupted by BigCouch at the second test) and what
>>>> > you get is just garbage at a certain document. That's why I proposed the
>>>> > first test. The second test is to see if any of the nodes has a problem
>>>> > in configuration (or if there is any incompatibility between your CouchDB
>>>> > and BigCouch in manipulating your docs). Finally, the third test is to
>>>> > see if server/node resources limit the number of replications (and at how
>>>> > many it starts to fail).
>>>> >
>>>> > Can you also check the size of the shards at tests 2 and 3?
>>>> >
>>>> > If you consider that these tests are irrelevant, please, ignore my
>>>> > suggestion.
>>>> >
>>>> > CGS
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <mk...@kana.com> wrote:
>>>> >
>>>> >> I upped the memory to 6GB on each of the nodes and got exactly the same
>>>> >> issue in the same time frame, i.e. the increased RAM did not seem to buy
>>>> >> me any additional time.
>>>> >>
>>>> >> Mike
>>>> >>
>>>> >> -----Original Message-----
>>>> >> From: Robert Newson [mailto:rnewson@apache.org]
>>>> >> Sent: 12 April 2012 19:34
>>>> >> To: user@couchdb.apache.org
>>>> >> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>> >>
>>>> >> 2GB total ram does sound tight. I can only compare to high volume
>>>> >> production clusters which have much more ram than this. Given that
>>>> >> beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
>>>> >> rest went? To couchjs processes, by chance? If so, you can reduce the
>>>> >> maximum size of that pool in config, I think the default is 50.
>>>> >>
>>>> >> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
>>>> >> > Ok, I have 3 nodes all load balanced with HAproxy:
>>>> >> >
>>>> >> > Centos 5.8 (Virtualised)
>>>> >> > 2 Cores
>>>> >> > 2GB RAM
>>>> >> >
>>>> >> > I'm trying to replicate about 75K documents which total 6GB when
>>>> >> compacted (on CouchDB 1.2, which has compression turned on). I'm told
>>>> >> they are fairly large documents.
>>>> >> >
>>>> >> > When it goes pear-shaped, vmstat shows memory being consumed rapidly:
>>>> >> >
>>>> >> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>>>> >> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>>> >> >  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1  6  2 91  0
>>>> >> >  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1  5  9 85  0
>>>> >> >  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1  7  1 91  0
>>>> >> >  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1 10  4 85  0
>>>> >> >  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13  7 33 47  0
>>>> >> >  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17  8 49 26  0
>>>> >> >  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25  9 61  4  0
>>>> >> >  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8  4 49 40  0
>>>> >> >  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4  2 50 44  0
>>>> >> >  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9  2 50 40  0
>>>> >> >  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22 20 36 23  0
>>>> >> >  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3 22  0 75  0
>>>> >> >  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
>>>> >> >  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3 10 29 58  0
>>>> >> >  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2  9 32 57  0
>>>> >> >  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2  7 30 61  0
>>>> >> >  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2  7  6 84  0
>>>> >> >  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1  6 11 83  0
>>>> >> >  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1  8 16 75  0
>>>> >> >
>>>> >> > It only ever takes out one node at a time and the other nodes seem to
>>>> >> be doing very little while the one node is running out of memory.
>>>> >> >
>>>> >> > If I kick it off again it processes some more, then spikes the
>>>> >> memory and fails.
>>>> >> >
>>>> >> > Thanks
>>>> >> >
>>>> >> > Mike
>>>> >> >
>>>> >> > PS: hope you enjoyed your CouchDB get-together!
>>>> >> >
>>>> >> > -----Original Message-----
>>>> >> > From: Robert Newson [mailto:rnewson@apache.org]
>>>> >> > Sent: 12 April 2012 17:28
>>>> >> > To: user@couchdb.apache.org
>>>> >> > Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>> >> >
>>>> >> > What kind of load were you putting on the machine?
>>>> >> >
>>>> >> > On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
>>>> >> >> Could you show your vm.args file?
>>>> >> >>
>>>> >> >> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>>>> >> >>> Unfortunately your request for help coincided with the two-day
>>>> >> >>> CouchDB Summit. #cloudant and the Issues tab on cloudant/bigcouch are
>>>> >> >>> other ways to get bigcouch support, but we happily answer queries here
>>>> >> >>> too, when not at the Model UN of CouchDB. :D
>>>> >> >>>
>>>> >> >>> B.
>>>> >> >>>
>>>> >> >>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>>>> >> >>>> Looks like this isn't the right place based on the responses so
>>>> >> far. Shame; I hoped this was going to help solve our index/view rebuild
>>>> >> times etc.
>>>> >> >>>>
>>>> >> >>>> Mike
>>>> >> >>>>
>>>> >> >>>> -----Original Message-----
>>>> >> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
>>>> >> >>>> Sent: 10 April 2012 09:20
>>>> >> >>>> To: user@couchdb.apache.org
>>>> >> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>>> >> >>>>
>>>> >> >>>> I'm not sure if this is the correct place to raise an issue I am
>>>> >> having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch
>>>> >> cluster. If this is not the correct place, please point me in the right
>>>> >> direction; if it is, then does anyone have any ideas why I keep getting the
>>>> >> following error message when I kick off a replication:
>>>> >> >>>>
>>>> >> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
>>>> >> >>>>
>>>> >> >>>> My set-up is:
>>>> >> >>>>
>>>> >> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
>>>> >> >>>>
>>>> >> >>>> 3 Node BigCouch cluster running on Centos 5.8 with the following
>>>> >> local.ini overrides pulling from the Standalone couchdb (78K documents)
>>>> >> >>>>
>>>> >> >>>> [httpd]
>>>> >> >>>> bind_address = XXX.XX.X.XX
>>>> >> >>>>
>>>> >> >>>> [cluster]
>>>> >> >>>> ; number of shards for a new database
>>>> >> >>>> q = 9
>>>> >> >>>> ; number of copies of each shard
>>>> >> >>>> n = 1
>>>> >> >>>>
>>>> >> >>>> [couchdb]
>>>> >> >>>> database_dir = /other/bigcouch/database
>>>> >> >>>> view_index_dir = /other/bigcouch/view
>>>> >> >>>>
>>>> >> >>>> The error is always generated on the third node in the cluster, and
>>>> >> the server basically maxes out on memory beforehand. The other nodes seem
>>>> >> to be doing very little, but are getting data, i.e. the shard sizes are
>>>> >> growing. I've put the copies per shard down to 1 as currently I'm not
>>>> >> interested in resilience.
>>>> >> >>>>
>>>> >> >>>> Any help would be greatly appreciated.
>>>> >> >>>>
>>>> >> >>>> Mike
>>>> >> >>>>
>>>> >>
>>>>

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by Robert Newson <rn...@apache.org>.
Mike,

Thanks for the logs, they do look clean, as you said.

It was remiss of me not to ask for version numbers. Can you tell me
which bigcouch version, erlang version, spidermonkey version you have
here?

B.
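For scale, the two failed heap allocations reported in this thread come to roughly 1.4 GiB and 1.7 GiB. The Erlang VM grows a process's heap in progressively larger steps, so a single runaway process can end up requesting a block of this size in one go:

```python
# Convert the failed Erlang heap allocations quoted in this thread
# from bytes to GiB.
failed_allocations = [1459620480, 1824525600]  # from the eheap_alloc errors

for nbytes in failed_allocations:
    print(f"{nbytes} bytes = {nbytes / 2**30:.2f} GiB")
# 1459620480 bytes = 1.36 GiB
# 1824525600 bytes = 1.70 GiB
```

That the second failure (after moving to 6GB nodes) asked for a larger block than the first is consistent with one process's heap simply growing until the OS refuses the next step.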

On 13 April 2012 21:18, Mike Kimber <mk...@kana.com> wrote:
> A clean log file (i.e. stop bigcouch, delete log file, restart bigcouch, run replication, wait for failure, stop bigcouch) from the node that failed this time around can be found at:
>
> http://pastebin.com/embed_js.php?i=s52rYwwy
>
> Mike
>
> -----Original Message-----
> From: Robert Newson [mailto:rnewson@apache.org]
> Sent: 13 April 2012 19:28
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> Mike,
>
> Do you have couch.logs from around that time?
>
> B.
>
> On 13 April 2012 17:54, Mike Kimber <mk...@kana.com> wrote:
>> Sorry, forgot to say that I have already upped it to N=3 and still get the same issue.
>>
>> I ran it again with 6GB of RAM on each of the servers, ran vmstat, and got the following:
>>
>> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>  3  0      0 2067468  31816 302204    0    0     0     5 1820  360 63 32  5  0  0
>>  2  0      0 2457728  31816 302212    0    0     0     2 2188  322 70 25  4  0  0
>>  2  0      0 1936092  31816 302212    0    0     0     0 3020  200 73 24  3  0  0
>>  2  0      0 687428  31816 302212    0    0     0     1 1958  368 56 42  2  0  0
>>  2  0      0 2128192  31824 302212    0    0     0     2 2779  243 64 29  7  0  0
>>  1  0      0 1829848  31824 302216    0    0     0     0 1734  280 68 29  3  0  0
>>  1  0      0 1200300  31832 302216    0    0     0     8 1841  231 43 13 44  0  0
>>  2  0      0 1638752  31840 302208    0    0     0     5 2625  350 71 20  8  0  0
>>  3  0      0 1670856  31848 302216    0    0     0     3 2150  492 40 21 39  0  0
>>  2  0      0 1020848  31848 302216    0    0     0     0 2307  644 67 22 11  0  0
>>  1  0      0 271640  31848 302216    0    0     0     6 1995  280 54 42  4  0  0
>>  1  0      0 455408  31848 302216    0    0     0     1 1879  238 64 33  3  0  0
>>  2  0      0 1240616  25584 193044    0    0     0     2 2408  232 59 34  8  0  0
>>  2  0      0 611280  25592 193036    0    0     0     3 2286  246 72 25  2  0  0
>>  2  0      0 679548  25592 193044    0    0     0     2 3038  175 78 21  2  0  0
>>  2  0      0 786360  25600 193044    0    0     0     3 1679  269 74 23  3  0  0
>>  2  0      0 568632  25600 193044    0    0     0     0 2796  274 74 24  2  0  0
>> eheap_alloc: Cannot allocate 1824525600 bytes of memory (of type "heap").
>>  0  0      0 5749480  25600 193044    0    0     0     0 1389  160 33 15 52  0  0
>>  0  0      0 5749956  25608 193044    0    0     0    10 1007   82  0  0 100  0  0
>>  0  0      0 5749988  25616 193036    0    0     0     3 1016   85  0  0 100  0  0
>>  0  0      0 5750020  25616 193044    0    0     0     0  998   79  0  0 100  0  0
>>  0  0      0 5750168  25620 193040    0    0     0     1 1007   87  0  0 100  0  0
>>  0  0      0 5750308  25620 193044    0    0     0     0 1008   82  0  0 100  0  0
>>
>> I really need to work out what each process is doing with respect to memory at the time of failure. I had top running, but not on the node that failed this time; sod's law :-)
>>
>> Mike
>>
>> -----Original Message-----
>> From: Robert Newson [mailto:rnewson@apache.org]
>> Sent: 13 April 2012 17:31
>> To: user@couchdb.apache.org
>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>
>> I should note that bigcouch is tested much more often with N=3.
>> Perhaps there's something about N=1 that exacerbates the issue. For a
>> test, could you try with N=3?
>>
>> B.
>>
>> On 13 April 2012 16:24, Mike Kimber <mk...@kana.com> wrote:
>>> "1. Try to replicate the database in another CouchDB."
>>>
>>> I have done this to a couchdb 1.2 database successfully. FYI, the source DB is a couchdb 1.1.1.
>>>
>>> I haven't done the other tests, but have tested replicating from the couchdb 1.2 database to the bigcouch install and got the same issue.
>>>
>>> Mike
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: CGS [mailto:cgsmcmlxxv@gmail.com]
>>> Sent: 13 April 2012 15:01
>>> To: user@couchdb.apache.org
>>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>
>>> If you say so, Robert, I won't argue with you on that. I meant no offense,
>>> so, please, accept my apologies if I crossed the line. It's all yours from
>>> now on.
>>>
>>> Mike, please, ignore my suggestion. Sorry for interfering.
>>>
>>> Good luck!
>>>
>>> CGS
>>>
>>>
>>>
>>>
>>> On Fri, Apr 13, 2012 at 3:19 PM, Robert Newson <rn...@apache.org> wrote:
>>>
>>>> I think you should point out that "My idea behind these tests is that
>>>> it may be that your database may be
>>>> corrupted (or seen as corrupted by BigCouch at the second test) and what
>>>> you get is just garbage at a certain document. " is based on no
>>>> evidence. Nor, if it were true, would it necessarily explain the
>>>> observed behavior either.
>>>>
>>>> It would be useful if we could all stick to asserting only things we
>>>> know to be true or have reasonable grounds to believe are true.
>>>> Unfounded speculation, though offered sincerely, is not helpful on a
>>>> mailing list intended to provide assistance.
>>>>
>>>> Thanks,
>>>> B.
>>>>
>>>> On 13 April 2012 13:55, CGS <cg...@gmail.com> wrote:
>>>> > Hi Mike,
>>>> >
>>>> > I haven't used BigCouch yet, which is why I haven't said anything until
>>>> > now. Still, giving some thought to what may occur there, I propose a few
>>>> > tests if you have time:
>>>> > 1. Try to replicate the database in another CouchDB.
>>>> > 2. If 1 passes, try to replicate to only one node at a time.
>>>> > 3. If 2 passes, increase the pool of nodes by 1 and repeat the
>>>> > replication (for sure it will fail with all 3 nodes at once).
>>>> >
>>>> > My idea behind these tests is that it may be that your database may be
>>>> > corrupted (or seen as corrupted by BigCouch at the second test) and what
>>>> > you get is just garbage at a certain document. That's why I proposed the
>>>> > first test. The second test is to see if any of the nodes has a problem
>>>> > in configuration (or if there is any incompatibility between your CouchDB
>>>> > and BigCouch in manipulating your docs). Finally, the third test is to
>>>> > see if server/node resources limit the number of replications (and at how
>>>> > many it starts to fail).
>>>> >
>>>> > Can you also check the size of the shards at tests 2 and 3?
>>>> >
>>>> > If you consider that these tests are irrelevant, please, ignore my
>>>> > suggestion.
>>>> >
>>>> > CGS
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <mk...@kana.com> wrote:
>>>> >
>>>> >> I upped the memory to 6GB on each of the nodes and got exactly the same
>>>> >> issue in the same time frame, i.e. the increased RAM did not seem to buy
>>>> >> me any additional time.
>>>> >>
>>>> >> Mike
>>>> >>
>>>> >> -----Original Message-----
>>>> >> From: Robert Newson [mailto:rnewson@apache.org]
>>>> >> Sent: 12 April 2012 19:34
>>>> >> To: user@couchdb.apache.org
>>>> >> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>> >>
>>>> >> 2GB total ram does sound tight. I can only compare to high volume
>>>> >> production clusters which have much more ram than this. Given that
>>>> >> beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
>>>> >> rest went? To couchjs processes, by chance? If so, you can reduce the
>>>> >> maximum size of that pool in config, I think the default is 50.
>>>> >>
>>>> >> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
>>>> >> > Ok, I have 3 nodes all load balanced with HAproxy:
>>>> >> >
>>>> >> > Centos 5.8 (Virtualised)
>>>> >> > 2 Cores
>>>> >> > 2GB RAM
>>>> >> >
>>>> >> > I'm trying to replicate about 75K documents which total 6GB when
>>>> >> compacted (on CouchDB 1.2, which has compression turned on). I'm told
>>>> >> they are fairly large documents.
>>>> >> >
>>>> >> > When it goes pear-shaped, vmstat shows memory being consumed rapidly:
>>>> >> >
>>>> >> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>>>> >> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>>> >> >  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1  6  2 91  0
>>>> >> >  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1  5  9 85  0
>>>> >> >  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1  7  1 91  0
>>>> >> >  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1 10  4 85  0
>>>> >> >  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13  7 33 47  0
>>>> >> >  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17  8 49 26  0
>>>> >> >  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25  9 61  4  0
>>>> >> >  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8  4 49 40  0
>>>> >> >  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4  2 50 44  0
>>>> >> >  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9  2 50 40  0
>>>> >> >  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22 20 36 23  0
>>>> >> >  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3 22  0 75  0
>>>> >> >  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
>>>> >> >  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3 10 29 58  0
>>>> >> >  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2  9 32 57  0
>>>> >> >  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2  7 30 61  0
>>>> >> >  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2  7  6 84  0
>>>> >> >  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1  6 11 83  0
>>>> >> >  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1  8 16 75  0
>>>> >> >
>>>> >> > It only ever takes out one node at a time and the other nodes seem to
>>>> be
>>>> >> doing very little while the one node is running out of memory.
>>>> >> >
>>>> >> > If I kick it off again it processed some more and then spikes the
>>>> memory
>>>> >> and fails
>>>> >> >
>>>> >> > Thanks
>>>> >> >
>>>> >> > Mike
>>>> >> >
>>>> >> > PS: hope you enjoyed you couchdb get together!
>>>> >> >
>>>> >> > -----Original Message-----
>>>> >> > From: Robert Newson [mailto:rnewson@apache.org]
>>>> >> > Sent: 12 April 2012 17:28
>>>> >> > To: user@couchdb.apache.org
>>>> >> > Subject: Re: BigCouch - Replication failing with Cannot Allocate
>>>> memory
>>>> >> >
>>>> >> > What kind of load were you putting the machine on?
>>>> >> >
>>>> >> > On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
>>>> >> >> Could you show your vm.args file?
>>>> >> >>
>>>> >> >> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>>>> >> >>> Unfortunately your request for help coincided with the two day
>>>> CouchDB
>>>> >> >>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>>>> >> >>> ways to get bigcouch support, but we happily answer queries here
>>>> too,
>>>> >> >>> when not at the Model UN of CouchDB. :D
>>>> >> >>>
>>>> >> >>> B.
>>>> >> >>>
>>>> >> >>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>>>> >> >>>> Looks like this isn't the right place based on the responses so
>>>> far.
>>>> >> Shame I hoped this was going to help solve our index/view rebuild times
>>>> etc.
>>>> >> >>>>
>>>> >> >>>> Mike
>>>> >> >>>>
>>>> >> >>>> -----Original Message-----
>>>> >> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
>>>> >> >>>> Sent: 10 April 2012 09:20
>>>> >> >>>> To: user@couchdb.apache.org
>>>> >> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>>> >> >>>>
>>>> >> >>>> I'm not sure if this is the correct place to raise an issue I am
>>>> >> having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch
>>>> >> cluster? If this is not the correct place please point me in the right
>>>> >> direction if it is then any one have any ideas why I keep getting the
>>>> >> following error message when I kick of a replication;
>>>> >> >>>>
>>>> >> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type
>>>> >> "heap").
>>>> >> >>>>
>>>> >> >>>> My set-up is:
>>>> >> >>>>
>>>> >> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
>>>> >> >>>>
>>>> >> >>>> 3 Node BigCouch cluster running on Centos 5.8 with the following
>>>> >> local.ini overrides pulling from the Standalone couchdb (78K documents)
>>>> >> >>>>
>>>> >> >>>> [httpd]
>>>> >> >>>> bind_address = XXX.XX.X.XX
>>>> >> >>>>
>>>> >> >>>> [cluster]
>>>> >> >>>> ; number of shards for a new database
>>>> >> >>>> q = 9
>>>> >> >>>> ; number of copies of each shard
>>>> >> >>>> n = 1
>>>> >> >>>>
>>>> >> >>>> [couchdb]
>>>> >> >>>> database_dir = /other/bigcouch/database
>>>> >> >>>> view_index_dir = /other/bigcouch/view
>>>> >> >>>>
>>>> >> >>>> The error is always generate on the third node in the cluster and
>>>> the
>>>> >> server basically max's out on memory before hand. The other nodes seem
>>>> to
>>>> >> be doing very little, but are getting data i.e. the shard sizes are
>>>> >> growing. I've put the copies per shard down to 1 as currently I'm not
>>>> >> interested in resilience.
>>>> >> >>>>
>>>> >> >>>> Any help would be greatly appreciated.
>>>> >> >>>>
>>>> >> >>>> Mike
>>>> >> >>>>
>>>> >>
>>>>

RE: BigCouch - Replication failing with Cannot Allocate memory

Posted by Mike Kimber <mk...@kana.com>.
A clean log file from the node that failed this time around (produced by: stop bigcouch, delete the log file, restart bigcouch, run the replication, wait for the failure, stop bigcouch) can be found at:

http://pastebin.com/embed_js.php?i=s52rYwwy
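
For repeatability, the capture steps above can be sketched as a small script. This is only a sketch under assumptions: the init-script path and the log location below are guesses (typical CentOS locations) and will differ per installation.

```python
# Sketch of the clean-log capture described in the message above.
# BIGCOUCH and LOG are assumptions; adjust them for your installation
# before running this on the failing node.
import os
import subprocess

BIGCOUCH = "/etc/init.d/bigcouch"       # assumed init script path
LOG = "/var/log/bigcouch/bigcouch.log"  # assumed log file path

def capture_clean_log():
    subprocess.check_call([BIGCOUCH, "stop"])   # stop bigcouch
    if os.path.exists(LOG):
        os.remove(LOG)                          # delete the old log
    subprocess.check_call([BIGCOUCH, "start"])  # restart with a fresh log
    print("Start the replication now; after the eheap_alloc crash,")
    print("stop bigcouch again and save " + LOG)
```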

Mike 

-----Original Message-----
From: Robert Newson [mailto:rnewson@apache.org] 
Sent: 13 April 2012 19:28
To: user@couchdb.apache.org
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory

Mike,

Do you have couch.logs from around that time?

B.

On 13 April 2012 17:54, Mike Kimber <mk...@kana.com> wrote:
> Sorry forgot to say that I have already up'd it to N=3 and still get the same issue.
>
> I ran it again with the 6GB of RAM on each of the servers and ran vmstat and got the following:
>
> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  3  0      0 2067468  31816 302204    0    0     0     5 1820  360 63 32  5  0  0
>  2  0      0 2457728  31816 302212    0    0     0     2 2188  322 70 25  4  0  0
>  2  0      0 1936092  31816 302212    0    0     0     0 3020  200 73 24  3  0  0
>  2  0      0 687428  31816 302212    0    0     0     1 1958  368 56 42  2  0  0
>  2  0      0 2128192  31824 302212    0    0     0     2 2779  243 64 29  7  0  0
>  1  0      0 1829848  31824 302216    0    0     0     0 1734  280 68 29  3  0  0
>  1  0      0 1200300  31832 302216    0    0     0     8 1841  231 43 13 44  0  0
>  2  0      0 1638752  31840 302208    0    0     0     5 2625  350 71 20  8  0  0
>  3  0      0 1670856  31848 302216    0    0     0     3 2150  492 40 21 39  0  0
>  2  0      0 1020848  31848 302216    0    0     0     0 2307  644 67 22 11  0  0
>  1  0      0 271640  31848 302216    0    0     0     6 1995  280 54 42  4  0  0
>  1  0      0 455408  31848 302216    0    0     0     1 1879  238 64 33  3  0  0
>  2  0      0 1240616  25584 193044    0    0     0     2 2408  232 59 34  8  0  0
>  2  0      0 611280  25592 193036    0    0     0     3 2286  246 72 25  2  0  0
>  2  0      0 679548  25592 193044    0    0     0     2 3038  175 78 21  2  0  0
>  2  0      0 786360  25600 193044    0    0     0     3 1679  269 74 23  3  0  0
>  2  0      0 568632  25600 193044    0    0     0     0 2796  274 74 24  2  0  0
> eheap_alloc: Cannot allocate 1824525600 bytes of memory (of type "heap").
>  0  0      0 5749480  25600 193044    0    0     0     0 1389  160 33 15 52  0  0
>  0  0      0 5749956  25608 193044    0    0     0    10 1007   82  0  0 100  0  0
>  0  0      0 5749988  25616 193036    0    0     0     3 1016   85  0  0 100  0  0
>  0  0      0 5750020  25616 193044    0    0     0     0  998   79  0  0 100  0  0
>  0  0      0 5750168  25620 193040    0    0     0     1 1007   87  0  0 100  0  0
>  0  0      0 5750308  25620 193044    0    0     0     0 1008   82  0  0 100  0  0
>
> I really need to work out what each process is doing with respect to memory at the time of failure. I had top running, but not on the node that failed this time, sods law :-)
>
> Mike
>
> -----Original Message-----
> From: Robert Newson [mailto:rnewson@apache.org]
> Sent: 13 April 2012 17:31
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> I should note that bigcouch is tested much more often with N=3.
> Perhaps there's something about N=1 that exasperates the issue. For a
> test, could you try with N=3?
>
> B.
>
> On 13 April 2012 16:24, Mike Kimber <mk...@kana.com> wrote:
>> "1. Try to replicate the database in another CouchDB."
>>
>> I have done this to a couchdb 1.2 database successfully. FYI The Source DB is a couchdb 1.1.1.
>>
>> I haven't done the other tests, but have tested replicating from the couchdb 1.2 database to the bigcouch install and got the same issue.
>>
>> Mike
>>
>>
>>
>> -----Original Message-----
>> From: CGS [mailto:cgsmcmlxxv@gmail.com]
>> Sent: 13 April 2012 15:01
>> To: user@couchdb.apache.org
>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>
>> If you say so, Robert, I won't argue with you on that. I meant no offense,
>> so, please, accept my apologies if I crossed the line. It's all your's from
>> now on.
>>
>> Mike, please, ignore my suggestion. Sorry for interfering.
>>
>> Good luck!
>>
>> CGS
>>
>>
>>
>>
>> On Fri, Apr 13, 2012 at 3:19 PM, Robert Newson <rn...@apache.org> wrote:
>>
>>> I think you should point out that "My idea behind these tests is that
>>> it may be that your database may be
>>> corrupted (or seen as corrupted by BigCouch at the second test) and what
>>> you get is just garbage at a certain document. " is based on no
>>> evidence. Nor, if it were true, would it necessarily explain the
>>> observed behavior either.
>>>
>>> It would be useful if we could all stick to asserting only things we
>>> know to be true or have reasonable grounds to believe are true.
>>> Unfounded speculation, though offered sincerely, is not helpful on a
>>> mailing list intended to provide assistance.
>>>
>>> Thanks,
>>> B.
>>>
>>> On 13 April 2012 13:55, CGS <cg...@gmail.com> wrote:
>>> > Hi Mike,
>>> >
>>> > I haven't used BigCouch by now and that's why I haven't said anything by
>>> > now. Still, giving a thought of what may occur there, I propose few tests
>>> > if you have time:
>>> > 1. Try to replicate the database in another CouchDB.
>>> > 2. If 1 passes, try to replicate to only one node at the time.
>>> > 3. If 2 passes, increase the pool of nodes with 1 and repeat the
>>> > replication (for sure it will fail at all 3 nodes at the time).
>>> >
>>> > My idea behind these tests is that it may be that your database may be
>>> > corrupted (or seen as corrupted by BigCouch at the second test) and what
>>> > you get is just garbage at a certain document. That's why I proposed the
>>> > first test. The second test is to see if any of the nodes has a problem
>>> in
>>> > configuration (or if there is any incompatibility in between your CouchDB
>>> > and BigCouch in manipulating your docs). Finally, the third test is to
>>> see
>>> > if server/node resources limit the number of replications (and at how
>>> many
>>> > it starts to fail).
>>> >
>>> > Can you also check the size of the shards at tests 2 and 3?
>>> >
>>> > If you consider that these tests are irrelevant, please, ignore my
>>> > suggestion.
>>> >
>>> > CGS
>>> >
>>> >
>>> >
>>> > On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <mk...@kana.com> wrote:
>>> >
>>> >> I upped the memory to 6GB on each of the nodes and got exactly the same
>>> >> issue in the same time frame i.e. the increased RAM did not seem to by
>>> me
>>> >> any additional time.
>>> >>
>>> >> Mike
>>> >>
>>> >> -----Original Message-----
>>> >> From: Robert Newson [mailto:rnewson@apache.org]
>>> >> Sent: 12 April 2012 19:34
>>> >> To: user@couchdb.apache.org
>>> >> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>> >>
>>> >> 2GB total ram does sound tight. I can only compare to high volume
>>> >> production clusters which have much more ram than this. Given that
>>> >> beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
>>> >> rest one? To couchjs processes, by chance? If so, you can reduce the
>>> >> maximum size of that pool in config, I think the default is 50.
>>> >>
>>> >> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
>>> >> > Ok, I have 3 nodes all load balanced with HAproxy:
>>> >> >
>>> >> > Centos 5.8 (Virtualised)
>>> >> > 2 Cores
>>> >> > 2GB RAM
>>> >> >
>>> >> > I'm trying to replicate about 75K documents which total 6GB when
>>> >> compacted (0n Couchdb 1.2 which has compression turned on). I'm told
>>> they
>>> >> are fairly large documents.
>>> >> >
>>> >> > When it goes pear shaped Vsmstat starts using a lot of memory:
>>> >> >
>>> >> > procs -----------memory---------- ---swap-- -----io---- --system--
>>> >> -----cpu------
>>> >> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us
>>> sy
>>> >> id wa st
>>> >> >  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1
>>>  6
>>> >>  2 91  0
>>> >> >  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1
>>>  5
>>> >>  9 85  0
>>> >> >  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1
>>>  7
>>> >>  1 91  0
>>> >> >  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1
>>> 10
>>> >>  4 85  0
>>> >> >  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13
>>>  7
>>> >> 33 47  0
>>> >> >  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17
>>>  8
>>> >> 49 26  0
>>> >> >  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25
>>>  9
>>> >> 61  4  0
>>> >> >  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8
>>>  4
>>> >> 49 40  0
>>> >> >  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4
>>>  2
>>> >> 50 44  0
>>> >> >  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9
>>>  2
>>> >> 50 40  0
>>> >> >  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22
>>> 20
>>> >> 36 23  0
>>> >> >  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3
>>> 22
>>> >>  0 75  0
>>> >> >  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5
>>> >> 19 17 59  0
>>> >> >  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3
>>> 10
>>> >> 29 58  0
>>> >> >  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2
>>>  9
>>> >> 32 57  0
>>> >> >  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2
>>>  7
>>> >> 30 61  0
>>> >> >  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2
>>>  7
>>> >>  6 84  0
>>> >> >  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1
>>>  6
>>> >> 11 83  0
>>> >> >  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1
>>>  8
>>> >> 16 75  0
>>> >> >
>>> >> > It only ever takes out one node at a time and the other nodes seem to
>>> be
>>> >> doing very little while the one node is running out of memory.
>>> >> >
>>> >> > If I kick it off again it processed some more and then spikes the
>>> memory
>>> >> and fails
>>> >> >
>>> >> > Thanks
>>> >> >
>>> >> > Mike
>>> >> >
>>> >> > PS: hope you enjoyed you couchdb get together!
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: Robert Newson [mailto:rnewson@apache.org]
>>> >> > Sent: 12 April 2012 17:28
>>> >> > To: user@couchdb.apache.org
>>> >> > Subject: Re: BigCouch - Replication failing with Cannot Allocate
>>> memory
>>> >> >
>>> >> > What kind of load were you putting the machine on?
>>> >> >
>>> >> > On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
>>> >> >> Could you show your vm.args file?
>>> >> >>
>>> >> >> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>>> >> >>> Unfortunately your request for help coincided with the two day
>>> CouchDB
>>> >> >>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>>> >> >>> ways to get bigcouch support, but we happily answer queries here
>>> too,
>>> >> >>> when not at the Model UN of CouchDB. :D
>>> >> >>>
>>> >> >>> B.
>>> >> >>>
>>> >> >>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>>> >> >>>> Looks like this isn't the right place based on the responses so
>>> far.
>>> >> Shame I hoped this was going to help solve our index/view rebuild times
>>> etc.
>>> >> >>>>
>>> >> >>>> Mike
>>> >> >>>>
>>> >> >>>> -----Original Message-----
>>> >> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
>>> >> >>>> Sent: 10 April 2012 09:20
>>> >> >>>> To: user@couchdb.apache.org
>>> >> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>> >> >>>>
>>> >> >>>> I'm not sure if this is the correct place to raise an issue I am
>>> >> having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch
>>> >> cluster? If this is not the correct place please point me in the right
>>> >> direction if it is then any one have any ideas why I keep getting the
>>> >> following error message when I kick of a replication;
>>> >> >>>>
>>> >> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type
>>> >> "heap").
>>> >> >>>>
>>> >> >>>> My set-up is:
>>> >> >>>>
>>> >> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
>>> >> >>>>
>>> >> >>>> 3 Node BigCouch cluster running on Centos 5.8 with the following
>>> >> local.ini overrides pulling from the Standalone couchdb (78K documents)
>>> >> >>>>
>>> >> >>>> [httpd]
>>> >> >>>> bind_address = XXX.XX.X.XX
>>> >> >>>>
>>> >> >>>> [cluster]
>>> >> >>>> ; number of shards for a new database
>>> >> >>>> q = 9
>>> >> >>>> ; number of copies of each shard
>>> >> >>>> n = 1
>>> >> >>>>
>>> >> >>>> [couchdb]
>>> >> >>>> database_dir = /other/bigcouch/database
>>> >> >>>> view_index_dir = /other/bigcouch/view
>>> >> >>>>
>>> >> >>>> The error is always generate on the third node in the cluster and
>>> the
>>> >> server basically max's out on memory before hand. The other nodes seem
>>> to
>>> >> be doing very little, but are getting data i.e. the shard sizes are
>>> >> growing. I've put the copies per shard down to 1 as currently I'm not
>>> >> interested in resilience.
>>> >> >>>>
>>> >> >>>> Any help would be greatly appreciated.
>>> >> >>>>
>>> >> >>>> Mike
>>> >> >>>>
>>> >>
>>>

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by Robert Newson <rn...@apache.org>.
Mike,

Do you have couch.logs from around that time?

B.

On 13 April 2012 17:54, Mike Kimber <mk...@kana.com> wrote:
> Sorry forgot to say that I have already up'd it to N=3 and still get the same issue.
>
> I ran it again with the 6GB of RAM on each of the servers and ran vmstat and got the following:
>
> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  3  0      0 2067468  31816 302204    0    0     0     5 1820  360 63 32  5  0  0
>  2  0      0 2457728  31816 302212    0    0     0     2 2188  322 70 25  4  0  0
>  2  0      0 1936092  31816 302212    0    0     0     0 3020  200 73 24  3  0  0
>  2  0      0 687428  31816 302212    0    0     0     1 1958  368 56 42  2  0  0
>  2  0      0 2128192  31824 302212    0    0     0     2 2779  243 64 29  7  0  0
>  1  0      0 1829848  31824 302216    0    0     0     0 1734  280 68 29  3  0  0
>  1  0      0 1200300  31832 302216    0    0     0     8 1841  231 43 13 44  0  0
>  2  0      0 1638752  31840 302208    0    0     0     5 2625  350 71 20  8  0  0
>  3  0      0 1670856  31848 302216    0    0     0     3 2150  492 40 21 39  0  0
>  2  0      0 1020848  31848 302216    0    0     0     0 2307  644 67 22 11  0  0
>  1  0      0 271640  31848 302216    0    0     0     6 1995  280 54 42  4  0  0
>  1  0      0 455408  31848 302216    0    0     0     1 1879  238 64 33  3  0  0
>  2  0      0 1240616  25584 193044    0    0     0     2 2408  232 59 34  8  0  0
>  2  0      0 611280  25592 193036    0    0     0     3 2286  246 72 25  2  0  0
>  2  0      0 679548  25592 193044    0    0     0     2 3038  175 78 21  2  0  0
>  2  0      0 786360  25600 193044    0    0     0     3 1679  269 74 23  3  0  0
>  2  0      0 568632  25600 193044    0    0     0     0 2796  274 74 24  2  0  0
> eheap_alloc: Cannot allocate 1824525600 bytes of memory (of type "heap").
>  0  0      0 5749480  25600 193044    0    0     0     0 1389  160 33 15 52  0  0
>  0  0      0 5749956  25608 193044    0    0     0    10 1007   82  0  0 100  0  0
>  0  0      0 5749988  25616 193036    0    0     0     3 1016   85  0  0 100  0  0
>  0  0      0 5750020  25616 193044    0    0     0     0  998   79  0  0 100  0  0
>  0  0      0 5750168  25620 193040    0    0     0     1 1007   87  0  0 100  0  0
>  0  0      0 5750308  25620 193044    0    0     0     0 1008   82  0  0 100  0  0
>
> I really need to work out what each process is doing with respect to memory at the time of failure. I had top running, but not on the node that failed this time, sods law :-)
>
> Mike
>
> -----Original Message-----
> From: Robert Newson [mailto:rnewson@apache.org]
> Sent: 13 April 2012 17:31
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> I should note that bigcouch is tested much more often with N=3.
> Perhaps there's something about N=1 that exasperates the issue. For a
> test, could you try with N=3?
>
> B.
>
> On 13 April 2012 16:24, Mike Kimber <mk...@kana.com> wrote:
>> "1. Try to replicate the database in another CouchDB."
>>
>> I have done this to a couchdb 1.2 database successfully. FYI The Source DB is a couchdb 1.1.1.
>>
>> I haven't done the other tests, but have tested replicating from the couchdb 1.2 database to the bigcouch install and got the same issue.
>>
>> Mike
>>
>>
>>
>> -----Original Message-----
>> From: CGS [mailto:cgsmcmlxxv@gmail.com]
>> Sent: 13 April 2012 15:01
>> To: user@couchdb.apache.org
>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>
>> If you say so, Robert, I won't argue with you on that. I meant no offense,
>> so, please, accept my apologies if I crossed the line. It's all your's from
>> now on.
>>
>> Mike, please, ignore my suggestion. Sorry for interfering.
>>
>> Good luck!
>>
>> CGS
>>
>>
>>
>>
>> On Fri, Apr 13, 2012 at 3:19 PM, Robert Newson <rn...@apache.org> wrote:
>>
>>> I think you should point out that "My idea behind these tests is that
>>> it may be that your database may be
>>> corrupted (or seen as corrupted by BigCouch at the second test) and what
>>> you get is just garbage at a certain document. " is based on no
>>> evidence. Nor, if it were true, would it necessarily explain the
>>> observed behavior either.
>>>
>>> It would be useful if we could all stick to asserting only things we
>>> know to be true or have reasonable grounds to believe are true.
>>> Unfounded speculation, though offered sincerely, is not helpful on a
>>> mailing list intended to provide assistance.
>>>
>>> Thanks,
>>> B.
>>>
>>> On 13 April 2012 13:55, CGS <cg...@gmail.com> wrote:
>>> > Hi Mike,
>>> >
>>> > I haven't used BigCouch by now and that's why I haven't said anything by
>>> > now. Still, giving a thought of what may occur there, I propose few tests
>>> > if you have time:
>>> > 1. Try to replicate the database in another CouchDB.
>>> > 2. If 1 passes, try to replicate to only one node at the time.
>>> > 3. If 2 passes, increase the pool of nodes with 1 and repeat the
>>> > replication (for sure it will fail at all 3 nodes at the time).
>>> >
>>> > My idea behind these tests is that it may be that your database may be
>>> > corrupted (or seen as corrupted by BigCouch at the second test) and what
>>> > you get is just garbage at a certain document. That's why I proposed the
>>> > first test. The second test is to see if any of the nodes has a problem
>>> in
>>> > configuration (or if there is any incompatibility in between your CouchDB
>>> > and BigCouch in manipulating your docs). Finally, the third test is to
>>> see
>>> > if server/node resources limit the number of replications (and at how
>>> many
>>> > it starts to fail).
>>> >
>>> > Can you also check the size of the shards at tests 2 and 3?
>>> >
>>> > If you consider that these tests are irrelevant, please, ignore my
>>> > suggestion.
>>> >
>>> > CGS
>>> >
>>> >
>>> >
>>> > On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <mk...@kana.com> wrote:
>>> >
>>> >> I upped the memory to 6GB on each of the nodes and got exactly the same
>>> >> issue in the same time frame i.e. the increased RAM did not seem to by
>>> me
>>> >> any additional time.
>>> >>
>>> >> Mike
>>> >>
>>> >> -----Original Message-----
>>> >> From: Robert Newson [mailto:rnewson@apache.org]
>>> >> Sent: 12 April 2012 19:34
>>> >> To: user@couchdb.apache.org
>>> >> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>> >>
>>> >> 2GB total ram does sound tight. I can only compare to high volume
>>> >> production clusters which have much more ram than this. Given that
>>> >> beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
>>> >> rest one? To couchjs processes, by chance? If so, you can reduce the
>>> >> maximum size of that pool in config, I think the default is 50.
>>> >>
>>> >> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
>>> >> > Ok, I have 3 nodes all load balanced with HAproxy:
>>> >> >
>>> >> > Centos 5.8 (Virtualised)
>>> >> > 2 Cores
>>> >> > 2GB RAM
>>> >> >
>>> >> > I'm trying to replicate about 75K documents which total 6GB when
>>> >> compacted (0n Couchdb 1.2 which has compression turned on). I'm told
>>> they
>>> >> are fairly large documents.
>>> >> >
>>> >> > When it goes pear shaped Vsmstat starts using a lot of memory:
>>> >> >
>>> >> > procs -----------memory---------- ---swap-- -----io---- --system--
>>> >> -----cpu------
>>> >> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us
>>> sy
>>> >> id wa st
>>> >> >  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1
>>>  6
>>> >>  2 91  0
>>> >> >  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1
>>>  5
>>> >>  9 85  0
>>> >> >  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1
>>>  7
>>> >>  1 91  0
>>> >> >  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1
>>> 10
>>> >>  4 85  0
>>> >> >  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13
>>>  7
>>> >> 33 47  0
>>> >> >  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17
>>>  8
>>> >> 49 26  0
>>> >> >  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25
>>>  9
>>> >> 61  4  0
>>> >> >  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8
>>>  4
>>> >> 49 40  0
>>> >> >  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4
>>>  2
>>> >> 50 44  0
>>> >> >  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9
>>>  2
>>> >> 50 40  0
>>> >> >  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22
>>> 20
>>> >> 36 23  0
>>> >> >  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3
>>> 22
>>> >>  0 75  0
>>> >> >  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5
>>> >> 19 17 59  0
>>> >> >  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3
>>> 10
>>> >> 29 58  0
>>> >> >  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2
>>>  9
>>> >> 32 57  0
>>> >> >  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2
>>>  7
>>> >> 30 61  0
>>> >> >  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2
>>>  7
>>> >>  6 84  0
>>> >> >  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1
>>>  6
>>> >> 11 83  0
>>> >> >  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1
>>>  8
>>> >> 16 75  0
>>> >> >
>>> >> > It only ever takes out one node at a time and the other nodes seem to
>>> be
>>> >> doing very little while the one node is running out of memory.
>>> >> >
>>> >> > If I kick it off again it processed some more and then spikes the
>>> memory
>>> >> and fails
>>> >> >
>>> >> > Thanks
>>> >> >
>>> >> > Mike
>>> >> >
>>> >> > PS: hope you enjoyed you couchdb get together!
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: Robert Newson [mailto:rnewson@apache.org]
>>> >> > Sent: 12 April 2012 17:28
>>> >> > To: user@couchdb.apache.org
>>> >> > Subject: Re: BigCouch - Replication failing with Cannot Allocate
>>> memory
>>> >> >
>>> >> > What kind of load were you putting the machine on?
>>> >> >
>>> >> > On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
>>> >> >> Could you show your vm.args file?
>>> >> >>
>>> >> >> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>>> >> >>> Unfortunately your request for help coincided with the two day
>>> CouchDB
>>> >> >>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>>> >> >>> ways to get bigcouch support, but we happily answer queries here
>>> too,
>>> >> >>> when not at the Model UN of CouchDB. :D
>>> >> >>>
>>> >> >>> B.
>>> >> >>>
>>> >> >>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>>> >> >>>> Looks like this isn't the right place based on the responses so
>>> far.
>>> >> Shame I hoped this was going to help solve our index/view rebuild times
>>> etc.
>>> >> >>>>
>>> >> >>>> Mike
>>> >> >>>>
>>> >> >>>> -----Original Message-----
>>> >> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
>>> >> >>>> Sent: 10 April 2012 09:20
>>> >> >>>> To: user@couchdb.apache.org
>>> >> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>> >> >>>>
>>> >> >>>> I'm not sure if this is the correct place to raise an issue I am
>>> >> having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch
>>> >> cluster? If this is not the correct place please point me in the right
>>> >> direction if it is then any one have any ideas why I keep getting the
>>> >> following error message when I kick of a replication;
>>> >> >>>>
>>> >> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type
>>> >> "heap").
>>> >> >>>>
>>> >> >>>> My set-up is:
>>> >> >>>>
>>> >> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
>>> >> >>>>
>>> >> >>>> 3 Node BigCouch cluster running on Centos 5.8 with the following
>>> >> local.ini overrides pulling from the Standalone couchdb (78K documents)
>>> >> >>>>
>>> >> >>>> [httpd]
>>> >> >>>> bind_address = XXX.XX.X.XX
>>> >> >>>>
>>> >> >>>> [cluster]
>>> >> >>>> ; number of shards for a new database
>>> >> >>>> q = 9
>>> >> >>>> ; number of copies of each shard
>>> >> >>>> n = 1
>>> >> >>>>
>>> >> >>>> [couchdb]
>>> >> >>>> database_dir = /other/bigcouch/database
>>> >> >>>> view_index_dir = /other/bigcouch/view
>>> >> >>>>
>>> >> >>>> The error is always generate on the third node in the cluster and
>>> the
>>> >> server basically max's out on memory before hand. The other nodes seem
>>> to
>>> >> be doing very little, but are getting data i.e. the shard sizes are
>>> >> growing. I've put the copies per shard down to 1 as currently I'm not
>>> >> interested in resilience.
>>> >> >>>>
>>> >> >>>> Any help would be greatly appreciated.
>>> >> >>>>
>>> >> >>>> Mike
>>> >> >>>>
>>> >>
>>>

RE: BigCouch - Replication failing with Cannot Allocate memory

Posted by Mike Kimber <mk...@kana.com>.
Sorry, I forgot to say that I have already upped it to N=3 and still get the same issue. 

I ran it again with 6GB of RAM on each of the servers, ran vmstat, and got the following:

r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0      0 2067468  31816 302204    0    0     0     5 1820  360 63 32  5  0  0
 2  0      0 2457728  31816 302212    0    0     0     2 2188  322 70 25  4  0  0
 2  0      0 1936092  31816 302212    0    0     0     0 3020  200 73 24  3  0  0
 2  0      0 687428  31816 302212    0    0     0     1 1958  368 56 42  2  0  0
 2  0      0 2128192  31824 302212    0    0     0     2 2779  243 64 29  7  0  0
 1  0      0 1829848  31824 302216    0    0     0     0 1734  280 68 29  3  0  0
 1  0      0 1200300  31832 302216    0    0     0     8 1841  231 43 13 44  0  0
 2  0      0 1638752  31840 302208    0    0     0     5 2625  350 71 20  8  0  0
 3  0      0 1670856  31848 302216    0    0     0     3 2150  492 40 21 39  0  0
 2  0      0 1020848  31848 302216    0    0     0     0 2307  644 67 22 11  0  0
 1  0      0 271640  31848 302216    0    0     0     6 1995  280 54 42  4  0  0
 1  0      0 455408  31848 302216    0    0     0     1 1879  238 64 33  3  0  0
 2  0      0 1240616  25584 193044    0    0     0     2 2408  232 59 34  8  0  0
 2  0      0 611280  25592 193036    0    0     0     3 2286  246 72 25  2  0  0
 2  0      0 679548  25592 193044    0    0     0     2 3038  175 78 21  2  0  0
 2  0      0 786360  25600 193044    0    0     0     3 1679  269 74 23  3  0  0
 2  0      0 568632  25600 193044    0    0     0     0 2796  274 74 24  2  0  0
eheap_alloc: Cannot allocate 1824525600 bytes of memory (of type "heap").
 0  0      0 5749480  25600 193044    0    0     0     0 1389  160 33 15 52  0  0
 0  0      0 5749956  25608 193044    0    0     0    10 1007   82  0  0 100  0  0
 0  0      0 5749988  25616 193036    0    0     0     3 1016   85  0  0 100  0  0
 0  0      0 5750020  25616 193044    0    0     0     0  998   79  0  0 100  0  0
 0  0      0 5750168  25620 193040    0    0     0     1 1007   87  0  0 100  0  0
 0  0      0 5750308  25620 193044    0    0     0     0 1008   82  0  0 100  0  0

I really need to work out what each process is doing with respect to memory at the time of failure. I had top running, but not on the node that failed this time, sod's law :-)
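One lightweight way to capture per-process memory at the moment of failure (a sketch, assuming a Linux /proc filesystem and that the Erlang VM process is named beam.smp, as it is on standard BigCouch installs) is to poll VmRSS for every beam.smp process and redirect the output to a log:

```python
import glob
import re

def parse_vmrss(status_text):
    """Extract VmRSS in kB from /proc/<pid>/status content; None if absent."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def beam_rss_samples():
    """Yield (pid, rss_kB) for every process whose command name is beam.smp."""
    for status_path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(status_path) as f:
                text = f.read()
        except OSError:
            continue  # process exited between glob and open
        if text.startswith("Name:\tbeam.smp"):
            yield int(status_path.split("/")[2]), parse_vmrss(text)

if __name__ == "__main__":
    # Run under watch(1) or a shell loop to log until the node dies.
    for pid, rss_kb in beam_rss_samples():
        print(pid, rss_kb)
```

Running this every few seconds on each node would show whether it is the Erlang heap (beam.smp RSS) climbing toward the eheap_alloc figure, rather than couchjs or something else.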

Mike 

-----Original Message-----
From: Robert Newson [mailto:rnewson@apache.org] 
Sent: 13 April 2012 17:31
To: user@couchdb.apache.org
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory

I should note that bigcouch is tested much more often with N=3.
Perhaps there's something about N=1 that exacerbates the issue. For a
test, could you try with N=3?

B.

On 13 April 2012 16:24, Mike Kimber <mk...@kana.com> wrote:
> "1. Try to replicate the database in another CouchDB."
>
> I have done this to a couchdb 1.2 database successfully. FYI The Source DB is a couchdb 1.1.1.
>
> I haven't done the other tests, but have tested replicating from the couchdb 1.2 database to the bigcouch install and got the same issue.
>
> Mike
>
>
>
> -----Original Message-----
> From: CGS [mailto:cgsmcmlxxv@gmail.com]
> Sent: 13 April 2012 15:01
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> If you say so, Robert, I won't argue with you on that. I meant no offense,
> so, please, accept my apologies if I crossed the line. It's all your's from
> now on.
>
> Mike, please, ignore my suggestion. Sorry for interfering.
>
> Good luck!
>
> CGS
>
>
>
>
> On Fri, Apr 13, 2012 at 3:19 PM, Robert Newson <rn...@apache.org> wrote:
>
>> I think you should point out that "My idea behind these tests is that
>> it may be that your database may be
>> corrupted (or seen as corrupted by BigCouch at the second test) and what
>> you get is just garbage at a certain document. " is based on no
>> evidence. Nor, if it were true, would it necessarily explain the
>> observed behavior either.
>>
>> It would be useful if we could all stick to asserting only things we
>> know to be true or have reasonable grounds to believe are true.
>> Unfounded speculation, though offered sincerely, is not helpful on a
>> mailing list intended to provide assistance.
>>
>> Thanks,
>> B.
>>
>> On 13 April 2012 13:55, CGS <cg...@gmail.com> wrote:
>> > Hi Mike,
>> >
>> > I haven't used BigCouch by now and that's why I haven't said anything by
>> > now. Still, giving a thought of what may occur there, I propose few tests
>> > if you have time:
>> > 1. Try to replicate the database in another CouchDB.
>> > 2. If 1 passes, try to replicate to only one node at the time.
>> > 3. If 2 passes, increase the pool of nodes with 1 and repeat the
>> > replication (for sure it will fail at all 3 nodes at the time).
>> >
>> > My idea behind these tests is that it may be that your database may be
>> > corrupted (or seen as corrupted by BigCouch at the second test) and what
>> > you get is just garbage at a certain document. That's why I proposed the
>> > first test. The second test is to see if any of the nodes has a problem
>> in
>> > configuration (or if there is any incompatibility in between your CouchDB
>> > and BigCouch in manipulating your docs). Finally, the third test is to
>> see
>> > if server/node resources limit the number of replications (and at how
>> many
>> > it starts to fail).
>> >
>> > Can you also check the size of the shards at tests 2 and 3?
>> >
>> > If you consider that these tests are irrelevant, please, ignore my
>> > suggestion.
>> >
>> > CGS
>> >
>> >
>> >
>> > On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <mk...@kana.com> wrote:
>> >
>> >> I upped the memory to 6GB on each of the nodes and got exactly the same
>> >> issue in the same time frame i.e. the increased RAM did not seem to by
>> me
>> >> any additional time.
>> >>
>> >> Mike
>> >>
>> >> -----Original Message-----
>> >> From: Robert Newson [mailto:rnewson@apache.org]
>> >> Sent: 12 April 2012 19:34
>> >> To: user@couchdb.apache.org
>> >> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>> >>
>> >> 2GB total ram does sound tight. I can only compare to high volume
>> >> production clusters which have much more ram than this. Given that
>> >> beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
>> >> rest one? To couchjs processes, by chance? If so, you can reduce the
>> >> maximum size of that pool in config, I think the default is 50.
>> >>
>> >> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
>> >> > Ok, I have 3 nodes all load balanced with HAproxy:
>> >> >
>> >> > Centos 5.8 (Virtualised)
>> >> > 2 Cores
>> >> > 2GB RAM
>> >> >
>> >> > I'm trying to replicate about 75K documents which total 6GB when
>> >> compacted (0n Couchdb 1.2 which has compression turned on). I'm told
>> they
>> >> are fairly large documents.
>> >> >
>> >> > When it goes pear shaped Vsmstat starts using a lot of memory:
>> >> >
>> >> > procs -----------memory---------- ---swap-- -----io---- --system--
>> >> -----cpu------
>> >> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us
>> sy
>> >> id wa st
>> >> >  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1
>>  6
>> >>  2 91  0
>> >> >  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1
>>  5
>> >>  9 85  0
>> >> >  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1
>>  7
>> >>  1 91  0
>> >> >  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1
>> 10
>> >>  4 85  0
>> >> >  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13
>>  7
>> >> 33 47  0
>> >> >  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17
>>  8
>> >> 49 26  0
>> >> >  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25
>>  9
>> >> 61  4  0
>> >> >  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8
>>  4
>> >> 49 40  0
>> >> >  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4
>>  2
>> >> 50 44  0
>> >> >  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9
>>  2
>> >> 50 40  0
>> >> >  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22
>> 20
>> >> 36 23  0
>> >> >  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3
>> 22
>> >>  0 75  0
>> >> >  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5
>> >> 19 17 59  0
>> >> >  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3
>> 10
>> >> 29 58  0
>> >> >  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2
>>  9
>> >> 32 57  0
>> >> >  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2
>>  7
>> >> 30 61  0
>> >> >  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2
>>  7
>> >>  6 84  0
>> >> >  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1
>>  6
>> >> 11 83  0
>> >> >  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1
>>  8
>> >> 16 75  0
>> >> >
>> >> > It only ever takes out one node at a time and the other nodes seem to
>> be
>> >> doing very little while the one node is running out of memory.
>> >> >
>> >> > If I kick it off again it processed some more and then spikes the
>> memory
>> >> and fails
>> >> >
>> >> > Thanks
>> >> >
>> >> > Mike
>> >> >
>> >> > PS: hope you enjoyed you couchdb get together!
>> >> >
>> >> > -----Original Message-----
>> >> > From: Robert Newson [mailto:rnewson@apache.org]
>> >> > Sent: 12 April 2012 17:28
>> >> > To: user@couchdb.apache.org
>> >> > Subject: Re: BigCouch - Replication failing with Cannot Allocate
>> memory
>> >> >
>> >> > What kind of load were you putting the machine on?
>> >> >
>> >> > On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
>> >> >> Could you show your vm.args file?
>> >> >>
>> >> >> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>> >> >>> Unfortunately your request for help coincided with the two day
>> CouchDB
>> >> >>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>> >> >>> ways to get bigcouch support, but we happily answer queries here
>> too,
>> >> >>> when not at the Model UN of CouchDB. :D
>> >> >>>
>> >> >>> B.
>> >> >>>
>> >> >>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>> >> >>>> Looks like this isn't the right place based on the responses so
>> far.
>> >> Shame I hoped this was going to help solve our index/view rebuild times
>> etc.
>> >> >>>>
>> >> >>>> Mike
>> >> >>>>
>> >> >>>> -----Original Message-----
>> >> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
>> >> >>>> Sent: 10 April 2012 09:20
>> >> >>>> To: user@couchdb.apache.org
>> >> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>> >> >>>>
>> >> >>>> I'm not sure if this is the correct place to raise an issue I am
>> >> having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch
>> >> cluster? If this is not the correct place please point me in the right
>> >> direction if it is then any one have any ideas why I keep getting the
>> >> following error message when I kick of a replication;
>> >> >>>>
>> >> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type
>> >> "heap").
>> >> >>>>
>> >> >>>> My set-up is:
>> >> >>>>
>> >> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
>> >> >>>>
>> >> >>>> 3 Node BigCouch cluster running on Centos 5.8 with the following
>> >> local.ini overrides pulling from the Standalone couchdb (78K documents)
>> >> >>>>
>> >> >>>> [httpd]
>> >> >>>> bind_address = XXX.XX.X.XX
>> >> >>>>
>> >> >>>> [cluster]
>> >> >>>> ; number of shards for a new database
>> >> >>>> q = 9
>> >> >>>> ; number of copies of each shard
>> >> >>>> n = 1
>> >> >>>>
>> >> >>>> [couchdb]
>> >> >>>> database_dir = /other/bigcouch/database
>> >> >>>> view_index_dir = /other/bigcouch/view
>> >> >>>>
>> >> >>>> The error is always generate on the third node in the cluster and
>> the
>> >> server basically max's out on memory before hand. The other nodes seem
>> to
>> >> be doing very little, but are getting data i.e. the shard sizes are
>> >> growing. I've put the copies per shard down to 1 as currently I'm not
>> >> interested in resilience.
>> >> >>>>
>> >> >>>> Any help would be greatly appreciated.
>> >> >>>>
>> >> >>>> Mike
>> >> >>>>
>> >>
>>

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by Robert Newson <rn...@apache.org>.
I should note that bigcouch is tested much more often with N=3.
Perhaps there's something about N=1 that exacerbates the issue. For a
test, could you try with N=3?

B.
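For reference, BigCouch reads the shard count (q) and replica count (n) from the query string when a database is first created, so the N=3 test can be run against a fresh database without touching local.ini. A small sketch (host and database names are placeholders):

```python
from urllib.parse import urlencode

def create_db_url(base, name, q=9, n=3):
    """Build the database-creation URL for a BigCouch cluster.
    BigCouch reads q (shards) and n (replicas) from the query string
    at creation time, overriding the [cluster] defaults in local.ini."""
    return "%s/%s?%s" % (base.rstrip("/"), name, urlencode({"q": q, "n": n}))

# Equivalent one-liner against a placeholder node:
#   curl -X PUT 'http://bigcouch-node1:5984/testdb?q=9&n=3'
print(create_db_url("http://bigcouch-node1:5984", "testdb", q=9, n=3))
```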

On 13 April 2012 16:24, Mike Kimber <mk...@kana.com> wrote:
> "1. Try to replicate the database in another CouchDB."
>
> I have done this to a couchdb 1.2 database successfully. FYI The Source DB is a couchdb 1.1.1.
>
> I haven't done the other tests, but have tested replicating from the couchdb 1.2 database to the bigcouch install and got the same issue.
>
> Mike
>
>
>
> -----Original Message-----
> From: CGS [mailto:cgsmcmlxxv@gmail.com]
> Sent: 13 April 2012 15:01
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> If you say so, Robert, I won't argue with you on that. I meant no offense,
> so, please, accept my apologies if I crossed the line. It's all your's from
> now on.
>
> Mike, please, ignore my suggestion. Sorry for interfering.
>
> Good luck!
>
> CGS
>
>
>
>
> On Fri, Apr 13, 2012 at 3:19 PM, Robert Newson <rn...@apache.org> wrote:
>
>> I think you should point out that "My idea behind these tests is that
>> it may be that your database may be
>> corrupted (or seen as corrupted by BigCouch at the second test) and what
>> you get is just garbage at a certain document. " is based on no
>> evidence. Nor, if it were true, would it necessarily explain the
>> observed behavior either.
>>
>> It would be useful if we could all stick to asserting only things we
>> know to be true or have reasonable grounds to believe are true.
>> Unfounded speculation, though offered sincerely, is not helpful on a
>> mailing list intended to provide assistance.
>>
>> Thanks,
>> B.
>>
>> On 13 April 2012 13:55, CGS <cg...@gmail.com> wrote:
>> > Hi Mike,
>> >
>> > I haven't used BigCouch by now and that's why I haven't said anything by
>> > now. Still, giving a thought of what may occur there, I propose few tests
>> > if you have time:
>> > 1. Try to replicate the database in another CouchDB.
>> > 2. If 1 passes, try to replicate to only one node at the time.
>> > 3. If 2 passes, increase the pool of nodes with 1 and repeat the
>> > replication (for sure it will fail at all 3 nodes at the time).
>> >
>> > My idea behind these tests is that it may be that your database may be
>> > corrupted (or seen as corrupted by BigCouch at the second test) and what
>> > you get is just garbage at a certain document. That's why I proposed the
>> > first test. The second test is to see if any of the nodes has a problem
>> in
>> > configuration (or if there is any incompatibility in between your CouchDB
>> > and BigCouch in manipulating your docs). Finally, the third test is to
>> see
>> > if server/node resources limit the number of replications (and at how
>> many
>> > it starts to fail).
>> >
>> > Can you also check the size of the shards at tests 2 and 3?
>> >
>> > If you consider that these tests are irrelevant, please, ignore my
>> > suggestion.
>> >
>> > CGS
>> >
>> >
>> >
>> > On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <mk...@kana.com> wrote:
>> >
>> >> I upped the memory to 6GB on each of the nodes and got exactly the same
>> >> issue in the same time frame i.e. the increased RAM did not seem to by
>> me
>> >> any additional time.
>> >>
>> >> Mike
>> >>
>> >> -----Original Message-----
>> >> From: Robert Newson [mailto:rnewson@apache.org]
>> >> Sent: 12 April 2012 19:34
>> >> To: user@couchdb.apache.org
>> >> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>> >>
>> >> 2GB total ram does sound tight. I can only compare to high volume
>> >> production clusters which have much more ram than this. Given that
>> >> beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
>> >> rest one? To couchjs processes, by chance? If so, you can reduce the
>> >> maximum size of that pool in config, I think the default is 50.
>> >>
>> >> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
>> >> > Ok, I have 3 nodes all load balanced with HAproxy:
>> >> >
>> >> > Centos 5.8 (Virtualised)
>> >> > 2 Cores
>> >> > 2GB RAM
>> >> >
>> >> > I'm trying to replicate about 75K documents which total 6GB when
>> >> compacted (0n Couchdb 1.2 which has compression turned on). I'm told
>> they
>> >> are fairly large documents.
>> >> >
>> >> > When it goes pear shaped Vsmstat starts using a lot of memory:
>> >> >
>> >> > procs -----------memory---------- ---swap-- -----io---- --system--
>> >> -----cpu------
>> >> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us
>> sy
>> >> id wa st
>> >> >  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1
>>  6
>> >>  2 91  0
>> >> >  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1
>>  5
>> >>  9 85  0
>> >> >  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1
>>  7
>> >>  1 91  0
>> >> >  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1
>> 10
>> >>  4 85  0
>> >> >  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13
>>  7
>> >> 33 47  0
>> >> >  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17
>>  8
>> >> 49 26  0
>> >> >  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25
>>  9
>> >> 61  4  0
>> >> >  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8
>>  4
>> >> 49 40  0
>> >> >  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4
>>  2
>> >> 50 44  0
>> >> >  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9
>>  2
>> >> 50 40  0
>> >> >  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22
>> 20
>> >> 36 23  0
>> >> >  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3
>> 22
>> >>  0 75  0
>> >> >  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5
>> >> 19 17 59  0
>> >> >  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3
>> 10
>> >> 29 58  0
>> >> >  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2
>>  9
>> >> 32 57  0
>> >> >  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2
>>  7
>> >> 30 61  0
>> >> >  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2
>>  7
>> >>  6 84  0
>> >> >  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1
>>  6
>> >> 11 83  0
>> >> >  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1
>>  8
>> >> 16 75  0
>> >> >
>> >> > It only ever takes out one node at a time and the other nodes seem to
>> be
>> >> doing very little while the one node is running out of memory.
>> >> >
>> >> > If I kick it off again it processed some more and then spikes the
>> memory
>> >> and fails
>> >> >
>> >> > Thanks
>> >> >
>> >> > Mike
>> >> >
>> >> > PS: hope you enjoyed you couchdb get together!
>> >> >
>> >> > -----Original Message-----
>> >> > From: Robert Newson [mailto:rnewson@apache.org]
>> >> > Sent: 12 April 2012 17:28
>> >> > To: user@couchdb.apache.org
>> >> > Subject: Re: BigCouch - Replication failing with Cannot Allocate
>> memory
>> >> >
>> >> > What kind of load were you putting the machine on?
>> >> >
>> >> > On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
>> >> >> Could you show your vm.args file?
>> >> >>
>> >> >> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>> >> >>> Unfortunately your request for help coincided with the two day
>> CouchDB
>> >> >>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>> >> >>> ways to get bigcouch support, but we happily answer queries here
>> too,
>> >> >>> when not at the Model UN of CouchDB. :D
>> >> >>>
>> >> >>> B.
>> >> >>>
>> >> >>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>> >> >>>> Looks like this isn't the right place based on the responses so
>> far.
>> >> Shame I hoped this was going to help solve our index/view rebuild times
>> etc.
>> >> >>>>
>> >> >>>> Mike
>> >> >>>>
>> >> >>>> -----Original Message-----
>> >> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
>> >> >>>> Sent: 10 April 2012 09:20
>> >> >>>> To: user@couchdb.apache.org
>> >> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>> >> >>>>
>> >> >>>> I'm not sure if this is the correct place to raise an issue I am
>> >> having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch
>> >> cluster? If this is not the correct place please point me in the right
>> >> direction if it is then any one have any ideas why I keep getting the
>> >> following error message when I kick of a replication;
>> >> >>>>
>> >> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type
>> >> "heap").
>> >> >>>>
>> >> >>>> My set-up is:
>> >> >>>>
>> >> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
>> >> >>>>
>> >> >>>> 3 Node BigCouch cluster running on Centos 5.8 with the following
>> >> local.ini overrides pulling from the Standalone couchdb (78K documents)
>> >> >>>>
>> >> >>>> [httpd]
>> >> >>>> bind_address = XXX.XX.X.XX
>> >> >>>>
>> >> >>>> [cluster]
>> >> >>>> ; number of shards for a new database
>> >> >>>> q = 9
>> >> >>>> ; number of copies of each shard
>> >> >>>> n = 1
>> >> >>>>
>> >> >>>> [couchdb]
>> >> >>>> database_dir = /other/bigcouch/database
>> >> >>>> view_index_dir = /other/bigcouch/view
>> >> >>>>
>> >> >>>> The error is always generate on the third node in the cluster and
>> the
>> >> server basically max's out on memory before hand. The other nodes seem
>> to
>> >> be doing very little, but are getting data i.e. the shard sizes are
>> >> growing. I've put the copies per shard down to 1 as currently I'm not
>> >> interested in resilience.
>> >> >>>>
>> >> >>>> Any help would be greatly appreciated.
>> >> >>>>
>> >> >>>> Mike
>> >> >>>>
>> >>
>>

RE: BigCouch - Replication failing with Cannot Allocate memory

Posted by Mike Kimber <mk...@kana.com>.
"1. Try to replicate the database in another CouchDB."

I have done this to a CouchDB 1.2 database successfully. FYI, the source DB is a CouchDB 1.1.1.

I haven't done the other tests, but have tested replicating from the couchdb 1.2 database to the bigcouch install and got the same issue.

Mike 
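For anyone reproducing these tests, a pull replication is triggered with a single POST to _replicate on the target cluster. A minimal sketch using only the standard library (host and database names are placeholders):

```python
import json
from urllib.request import Request, urlopen

def replication_body(source, target):
    """JSON body for POST /_replicate: pull from source into target."""
    return json.dumps({"source": source, "target": target})

def trigger_replication(cluster_url, source_url, target_db):
    """POST /_replicate against the target cluster and return the parsed
    JSON response (this performs a real network call)."""
    req = Request(cluster_url.rstrip("/") + "/_replicate",
                  data=replication_body(source_url, target_db).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)

# Example (placeholder hosts; not executed here):
# trigger_replication("http://bigcouch-node1:5984",
#                     "http://couchdb-host:5984/sourcedb",
#                     "targetdb")
```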



-----Original Message-----
From: CGS [mailto:cgsmcmlxxv@gmail.com] 
Sent: 13 April 2012 15:01
To: user@couchdb.apache.org
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory

If you say so, Robert, I won't argue with you on that. I meant no offense,
so, please, accept my apologies if I crossed the line. It's all your's from
now on.

Mike, please, ignore my suggestion. Sorry for interfering.

Good luck!

CGS




On Fri, Apr 13, 2012 at 3:19 PM, Robert Newson <rn...@apache.org> wrote:

> I think you should point out that "My idea behind these tests is that
> it may be that your database may be
> corrupted (or seen as corrupted by BigCouch at the second test) and what
> you get is just garbage at a certain document. " is based on no
> evidence. Nor, if it were true, would it necessarily explain the
> observed behavior either.
>
> It would be useful if we could all stick to asserting only things we
> know to be true or have reasonable grounds to believe are true.
> Unfounded speculation, though offered sincerely, is not helpful on a
> mailing list intended to provide assistance.
>
> Thanks,
> B.
>
> On 13 April 2012 13:55, CGS <cg...@gmail.com> wrote:
> > Hi Mike,
> >
> > I haven't used BigCouch by now and that's why I haven't said anything by
> > now. Still, giving a thought of what may occur there, I propose few tests
> > if you have time:
> > 1. Try to replicate the database in another CouchDB.
> > 2. If 1 passes, try to replicate to only one node at the time.
> > 3. If 2 passes, increase the pool of nodes with 1 and repeat the
> > replication (for sure it will fail at all 3 nodes at the time).
> >
> > My idea behind these tests is that it may be that your database may be
> > corrupted (or seen as corrupted by BigCouch at the second test) and what
> > you get is just garbage at a certain document. That's why I proposed the
> > first test. The second test is to see if any of the nodes has a problem
> in
> > configuration (or if there is any incompatibility in between your CouchDB
> > and BigCouch in manipulating your docs). Finally, the third test is to
> see
> > if server/node resources limit the number of replications (and at how
> many
> > it starts to fail).
> >
> > Can you also check the size of the shards at tests 2 and 3?
> >
> > If you consider that these tests are irrelevant, please, ignore my
> > suggestion.
> >
> > CGS
> >
> >
> >
> > On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <mk...@kana.com> wrote:
> >
> >> I upped the memory to 6GB on each of the nodes and got exactly the same
> >> issue in the same time frame i.e. the increased RAM did not seem to by
> me
> >> any additional time.
> >>
> >> Mike
> >>
> >> -----Original Message-----
> >> From: Robert Newson [mailto:rnewson@apache.org]
> >> Sent: 12 April 2012 19:34
> >> To: user@couchdb.apache.org
> >> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
> >>
> >> 2GB total ram does sound tight. I can only compare to high volume
> >> production clusters which have much more ram than this. Given that
> >> beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
> >> rest one? To couchjs processes, by chance? If so, you can reduce the
> >> maximum size of that pool in config, I think the default is 50.
> >>
> >> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
> >> > Ok, I have 3 nodes all load balanced with HAproxy:
> >> >
> >> > Centos 5.8 (Virtualised)
> >> > 2 Cores
> >> > 2GB RAM
> >> >
> >> > I'm trying to replicate about 75K documents which total 6GB when
> >> compacted (0n Couchdb 1.2 which has compression turned on). I'm told
> they
> >> are fairly large documents.
> >> >
> >> > When it goes pear shaped Vsmstat starts using a lot of memory:
> >> >
> >> > procs -----------memory---------- ---swap-- -----io---- --system--
> >> -----cpu------
> >> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us
> sy
> >> id wa st
> >> >  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1
>  6
> >>  2 91  0
> >> >  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1
>  5
> >>  9 85  0
> >> >  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1
>  7
> >>  1 91  0
> >> >  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1
> 10
> >>  4 85  0
> >> >  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13
>  7
> >> 33 47  0
> >> >  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17
>  8
> >> 49 26  0
> >> >  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25
>  9
> >> 61  4  0
> >> >  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8
>  4
> >> 49 40  0
> >> >  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4
>  2
> >> 50 44  0
> >> >  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9
>  2
> >> 50 40  0
> >> >  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22
> 20
> >> 36 23  0
> >> >  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3
> 22
> >>  0 75  0
> >> >  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5
> >> 19 17 59  0
> >> >  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3
> 10
> >> 29 58  0
> >> >  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2
>  9
> >> 32 57  0
> >> >  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2
>  7
> >> 30 61  0
> >> >  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2
>  7
> >>  6 84  0
> >> >  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1
>  6
> >> 11 83  0
> >> >  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1
>  8
> >> 16 75  0
> >> >
> >> > It only ever takes out one node at a time and the other nodes seem to
> be
> >> doing very little while the one node is running out of memory.
> >> >
> >> > If I kick it off again it processed some more and then spikes the
> memory
> >> and fails
> >> >
> >> > Thanks
> >> >
> >> > Mike
> >> >
> >> > PS: hope you enjoyed you couchdb get together!
> >> >
> >> > -----Original Message-----
> >> > From: Robert Newson [mailto:rnewson@apache.org]
> >> > Sent: 12 April 2012 17:28
> >> > To: user@couchdb.apache.org
> >> > Subject: Re: BigCouch - Replication failing with Cannot Allocate
> memory
> >> >
> >> > What kind of load were you putting the machine on?
> >> >
> >> > On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
> >> >> Could you show your vm.args file?
> >> >>
> >> >> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
> >> >>> Unfortunately your request for help coincided with the two day
> CouchDB
> >> >>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
> >> >>> ways to get bigcouch support, but we happily answer queries here
> too,
> >> >>> when not at the Model UN of CouchDB. :D
> >> >>>
> >> >>> B.
> >> >>>
> >> >>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
> >> >>>> Looks like this isn't the right place based on the responses so
> far.
> >> Shame I hoped this was going to help solve our index/view rebuild times
> etc.
> >> >>>>
> >> >>>> Mike
> >> >>>>
> >> >>>> -----Original Message-----
> >> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
> >> >>>> Sent: 10 April 2012 09:20
> >> >>>> To: user@couchdb.apache.org
> >> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
> >> >>>>
> >> >>>> I'm not sure if this is the correct place to raise an issue I am
> >> having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch
> >> cluster? If this is not the correct place please point me in the right
> >> direction if it is then any one have any ideas why I keep getting the
> >> following error message when I kick of a replication;
> >> >>>>
> >> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type
> >> "heap").
> >> >>>>
> >> >>>> My set-up is:
> >> >>>>
> >> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
> >> >>>>
> >> >>>> 3 Node BigCouch cluster running on Centos 5.8 with the following
> >> local.ini overrides pulling from the Standalone couchdb (78K documents)
> >> >>>>
> >> >>>> [httpd]
> >> >>>> bind_address = XXX.XX.X.XX
> >> >>>>
> >> >>>> [cluster]
> >> >>>> ; number of shards for a new database
> >> >>>> q = 9
> >> >>>> ; number of copies of each shard
> >> >>>> n = 1
> >> >>>>
> >> >>>> [couchdb]
> >> >>>> database_dir = /other/bigcouch/database
> >> >>>> view_index_dir = /other/bigcouch/view
> >> >>>>
> >> >>>> The error is always generate on the third node in the cluster and
> the
> >> server basically max's out on memory before hand. The other nodes seem
> to
> >> be doing very little, but are getting data i.e. the shard sizes are
> >> growing. I've put the copies per shard down to 1 as currently I'm not
> >> interested in resilience.
> >> >>>>
> >> >>>> Any help would be greatly appreciated.
> >> >>>>
> >> >>>> Mike
> >> >>>>
> >>
>

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by CGS <cg...@gmail.com>.
If you say so, Robert, I won't argue with you on that. I meant no offense,
so, please, accept my apologies if I crossed the line. It's all yours from
now on.

Mike, please, ignore my suggestion. Sorry for interfering.

Good luck!

CGS




On Fri, Apr 13, 2012 at 3:19 PM, Robert Newson <rn...@apache.org> wrote:

> I think you should point out that "My idea behind these tests is that
> it may be that your database may be
> corrupted (or seen as corrupted by BigCouch at the second test) and what
> you get is just garbage at a certain document. " is based on no
> evidence. Nor, if it were true, would it necessarily explain the
> observed behavior either.
>
> It would be useful if we could all stick to asserting only things we
> know to be true or have reasonable grounds to believe are true.
> Unfounded speculation, though offered sincerely, is not helpful on a
> mailing list intended to provide assistance.
>
> Thanks,
> B.
>
> On 13 April 2012 13:55, CGS <cg...@gmail.com> wrote:
> > Hi Mike,
> >
> > I haven't used BigCouch yet, which is why I haven't said anything until
> > now. Still, having given some thought to what may be occurring there, I
> > propose a few tests if you have time:
> > 1. Try to replicate the database to another CouchDB.
> > 2. If 1 passes, try to replicate to only one node at a time.
> > 3. If 2 passes, increase the pool of nodes by 1 and repeat the
> > replication (presumably it will fail again once all 3 nodes are involved).
> >
> > My idea behind these tests is that your database may be corrupted (or
> > seen as corrupted by BigCouch in the second test) and what you get is
> > just garbage at a certain document. That's why I proposed the first
> > test. The second test is to see if any of the nodes has a problem in
> > its configuration (or if there is any incompatibility between your
> > CouchDB and BigCouch in manipulating your docs). Finally, the third
> > test is to see if server/node resources limit the number of
> > replications (and at how many it starts to fail).
> >
> > Can you also check the size of the shards at tests 2 and 3?
> >
> > If you consider that these tests are irrelevant, please, ignore my
> > suggestion.
> >
> > CGS
> >
> >
> >
> > On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <mk...@kana.com> wrote:
> >
> >> I upped the memory to 6GB on each of the nodes and got exactly the same
> >> issue in the same time frame, i.e. the increased RAM did not seem to
> >> buy me any additional time.
> >>
> >> Mike
> >>
> >> -----Original Message-----
> >> From: Robert Newson [mailto:rnewson@apache.org]
> >> Sent: 12 April 2012 19:34
> >> To: user@couchdb.apache.org
> >> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
> >>
> >> 2GB total ram does sound tight. I can only compare to high volume
> >> production clusters which have much more ram than this. Given that
> >> beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
> >> rest went? To couchjs processes, by chance? If so, you can reduce the
> >> maximum size of that pool in config, I think the default is 50.
> >>
> >> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
> >> > Ok, I have 3 nodes all load balanced with HAproxy:
> >> >
> >> > Centos 5.8 (Virtualised)
> >> > 2 Cores
> >> > 2GB RAM
> >> >
> >> > I'm trying to replicate about 75K documents which total 6GB when
> >> compacted (on CouchDB 1.2, which has compression turned on). I'm told
> they
> >> are fairly large documents.
> >> >
> >> > When it goes pear-shaped, vmstat shows heavy memory and swap use:
> >> >
> >> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> >> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> >> >  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1  6  2 91  0
> >> >  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1  5  9 85  0
> >> >  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1  7  1 91  0
> >> >  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1 10  4 85  0
> >> >  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13  7 33 47  0
> >> >  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17  8 49 26  0
> >> >  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25  9 61  4  0
> >> >  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8  4 49 40  0
> >> >  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4  2 50 44  0
> >> >  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9  2 50 40  0
> >> >  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22 20 36 23  0
> >> >  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3 22  0 75  0
> >> >  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
> >> >  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3 10 29 58  0
> >> >  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2  9 32 57  0
> >> >  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2  7 30 61  0
> >> >  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2  7  6 84  0
> >> >  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1  6 11 83  0
> >> >  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1  8 16 75  0
> >> >
> >> > It only ever takes out one node at a time and the other nodes seem to
> be
> >> doing very little while the one node is running out of memory.
> >> >
> >> > If I kick it off again, it processes some more, then the memory
> >> spikes and it fails.
> >> >
> >> > Thanks
> >> >
> >> > Mike
> >> >
> >> > PS: hope you enjoyed your CouchDB get-together!
> >> >
> >> > -----Original Message-----
> >> > From: Robert Newson [mailto:rnewson@apache.org]
> >> > Sent: 12 April 2012 17:28
> >> > To: user@couchdb.apache.org
> >> > Subject: Re: BigCouch - Replication failing with Cannot Allocate
> memory
> >> >
> >> > What kind of load were you putting the machine on?
> >> >
> >> > On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
> >> >> Could you show your vm.args file?
> >> >>
> >> >> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
> >> >>> Unfortunately your request for help coincided with the two day
> CouchDB
> >> >>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
> >> >>> ways to get bigcouch support, but we happily answer queries here
> too,
> >> >>> when not at the Model UN of CouchDB. :D
> >> >>>
> >> >>> B.
> >> >>>
> >> >>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
> >> >>>> Looks like this isn't the right place based on the responses so
> far.
> >> Shame, as I hoped this was going to help solve our index/view rebuild times
> etc.
> >> >>>>
> >> >>>> Mike
> >> >>>>
> >> >>>> -----Original Message-----
> >> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
> >> >>>> Sent: 10 April 2012 09:20
> >> >>>> To: user@couchdb.apache.org
> >> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
> >> >>>>
> >> >>>> I'm not sure if this is the correct place to raise an issue I am
> >> having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch
> >> cluster. If this is not the correct place, please point me in the right
> >> direction; if it is, does anyone have any ideas why I keep getting the
> >> following error message when I kick off a replication:
> >> >>>>
> >> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type
> >> "heap").
> >> >>>>
> >> >>>> My set-up is:
> >> >>>>
> >> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
> >> >>>>
> >> >>>> 3 Node BigCouch cluster running on Centos 5.8 with the following
> >> local.ini overrides pulling from the Standalone couchdb (78K documents)
> >> >>>>
> >> >>>> [httpd]
> >> >>>> bind_address = XXX.XX.X.XX
> >> >>>>
> >> >>>> [cluster]
> >> >>>> ; number of shards for a new database
> >> >>>> q = 9
> >> >>>> ; number of copies of each shard
> >> >>>> n = 1
> >> >>>>
> >> >>>> [couchdb]
> >> >>>> database_dir = /other/bigcouch/database
> >> >>>> view_index_dir = /other/bigcouch/view
> >> >>>>
> >> >>>> The error is always generated on the third node in the cluster and
> the
> >> server basically maxes out on memory beforehand. The other nodes seem
> to
> >> be doing very little, but are getting data i.e. the shard sizes are
> >> growing. I've put the copies per shard down to 1 as currently I'm not
> >> interested in resilience.
> >> >>>>
> >> >>>> Any help would be greatly appreciated.
> >> >>>>
> >> >>>> Mike
> >> >>>>
> >>
>

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by Robert Newson <rn...@apache.org>.
I think you should point out that "My idea behind these tests is that
it may be that your database may be
corrupted (or seen as corrupted by BigCouch at the second test) and what
you get is just garbage at a certain document. " is based on no
evidence. Nor, if it were true, would it necessarily explain the
observed behavior.

It would be useful if we could all stick to asserting only things we
know to be true or have reasonable grounds to believe are true.
Unfounded speculation, though offered sincerely, is not helpful on a
mailing list intended to provide assistance.

Thanks,
B.

On 13 April 2012 13:55, CGS <cg...@gmail.com> wrote:
> Hi Mike,
>
> I haven't used BigCouch yet, which is why I haven't said anything until
> now. Still, having given some thought to what may be occurring there, I
> propose a few tests if you have time:
> 1. Try to replicate the database to another CouchDB.
> 2. If 1 passes, try to replicate to only one node at a time.
> 3. If 2 passes, increase the pool of nodes by 1 and repeat the
> replication (presumably it will fail again once all 3 nodes are involved).
>
> My idea behind these tests is that your database may be corrupted (or
> seen as corrupted by BigCouch in the second test) and what you get is
> just garbage at a certain document. That's why I proposed the first
> test. The second test is to see if any of the nodes has a problem in
> its configuration (or if there is any incompatibility between your
> CouchDB and BigCouch in manipulating your docs). Finally, the third
> test is to see if server/node resources limit the number of
> replications (and at how many it starts to fail).
>
> Can you also check the size of the shards at tests 2 and 3?
>
> If you consider that these tests are irrelevant, please, ignore my
> suggestion.
>
> CGS
>
>
>
> On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <mk...@kana.com> wrote:
>
>> I upped the memory to 6GB on each of the nodes and got exactly the same
>> issue in the same time frame, i.e. the increased RAM did not seem to buy me
>> any additional time.
>>
>> Mike
>>
>> -----Original Message-----
>> From: Robert Newson [mailto:rnewson@apache.org]
>> Sent: 12 April 2012 19:34
>> To: user@couchdb.apache.org
>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>
>> 2GB total ram does sound tight. I can only compare to high volume
>> production clusters which have much more ram than this. Given that
>> beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
>> rest went? To couchjs processes, by chance? If so, you can reduce the
>> maximum size of that pool in config, I think the default is 50.
>>
>> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
>> > Ok, I have 3 nodes all load balanced with HAproxy:
>> >
>> > Centos 5.8 (Virtualised)
>> > 2 Cores
>> > 2GB RAM
>> >
>> > I'm trying to replicate about 75K documents which total 6GB when
>> compacted (on CouchDB 1.2, which has compression turned on). I'm told they
>> are fairly large documents.
>> >
>> > When it goes pear-shaped, vmstat shows heavy memory and swap use:
>> >
>> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>> >  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1  6  2 91  0
>> >  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1  5  9 85  0
>> >  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1  7  1 91  0
>> >  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1 10  4 85  0
>> >  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13  7 33 47  0
>> >  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17  8 49 26  0
>> >  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25  9 61  4  0
>> >  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8  4 49 40  0
>> >  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4  2 50 44  0
>> >  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9  2 50 40  0
>> >  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22 20 36 23  0
>> >  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3 22  0 75  0
>> >  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
>> >  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3 10 29 58  0
>> >  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2  9 32 57  0
>> >  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2  7 30 61  0
>> >  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2  7  6 84  0
>> >  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1  6 11 83  0
>> >  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1  8 16 75  0
>> >
>> > It only ever takes out one node at a time and the other nodes seem to be
>> doing very little while the one node is running out of memory.
>> >
>> > If I kick it off again, it processes some more, then the memory spikes
>> and it fails.
>> >
>> > Thanks
>> >
>> > Mike
>> >
>> > PS: hope you enjoyed your CouchDB get-together!
>> >
>> > -----Original Message-----
>> > From: Robert Newson [mailto:rnewson@apache.org]
>> > Sent: 12 April 2012 17:28
>> > To: user@couchdb.apache.org
>> > Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>> >
>> > What kind of load were you putting the machine on?
>> >
>> > On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
>> >> Could you show your vm.args file?
>> >>
>> >> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>> >>> Unfortunately your request for help coincided with the two day CouchDB
>> >>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>> >>> ways to get bigcouch support, but we happily answer queries here too,
>> >>> when not at the Model UN of CouchDB. :D
>> >>>
>> >>> B.
>> >>>
>> >>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>> >>>> Looks like this isn't the right place based on the responses so far.
>> Shame, as I hoped this was going to help solve our index/view rebuild times etc.
>> >>>>
>> >>>> Mike
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
>> >>>> Sent: 10 April 2012 09:20
>> >>>> To: user@couchdb.apache.org
>> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>> >>>>
>> >>>> I'm not sure if this is the correct place to raise an issue I am
>> having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch
>> cluster. If this is not the correct place, please point me in the right
>> direction; if it is, does anyone have any ideas why I keep getting the
>> following error message when I kick off a replication:
>> >>>>
>> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type
>> "heap").
>> >>>>
>> >>>> My set-up is:
>> >>>>
>> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
>> >>>>
>> >>>> 3 Node BigCouch cluster running on Centos 5.8 with the following
>> local.ini overrides pulling from the Standalone couchdb (78K documents)
>> >>>>
>> >>>> [httpd]
>> >>>> bind_address = XXX.XX.X.XX
>> >>>>
>> >>>> [cluster]
>> >>>> ; number of shards for a new database
>> >>>> q = 9
>> >>>> ; number of copies of each shard
>> >>>> n = 1
>> >>>>
>> >>>> [couchdb]
>> >>>> database_dir = /other/bigcouch/database
>> >>>> view_index_dir = /other/bigcouch/view
>> >>>>
>> >>>> The error is always generated on the third node in the cluster and the
>> server basically maxes out on memory beforehand. The other nodes seem to
>> be doing very little, but are getting data i.e. the shard sizes are
>> growing. I've put the copies per shard down to 1 as currently I'm not
>> interested in resilience.
>> >>>>
>> >>>> Any help would be greatly appreciated.
>> >>>>
>> >>>> Mike
>> >>>>
>>

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by CGS <cg...@gmail.com>.
Hi Mike,

I haven't used BigCouch yet, which is why I haven't said anything until
now. Still, having given some thought to what may be occurring there, I
propose a few tests if you have time:
1. Try to replicate the database to another CouchDB.
2. If 1 passes, try to replicate to only one node at a time.
3. If 2 passes, increase the pool of nodes by 1 and repeat the
replication (presumably it will fail again once all 3 nodes are involved).

My idea behind these tests is that your database may be corrupted (or
seen as corrupted by BigCouch in the second test) and what you get is
just garbage at a certain document. That's why I proposed the first
test. The second test is to see if any of the nodes has a problem in
its configuration (or if there is any incompatibility between your
CouchDB and BigCouch in manipulating your docs). Finally, the third
test is to see if server/node resources limit the number of
replications (and at how many it starts to fail).
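Test 2 can be driven by pointing a pull replication at a single node directly rather than at the load-balanced address. A minimal sketch of the `_replicate` request body; the hostnames and database name below are placeholders, not details taken from this thread:

```python
import json

def single_node_replication_body(source_host, target_node, db):
    """Build a JSON body for POST /_replicate against one BigCouch node.

    Hostnames and the database name are placeholders; substitute your
    own. Sending this to a single node's own port (instead of the
    HAproxy address) isolates that node's behaviour during the test.
    """
    return json.dumps({
        "source": "http://%s:5984/%s" % (source_host, db),   # standalone CouchDB 1.1.1
        "target": "http://%s:5984/%s" % (target_node, db),   # one cluster node
        "create_target": True,
    })

body = single_node_replication_body("couch-standalone", "bigcouch-node1", "dbname")
# POST this body to http://bigcouch-node1:5984/_replicate with
# Content-Type: application/json (e.g. via curl), then repeat per node.
```
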

Can you also check the size of the shards at tests 2 and 3?

If you consider that these tests are irrelevant, please, ignore my
suggestion.

CGS



On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <mk...@kana.com> wrote:

> I upped the memory to 6GB on each of the nodes and got exactly the same
> issue in the same time frame, i.e. the increased RAM did not seem to buy me
> any additional time.
>
> Mike
>
> -----Original Message-----
> From: Robert Newson [mailto:rnewson@apache.org]
> Sent: 12 April 2012 19:34
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> 2GB total ram does sound tight. I can only compare to high volume
> production clusters which have much more ram than this. Given that
> beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
> rest went? To couchjs processes, by chance? If so, you can reduce the
> maximum size of that pool in config, I think the default is 50.
>
> On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
> > Ok, I have 3 nodes all load balanced with HAproxy:
> >
> > Centos 5.8 (Virtualised)
> > 2 Cores
> > 2GB RAM
> >
> > I'm trying to replicate about 75K documents which total 6GB when
> compacted (on CouchDB 1.2, which has compression turned on). I'm told they
> are fairly large documents.
> >
> > When it goes pear-shaped, vmstat shows heavy memory and swap use:
> >
> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> >  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1  6  2 91  0
> >  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1  5  9 85  0
> >  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1  7  1 91  0
> >  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1 10  4 85  0
> >  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13  7 33 47  0
> >  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17  8 49 26  0
> >  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25  9 61  4  0
> >  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8  4 49 40  0
> >  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4  2 50 44  0
> >  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9  2 50 40  0
> >  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22 20 36 23  0
> >  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3 22  0 75  0
> >  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
> >  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3 10 29 58  0
> >  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2  9 32 57  0
> >  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2  7 30 61  0
> >  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2  7  6 84  0
> >  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1  6 11 83  0
> >  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1  8 16 75  0
> >
> > It only ever takes out one node at a time and the other nodes seem to be
> doing very little while the one node is running out of memory.
> >
> > If I kick it off again, it processes some more, then the memory spikes
> and it fails.
> >
> > Thanks
> >
> > Mike
> >
> > PS: hope you enjoyed your CouchDB get-together!
> >
> > -----Original Message-----
> > From: Robert Newson [mailto:rnewson@apache.org]
> > Sent: 12 April 2012 17:28
> > To: user@couchdb.apache.org
> > Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
> >
> > What kind of load were you putting the machine on?
> >
> > On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
> >> Could you show your vm.args file?
> >>
> >> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
> >>> Unfortunately your request for help coincided with the two day CouchDB
> >>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
> >>> ways to get bigcouch support, but we happily answer queries here too,
> >>> when not at the Model UN of CouchDB. :D
> >>>
> >>> B.
> >>>
> >>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
> >>>> Looks like this isn't the right place based on the responses so far.
> Shame, as I hoped this was going to help solve our index/view rebuild times etc.
> >>>>
> >>>> Mike
> >>>>
> >>>> -----Original Message-----
> >>>> From: Mike Kimber [mailto:mkimber@kana.com]
> >>>> Sent: 10 April 2012 09:20
> >>>> To: user@couchdb.apache.org
> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
> >>>>
> >>>> I'm not sure if this is the correct place to raise an issue I am
> having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch
> cluster. If this is not the correct place, please point me in the right
> direction; if it is, does anyone have any ideas why I keep getting the
> following error message when I kick off a replication:
> >>>>
> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type
> "heap").
> >>>>
> >>>> My set-up is:
> >>>>
> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
> >>>>
> >>>> 3 Node BigCouch cluster running on Centos 5.8 with the following
> local.ini overrides pulling from the Standalone couchdb (78K documents)
> >>>>
> >>>> [httpd]
> >>>> bind_address = XXX.XX.X.XX
> >>>>
> >>>> [cluster]
> >>>> ; number of shards for a new database
> >>>> q = 9
> >>>> ; number of copies of each shard
> >>>> n = 1
> >>>>
> >>>> [couchdb]
> >>>> database_dir = /other/bigcouch/database
> >>>> view_index_dir = /other/bigcouch/view
> >>>>
> >>>> The error is always generated on the third node in the cluster and the
> server basically maxes out on memory beforehand. The other nodes seem to
> be doing very little, but are getting data i.e. the shard sizes are
> growing. I've put the copies per shard down to 1 as currently I'm not
> interested in resilience.
> >>>>
> >>>> Any help would be greatly appreciated.
> >>>>
> >>>> Mike
> >>>>
>

RE: BigCouch - Replication failing with Cannot Allocate memory

Posted by Mike Kimber <mk...@kana.com>.
I upped the memory to 6GB on each of the nodes and got exactly the same issue in the same time frame, i.e. the increased RAM did not seem to buy me any additional time.

Mike 

-----Original Message-----
From: Robert Newson [mailto:rnewson@apache.org] 
Sent: 12 April 2012 19:34
To: user@couchdb.apache.org
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory

2GB total ram does sound tight. I can only compare to high volume
production clusters which have much more ram than this. Given that
beam.smp wanted 1.4 gb and you have 2gb total, do you know where the
rest went? To couchjs processes, by chance? If so, you can reduce the
maximum size of that pool in config, I think the default is 50.

On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
> Ok, I have 3 nodes all load balanced with HAproxy:
>
> Centos 5.8 (Virtualised)
> 2 Cores
> 2GB RAM
>
> I'm trying to replicate about 75K documents which total 6GB when compacted (on CouchDB 1.2, which has compression turned on). I'm told they are fairly large documents.
>
> When it goes pear-shaped, vmstat shows heavy memory and swap use:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1  6  2 91  0
>  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1  5  9 85  0
>  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1  7  1 91  0
>  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1 10  4 85  0
>  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13  7 33 47  0
>  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17  8 49 26  0
>  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25  9 61  4  0
>  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8  4 49 40  0
>  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4  2 50 44  0
>  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9  2 50 40  0
>  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22 20 36 23  0
>  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3 22  0 75  0
>  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
>  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3 10 29 58  0
>  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2  9 32 57  0
>  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2  7 30 61  0
>  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2  7  6 84  0
>  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1  6 11 83  0
>  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1  8 16 75  0
>
> It only ever takes out one node at a time and the other nodes seem to be doing very little while the one node is running out of memory.
>
> If I kick it off again, it processes some more, then the memory spikes and it fails.
>
> Thanks
>
> Mike
>
> PS: hope you enjoyed your CouchDB get-together!
>
> -----Original Message-----
> From: Robert Newson [mailto:rnewson@apache.org]
> Sent: 12 April 2012 17:28
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> What kind of load were you putting the machine on?
>
> On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
>> Could you show your vm.args file?
>>
>> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>>> Unfortunately your request for help coincided with the two day CouchDB
>>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>>> ways to get bigcouch support, but we happily answer queries here too,
>>> when not at the Model UN of CouchDB. :D
>>>
>>> B.
>>>
>>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>>>> Looks like this isn't the right place based on the responses so far. Shame, as I hoped this was going to help solve our index/view rebuild times etc.
>>>>
>>>> Mike
>>>>
>>>> -----Original Message-----
>>>> From: Mike Kimber [mailto:mkimber@kana.com]
>>>> Sent: 10 April 2012 09:20
>>>> To: user@couchdb.apache.org
>>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>>>
>>>> I'm not sure if this is the correct place to raise an issue I am having with replicating a standalone CouchDB 1.1.1 to a 3-node BigCouch cluster. If this is not the correct place, please point me in the right direction; if it is, does anyone have any ideas why I keep getting the following error message when I kick off a replication:
>>>>
>>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
>>>>
>>>> My set-up is:
>>>>
>>>> Standalone couchdb 1.1.1 running on Centos 5.7
>>>>
>>>> 3 Node BigCouch cluster running on Centos 5.8 with the following local.ini overrides pulling from the Standalone couchdb (78K documents)
>>>>
>>>> [httpd]
>>>> bind_address = XXX.XX.X.XX
>>>>
>>>> [cluster]
>>>> ; number of shards for a new database
>>>> q = 9
>>>> ; number of copies of each shard
>>>> n = 1
>>>>
>>>> [couchdb]
>>>> database_dir = /other/bigcouch/database
>>>> view_index_dir = /other/bigcouch/view
>>>>
>>>> The error is always generated on the third node in the cluster and the server basically maxes out on memory beforehand. The other nodes seem to be doing very little, but are getting data, i.e. the shard sizes are growing. I've put the copies per shard down to 1 as currently I'm not interested in resilience.
>>>>
>>>> Any help would be greatly appreciated.
>>>>
>>>> Mike
>>>>

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by Robert Newson <rn...@apache.org>.
2GB of total RAM does sound tight. I can only compare to high-volume
production clusters, which have much more RAM than this. Given that
beam.smp wanted 1.4 GB and you have 2 GB total, do you know where the
rest went? To couchjs processes, by chance? If so, you can reduce the
maximum size of that pool in config, I think the default is 50.
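In stock CouchDB that pool is capped with `os_process_limit` in the `[query_server_config]` section; assuming this BigCouch build honours the same key (worth checking against its default.ini), the local.ini override would look like:

```ini
; Assumes BigCouch uses the same key as CouchDB's query_server_config.
; Caps the number of concurrent couchjs view-server processes per node.
[query_server_config]
os_process_limit = 10
```

Restart the node after changing it, and watch whether beam.smp still climbs toward the same allocation failure.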

On 12 April 2012 18:32, Mike Kimber <mk...@kana.com> wrote:
> Ok, I have 3 nodes all load balanced with HAproxy:
>
> Centos 5.8 (Virtualised)
> 2 Cores
> 2GB RAM
>
> I'm trying to replicate about 75K documents which total 6GB when compacted (on CouchDB 1.2, which has compression turned on). I'm told they are fairly large documents.
>
> When it goes pear-shaped, vmstat shows heavy memory and swap use:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1  6  2 91  0
>  0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1  5  9 85  0
>  1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1  7  1 91  0
>  0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1 10  4 85  0
>  1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13  7 33 47  0
>  1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17  8 49 26  0
>  0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25  9 61  4  0
>  0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8  4 49 40  0
>  0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4  2 50 44  0
>  0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9  2 50 40  0
>  0  3 444000   8712    444  12876  299  368   331   380 1294  188 22 20 36 23  0
>  0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3 22  0 75  0
>  1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
>  0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3 10 29 58  0
>  0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2  9 32 57  0
>  0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2  7 30 61  0
>  0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2  7  6 84  0
>  0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1  6 11 83  0
>  2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1  8 16 75  0
>
> It only ever takes out one node at a time and the other nodes seem to be doing very little while the one node is running out of memory.
>
> If I kick it off again, it processes some more, then the memory spikes and it fails.
>
> Thanks
>
> Mike
>
> PS: hope you enjoyed you couchdb get together!
>
> -----Original Message-----
> From: Robert Newson [mailto:rnewson@apache.org]
> Sent: 12 April 2012 17:28
> To: user@couchdb.apache.org
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> What kind of load were you putting the machine on?
>
> On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
>> Could you show your vm.args file?
>>
>> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>>> Unfortunately your request for help coincided with the two day CouchDB
>>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>>> ways to get bigcouch support, but we happily answer queries here too,
>>> when not at the Model UN of CouchDB. :D
>>>
>>> B.
>>>
>>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>>>> Looks like this isn't the right place based on the responses so far. Shame I hoped this was going to help solve our index/view rebuild times etc.
>>>>
>>>> Mike
>>>>
>>>> -----Original Message-----
>>>> From: Mike Kimber [mailto:mkimber@kana.com]
>>>> Sent: 10 April 2012 09:20
>>>> To: user@couchdb.apache.org
>>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>>>
>>>> I'm not sure if this is the correct place to raise an issue I am having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch cluster? If this is not the correct place please point me in the right direction if it is then any one have any ideas why I keep getting the following error message when I kick of a replication;
>>>>
>>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
>>>>
>>>> My set-up is:
>>>>
>>>> Standalone couchdb 1.1.1 running on Centos 5.7
>>>>
>>>> 3 Node BigCouch cluster running on Centos 5.8 with the following local.ini overrides pulling from the Standalone couchdb (78K documents)
>>>>
>>>> [httpd]
>>>> bind_address = XXX.XX.X.XX
>>>>
>>>> [cluster]
>>>> ; number of shards for a new database
>>>> q = 9
>>>> ; number of copies of each shard
>>>> n = 1
>>>>
>>>> [couchdb]
>>>> database_dir = /other/bigcouch/database
>>>> view_index_dir = /other/bigcouch/view
>>>>
>>>> The error is always generate on the third node in the cluster and the server basically max's out on memory before hand. The other nodes seem to be doing very little, but are getting data i.e. the shard sizes are growing. I've put the copies per shard down to 1 as currently I'm not interested in resilience.
>>>>
>>>> Any help would be greatly appreciated.
>>>>
>>>> Mike
>>>>

RE: BigCouch - Replication failing with Cannot Allocate memory

Posted by Mike Kimber <mk...@kana.com>.
OK, I have 3 nodes, all load-balanced with HAProxy:

Centos 5.8 (Virtualised)
2 Cores 
2GB RAM

I'm trying to replicate about 75K documents, which total 6GB when compacted (on CouchDB 1.2, which has compression turned on). I'm told they are fairly large documents.

When it goes pear-shaped, vmstat shows heavy memory and swap churn:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  2 570576   8808    140   7208 2998 2249  3154  2249 1234  569  1  6  2 91  0
 0  2 569656   9156    156   7504 2330 1899  2405  1904 1246  595  1  5  9 85  0
 1  1 575412   9516    236  14928 1549 2261  3242  2261 1237  593  1  7  1 91  0
 0  2 607092  13220    168   8156 3772 9012  3871  9017 1284  714  1 10  4 85  0
 1  0 444336 857004    220  10212 5781    0  6202     0 1574 1010 13  7 33 47  0
 1  0 442176 870684    428  11052 2049    0  2208   140 2561 1541 17  8 49 26  0
 0  0 442176 813140    460  11968  170    0   348     0 2672 1565 25  9 61  4  0
 0  1 442176 744972    484  12224 5440    0  5493     7 2432  900  8  4 49 40  0
 0  1 442176 714048    484  12296 4547    0  4547     0 1799  827  4  2 50 44  0
 0  1 442176 686304    496  12688 5128    0  5222     0 1696  999  9  2 50 40  0
 0  3 444000   8712    444  12876  299  368   331   380 1294  188 22 20 36 23  0
 0  3 469340  10040    116   7336   29 5087    74  5090 1232  268  3 22  0 75  0
 1  2 584356  10220    124   6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
 0  1 624908  10640    132   7036 6518 12879  6590 12884 1296  717  3 10 29 58  0
 0  2 652556  10948    252  14776 3799 9494  5459  9494 1294  646  2  9 32 57  0
 0  2 677784  10648    244  14528 3819 8196  3819  8201 1274  588  2  7 30 61  0
 0  2 688460   9512    212   8224 3013 4522  3125  4522 1379  519  2  7  6 84  0
 0  3 699164   9888    208   8468 2192 4014  2228  4014 1302  495  1  6 11 83  0
 2  0 713104   9004    144   9192 2606 4490  2848  4490 1350  487  1  8 16 75  0

It only ever takes out one node at a time, and the other nodes seem to be doing very little while that node is running out of memory.

If I kick it off again, it processes some more documents, then the memory spikes and it fails again.

Thanks 

Mike 

PS: hope you enjoyed your CouchDB get-together!
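[For context, the replication in question is a plain pull replication into the cluster. A minimal sketch of the request body follows; `worker_processes` and `worker_batch_size` are CouchDB 1.2+ replicator options that bound how many documents each worker holds in memory between checkpoints, and it is an assumption that BigCouch 0.4's replicator honors them:]

```python
import json

def replication_body(source, target, workers=1, batch_size=50):
    # Sketch of the pull-replication document POSTed to _replicate.
    # Lowering worker_processes/worker_batch_size (CouchDB 1.2+ options;
    # BigCouch support assumed, not confirmed) limits how many of the
    # "fairly large" documents sit in one Erlang heap at a time.
    return json.dumps({
        "source": source,   # remote standalone CouchDB
        "target": target,   # local clustered database
        "worker_processes": workers,
        "worker_batch_size": batch_size,
    })

body = replication_body("http://source-host:5984/docs", "docs")
print(body)
```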

-----Original Message-----
From: Robert Newson [mailto:rnewson@apache.org] 
Sent: 12 April 2012 17:28
To: user@couchdb.apache.org
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory

What kind of load were you putting the machine on?

On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
> Could you show your vm.args file?
>
> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>> Unfortunately your request for help coincided with the two day CouchDB
>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>> ways to get bigcouch support, but we happily answer queries here too,
>> when not at the Model UN of CouchDB. :D
>>
>> B.
>>
>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>>> Looks like this isn't the right place based on the responses so far. Shame I hoped this was going to help solve our index/view rebuild times etc.
>>>
>>> Mike
>>>
>>> -----Original Message-----
>>> From: Mike Kimber [mailto:mkimber@kana.com]
>>> Sent: 10 April 2012 09:20
>>> To: user@couchdb.apache.org
>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>>
>>> I'm not sure if this is the correct place to raise an issue I am having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch cluster? If this is not the correct place please point me in the right direction if it is then any one have any ideas why I keep getting the following error message when I kick of a replication;
>>>
>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
>>>
>>> My set-up is:
>>>
>>> Standalone couchdb 1.1.1 running on Centos 5.7
>>>
>>> 3 Node BigCouch cluster running on Centos 5.8 with the following local.ini overrides pulling from the Standalone couchdb (78K documents)
>>>
>>> [httpd]
>>> bind_address = XXX.XX.X.XX
>>>
>>> [cluster]
>>> ; number of shards for a new database
>>> q = 9
>>> ; number of copies of each shard
>>> n = 1
>>>
>>> [couchdb]
>>> database_dir = /other/bigcouch/database
>>> view_index_dir = /other/bigcouch/view
>>>
>>> The error is always generate on the third node in the cluster and the server basically max's out on memory before hand. The other nodes seem to be doing very little, but are getting data i.e. the shard sizes are growing. I've put the copies per shard down to 1 as currently I'm not interested in resilience.
>>>
>>> Any help would be greatly appreciated.
>>>
>>> Mike
>>>

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by Robert Newson <rn...@apache.org>.
What kind of load were you putting the machine on?

On 12 April 2012 17:24, Robert Newson <rn...@apache.org> wrote:
> Could you show your vm.args file?
>
> On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
>> Unfortunately your request for help coincided with the two day CouchDB
>> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
>> ways to get bigcouch support, but we happily answer queries here too,
>> when not at the Model UN of CouchDB. :D
>>
>> B.
>>
>> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>>> Looks like this isn't the right place based on the responses so far. Shame I hoped this was going to help solve our index/view rebuild times etc.
>>>
>>> Mike
>>>
>>> -----Original Message-----
>>> From: Mike Kimber [mailto:mkimber@kana.com]
>>> Sent: 10 April 2012 09:20
>>> To: user@couchdb.apache.org
>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>>
>>> I'm not sure if this is the correct place to raise an issue I am having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch cluster? If this is not the correct place please point me in the right direction if it is then any one have any ideas why I keep getting the following error message when I kick of a replication;
>>>
>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
>>>
>>> My set-up is:
>>>
>>> Standalone couchdb 1.1.1 running on Centos 5.7
>>>
>>> 3 Node BigCouch cluster running on Centos 5.8 with the following local.ini overrides pulling from the Standalone couchdb (78K documents)
>>>
>>> [httpd]
>>> bind_address = XXX.XX.X.XX
>>>
>>> [cluster]
>>> ; number of shards for a new database
>>> q = 9
>>> ; number of copies of each shard
>>> n = 1
>>>
>>> [couchdb]
>>> database_dir = /other/bigcouch/database
>>> view_index_dir = /other/bigcouch/view
>>>
>>> The error is always generate on the third node in the cluster and the server basically max's out on memory before hand. The other nodes seem to be doing very little, but are getting data i.e. the shard sizes are growing. I've put the copies per shard down to 1 as currently I'm not interested in resilience.
>>>
>>> Any help would be greatly appreciated.
>>>
>>> Mike
>>>

RE: BigCouch - Replication failing with Cannot Allocate memory

Posted by Mike Kimber <mk...@kana.com>.
vm.args looks like this (it's the default):

# Each node in the system must have a unique name.  A name can be short
# (specified using -sname) or it can by fully qualified (-name).  There can be
# no communication between nodes running with the -sname flag and those running
# with the -name flag.
-name bigcouch

# All nodes must share the same magic cookie for distributed Erlang to work.
# Comment out this line if you synchronized the cookies by other means (using
# the ~/.erlang.cookie file, for example).
-setcookie monster

# Tell SASL not to log progress reports
-sasl errlog_type error

# Use kernel poll functionality if supported by emulator
+K true

# Start a pool of asynchronous IO threads
+A 16

# Comment this line out to enable the interactive Erlang shell on startup
+Bd -noinput

Mike 
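[Worth noting: the file above uses the bare `-name bigcouch` on every node, while its own header says each node must have a unique name. A per-node file would look more like this sketch; the hostnames are placeholders:]

```
# Node 3 of 3 -- the name must be unique and resolvable from the other nodes.
-name bigcouch@db3.example.com

# Same cookie on all nodes so distributed Erlang can connect.
-setcookie monster

# Tell SASL not to log progress reports
-sasl errlog_type error

# Use kernel poll functionality if supported by emulator
+K true

# Start a pool of asynchronous IO threads
+A 16

# Comment this line out to enable the interactive Erlang shell on startup
+Bd -noinput
```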

-----Original Message-----
From: Robert Newson [mailto:rnewson@apache.org] 
Sent: 12 April 2012 17:25
To: user@couchdb.apache.org
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory

Could you show your vm.args file?

On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
> Unfortunately your request for help coincided with the two day CouchDB
> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
> ways to get bigcouch support, but we happily answer queries here too,
> when not at the Model UN of CouchDB. :D
>
> B.
>
> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>> Looks like this isn't the right place based on the responses so far. Shame I hoped this was going to help solve our index/view rebuild times etc.
>>
>> Mike
>>
>> -----Original Message-----
>> From: Mike Kimber [mailto:mkimber@kana.com]
>> Sent: 10 April 2012 09:20
>> To: user@couchdb.apache.org
>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>
>> I'm not sure if this is the correct place to raise an issue I am having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch cluster? If this is not the correct place please point me in the right direction if it is then any one have any ideas why I keep getting the following error message when I kick of a replication;
>>
>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
>>
>> My set-up is:
>>
>> Standalone couchdb 1.1.1 running on Centos 5.7
>>
>> 3 Node BigCouch cluster running on Centos 5.8 with the following local.ini overrides pulling from the Standalone couchdb (78K documents)
>>
>> [httpd]
>> bind_address = XXX.XX.X.XX
>>
>> [cluster]
>> ; number of shards for a new database
>> q = 9
>> ; number of copies of each shard
>> n = 1
>>
>> [couchdb]
>> database_dir = /other/bigcouch/database
>> view_index_dir = /other/bigcouch/view
>>
>> The error is always generate on the third node in the cluster and the server basically max's out on memory before hand. The other nodes seem to be doing very little, but are getting data i.e. the shard sizes are growing. I've put the copies per shard down to 1 as currently I'm not interested in resilience.
>>
>> Any help would be greatly appreciated.
>>
>> Mike
>>

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by Robert Newson <rn...@apache.org>.
Could you show your vm.args file?

On 12 April 2012 17:23, Robert Newson <rn...@apache.org> wrote:
> Unfortunately your request for help coincided with the two day CouchDB
> Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
> ways to get bigcouch support, but we happily answer queries here too,
> when not at the Model UN of CouchDB. :D
>
> B.
>
> On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
>> Looks like this isn't the right place based on the responses so far. Shame I hoped this was going to help solve our index/view rebuild times etc.
>>
>> Mike
>>
>> -----Original Message-----
>> From: Mike Kimber [mailto:mkimber@kana.com]
>> Sent: 10 April 2012 09:20
>> To: user@couchdb.apache.org
>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>
>> I'm not sure if this is the correct place to raise an issue I am having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch cluster? If this is not the correct place please point me in the right direction if it is then any one have any ideas why I keep getting the following error message when I kick of a replication;
>>
>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
>>
>> My set-up is:
>>
>> Standalone couchdb 1.1.1 running on Centos 5.7
>>
>> 3 Node BigCouch cluster running on Centos 5.8 with the following local.ini overrides pulling from the Standalone couchdb (78K documents)
>>
>> [httpd]
>> bind_address = XXX.XX.X.XX
>>
>> [cluster]
>> ; number of shards for a new database
>> q = 9
>> ; number of copies of each shard
>> n = 1
>>
>> [couchdb]
>> database_dir = /other/bigcouch/database
>> view_index_dir = /other/bigcouch/view
>>
>> The error is always generate on the third node in the cluster and the server basically max's out on memory before hand. The other nodes seem to be doing very little, but are getting data i.e. the shard sizes are growing. I've put the copies per shard down to 1 as currently I'm not interested in resilience.
>>
>> Any help would be greatly appreciated.
>>
>> Mike
>>

Re: BigCouch - Replication failing with Cannot Allocate memory

Posted by Robert Newson <rn...@apache.org>.
Unfortunately your request for help coincided with the two day CouchDB
Summit. #cloudant and the Issues tab on cloudant/bigcouch are other
ways to get bigcouch support, but we happily answer queries here too,
when not at the Model UN of CouchDB. :D

B.

On 12 April 2012 17:10, Mike Kimber <mk...@kana.com> wrote:
> Looks like this isn't the right place based on the responses so far. Shame I hoped this was going to help solve our index/view rebuild times etc.
>
> Mike
>
> -----Original Message-----
> From: Mike Kimber [mailto:mkimber@kana.com]
> Sent: 10 April 2012 09:20
> To: user@couchdb.apache.org
> Subject: BigCouch - Replication failing with Cannot Allocate memory
>
> I'm not sure if this is the correct place to raise an issue I am having with replicating a standalone couchdb 1.1.1 to a 3 node BigCouch cluster? If this is not the correct place please point me in the right direction if it is then any one have any ideas why I keep getting the following error message when I kick of a replication;
>
> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
>
> My set-up is:
>
> Standalone couchdb 1.1.1 running on Centos 5.7
>
> 3 Node BigCouch cluster running on Centos 5.8 with the following local.ini overrides pulling from the Standalone couchdb (78K documents)
>
> [httpd]
> bind_address = XXX.XX.X.XX
>
> [cluster]
> ; number of shards for a new database
> q = 9
> ; number of copies of each shard
> n = 1
>
> [couchdb]
> database_dir = /other/bigcouch/database
> view_index_dir = /other/bigcouch/view
>
> The error is always generate on the third node in the cluster and the server basically max's out on memory before hand. The other nodes seem to be doing very little, but are getting data i.e. the shard sizes are growing. I've put the copies per shard down to 1 as currently I'm not interested in resilience.
>
> Any help would be greatly appreciated.
>
> Mike
>

RE: BigCouch - Replication failing with Cannot Allocate memory

Posted by Mike Kimber <mk...@kana.com>.
Looks like this isn't the right place, based on the responses so far. Shame, as I hoped this was going to help solve our index/view rebuild times etc.

Mike

-----Original Message-----
From: Mike Kimber [mailto:mkimber@kana.com] 
Sent: 10 April 2012 09:20
To: user@couchdb.apache.org
Subject: BigCouch - Replication failing with Cannot Allocate memory

I'm not sure if this is the correct place to raise an issue I am having with replicating a standalone CouchDB 1.1.1 to a 3-node BigCouch cluster. If this is not the correct place, please point me in the right direction; if it is, does anyone have any ideas why I keep getting the following error message when I kick off a replication?

eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").

My set-up is:

Standalone CouchDB 1.1.1 running on CentOS 5.7

3-node BigCouch cluster running on CentOS 5.8 with the following local.ini overrides, pulling from the standalone CouchDB (78K documents)

[httpd]
bind_address = XXX.XX.X.XX

[cluster]
; number of shards for a new database
q = 9
; number of copies of each shard
n = 1

[couchdb]
database_dir = /other/bigcouch/database
view_index_dir = /other/bigcouch/view

The error is always generated on the third node in the cluster, and that server basically maxes out on memory beforehand. The other nodes seem to be doing very little, but they are receiving data, i.e. their shard sizes are growing. I've put the copies per shard down to 1, as I'm not currently interested in resilience.

Any help would be greatly appreciated.

Mike
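[Since the [cluster] q and n settings only apply at database-creation time, it can help to state them explicitly when creating the target database. A sketch, assuming BigCouch's documented q/n query parameters on the creating PUT; the hostname is a placeholder:]

```python
def create_db_url(host, dbname, q=9, n=1):
    # BigCouch reads q (shards per database) and n (copies per shard)
    # from local.ini at creation time, and also accepts them as query
    # parameters on the creating PUT request (assumed BigCouch 0.4
    # behavior). With q=9, n=1 the cluster holds 9 shard files in
    # total, spread across the 3 nodes.
    return "http://%s:5984/%s?q=%d&n=%d" % (host, dbname, q, n)

url = create_db_url("db1.example.com", "docs")
total_shards = 9 * 1   # q * n shard files across the whole cluster
print(url, total_shards)
```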