Posted to user@couchdb.apache.org by Vladimir Ralev <vl...@gmail.com> on 2014/01/28 22:49:45 UTC

bigcouch/couchdb file descriptor leak during compaction

Hi guys,

I am monitoring a huge compaction right now and keeping an eye on the
system so that it does not fail with an emfile error.

The number of file descriptors owned by couch is growing very fast. I can do:

lsof -p CouchPID

and get tons of these:

beam.smp 21853 root *152u   REG  254,2        8306 30670917 /opt/bigcouch/var/lib/.delete/b4b3ab2330a9672d7138fb562ebf90dd (deleted)
beam.smp 21853 root *153u   REG  254,2        8282 30671071 /opt/bigcouch/var/lib/.delete/a218b0088e72278990f848fd8b2de5d9 (deleted)
beam.smp 21853 root *154u   REG  254,2        8372 30670973 /opt/bigcouch/var/lib/.delete/05a22639d021929b31c982954ef9e99b (deleted)
beam.smp 21853 root *155u   REG  254,2        8297 30671201 /opt/bigcouch/var/lib/.delete/6669ce0a6c235a977ea46ded37928338 (deleted)
beam.smp 21853 root *156u   REG  254,2        8294 30670974 /opt/bigcouch/var/lib/.delete/bd2a65a16205529faf9118f0bd6d26b1 (deleted)
beam.smp 21853 root *157u   REG  254,2        8294 30671159 /opt/bigcouch/var/lib/.delete/6b87bba6fd0b87c1bcaf47a2ba22aee4 (deleted)
beam.smp 21853 root *158u   REG  254,2        8294 30670975 /opt/bigcouch/var/lib/.delete/392e9bd3825ff953dc808f83f6cba97e (deleted)


Currently there are 55000 such descriptors and they are never released.
Note that these files are indeed deleted, but the system doesn't release
the handles. I can't find a reference to similar problems. Is this a known
issue I should watch out for?


I suppose I can restart the system in between compactions to release the
files, but if you have any other advice it's highly appreciated.
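
For anyone who wants to track the count without reading lsof output by eye, here is a minimal sketch in Python. It assumes a Linux host, where each entry under /proc/<pid>/fd is a symlink whose target ends in " (deleted)" once the file has been unlinked. The function names are made up for illustration and are not part of CouchDB or BigCouch.

```python
import os


def deleted_targets(targets):
    """Return the link targets that the kernel marks as deleted."""
    return [t for t in targets if t.endswith(" (deleted)")]


def open_targets(pid):
    """Resolve every descriptor under /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir and readlink.
            continue
    return targets
```

Calling deleted_targets(open_targets(pid)) with the beam.smp PID, repeated during a compaction, should show whether the number of deleted-but-open files keeps growing. Reading another user's /proc entries usually needs the same user or root.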

Re: bigcouch/couchdb file descriptor leak during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.
https://github.com/cloudant/bigcouch/tree/f0f5a107c0b895dd72187c10baedec24b85329a9



On 31 Jan 2014, at 10:13, Vladimir Ralev <vl...@gmail.com> wrote:

> I tried to rule out a file system problem by doing the following:
> 
> chmod -R 777 /opt/bigcouch
> chown -R root /opt/bigcouch
> 
> Then I ran bigcouch as root.
> 
> I still have a leak, but it's for other files:
> 
> beam.smp 28679 root *086u   REG  254,2        8282 32250112 /opt/bigcouch/var/lib/.shards/80000000-9fffffff/db1/1a/3a/3e9b00e2e5e72df737bf30cd24ad.1376585909_design/dfa1fd4be3aecb20848cad2feb20e00a.view
> beam.smp 28679 root *087u   REG  254,2        8285 32246907 /opt/bigcouch/var/lib/.shards/80000000-9fffffff/db1/07/d7/e18bfed619a6078c2b19fef66b2c.1371491971_design/dfa1fd4be3aecb20848cad2feb20e00a.view
> 
> 
> The deleted files don't leak anymore, and there are no errors in the
> bigcouch log. As far as I can tell this happens on all compactions, even if
> I pace them slowly. The machine runs out of memory eventually (because the
> system limits are really high).
> 
> Can somebody point me to the source code of the couchdb used in bigcouch
> 0.4.2? I will add some extra logs here and there to see if I can figure it out.


Re: bigcouch/couchdb file descriptor leak during compaction

Posted by Vladimir Ralev <vl...@gmail.com>.
I tried to rule out a file system problem by doing the following:

chmod -R 777 /opt/bigcouch
chown -R root /opt/bigcouch

Then I ran bigcouch as root.

I still have a leak, but it's for other files:

beam.smp 28679 root *086u   REG  254,2        8282 32250112 /opt/bigcouch/var/lib/.shards/80000000-9fffffff/db1/1a/3a/3e9b00e2e5e72df737bf30cd24ad.1376585909_design/dfa1fd4be3aecb20848cad2feb20e00a.view
beam.smp 28679 root *087u   REG  254,2        8285 32246907 /opt/bigcouch/var/lib/.shards/80000000-9fffffff/db1/07/d7/e18bfed619a6078c2b19fef66b2c.1371491971_design/dfa1fd4be3aecb20848cad2feb20e00a.view


The deleted files don't leak anymore, and there are no errors in the
bigcouch log. As far as I can tell this happens on all compactions, even if
I pace them slowly. The machine runs out of memory eventually (because the
system limits are really high).

Can somebody point me to the source code of the couchdb used in bigcouch
0.4.2? I will add some extra logs here and there to see if I can figure it out.
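
To see how close the process is to its descriptor limit, a rough Python sketch like the following compares the number of open descriptors with the soft "Max open files" value. It assumes Linux, where /proc/<pid>/limits lists per-process limits in a fixed column layout. The helper names are hypothetical and only meant for illustration.

```python
import os


def parse_nofile_limit(limits_text):
    """Return the soft 'Max open files' limit, or None if it is unlimited."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units.
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' line found")


def fd_headroom(pid):
    """Return (descriptors in use, soft limit) for a running process."""
    with open(f"/proc/{pid}/limits") as f:
        limit = parse_nofile_limit(f.read())
    in_use = len(os.listdir(f"/proc/{pid}/fd"))
    return in_use, limit
```

With a very high limit the process will not hit emfile for a long time, so the leak shows up as memory pressure first, which matches what is described above.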





On Wed, Jan 29, 2014 at 2:22 AM, Robert Samuel Newson <rn...@apache.org> wrote:

>
> Yes, anything moved to the .delete directory is fair game for deletion.
>
> CouchDB and BigCouch move the file there and then delete it. This is so that,
> in the event of a crash, only one directory needs to be cleaned up rather
> than running a potentially expensive recursive sweep.
>
> As for why BigCouch fails to release your files, I don't know. Is this
> happening for *all* compactions or is it quite rare but has accumulated
> over a long period of time?
>
> The difference between 0.4.0 and 0.4.2 is small and there's nothing that
> would induce this issue afaik.
>
> B.

Re: bigcouch/couchdb file descriptor leak during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.
Yes, anything moved to the .delete directory is fair game for deletion.

CouchDB and BigCouch move the file there and then delete it. This is so that, in the event of a crash, only one directory needs to be cleaned up rather than running a potentially expensive recursive sweep.
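
The move-then-delete pattern can be sketched roughly as follows in Python. This is an illustration under assumptions, not the actual CouchDB or BigCouch code (which is Erlang). The function names, the random file name, and the "old.couch" file are all invented for the example. It also shows the standard POSIX behaviour behind the "(deleted)" entries in lsof: unlinking a file does not free it while some descriptor is still open.

```python
import os
import tempfile
import uuid


def delete_via_trash(path, trash_dir):
    """Rename path into trash_dir under a random name, then unlink it."""
    os.makedirs(trash_dir, exist_ok=True)
    parked = os.path.join(trash_dir, uuid.uuid4().hex)
    os.rename(path, parked)  # atomic when both paths are on one file system
    os.unlink(parked)        # a crash before this line leaves the file in trash_dir
    return parked


def sweep_trash(trash_dir):
    """Startup cleanup: remove whatever an earlier crash left behind."""
    removed = 0
    for name in os.listdir(trash_dir):
        os.unlink(os.path.join(trash_dir, name))
        removed += 1
    return removed


def demo():
    """Delete a file while a handle is still open (POSIX semantics)."""
    with tempfile.TemporaryDirectory() as root:
        trash = os.path.join(root, ".delete")
        victim = os.path.join(root, "old.couch")
        with open(victim, "wb") as f:
            f.write(b"x" * 8306)
        handle = open(victim, "rb")  # stands in for a descriptor that is never closed
        delete_via_trash(victim, trash)
        readable = len(handle.read())  # data stays reachable through the handle
        handle.close()                 # only now can the kernel free the file
        return os.listdir(trash), readable
```

The directory listing is empty after the delete, yet the open handle can still read the full contents. The space and the descriptor are only released when the handle is closed or the process exits.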

As for why BigCouch fails to release your files, I don’t know. Is this happening for *all* compactions or is it quite rare but has accumulated over a long period of time?

The difference between 0.4.0 and 0.4.2 is small and there’s nothing that would induce this issue afaik.

B.

On 28 Jan 2014, at 23:02, Vladimir Ralev <vl...@gmail.com> wrote:

> Erlang R15B03 (erts-5.9.3.1) [source] [64-bit] [smp:8:8] [async-threads:0]
> [hipe] [kernel-poll:false]
> 
> Bigcouch is 0.4.2 latest.
> {"couchdb":"Welcome","version":"1.1.1","bigcouch":"0.4.2"}
> I haven't found anything unusual about the disk/OS/FS, all Debian defaults.
> But I will keep looking. I will try to look into the source code. Is it
> safe to assume all these deleted files are deleted by the compaction code
> and not some other part of the system?


Re: bigcouch/couchdb file descriptor leak during compaction

Posted by Vladimir Ralev <vl...@gmail.com>.
Erlang R15B03 (erts-5.9.3.1) [source] [64-bit] [smp:8:8] [async-threads:0]
[hipe] [kernel-poll:false]

Bigcouch is 0.4.2 latest.
{"couchdb":"Welcome","version":"1.1.1","bigcouch":"0.4.2"}
I haven't found anything unusual about the disk/OS/FS, all Debian defaults.
But I will keep looking. I will try to look into the source code. Is it
safe to assume all these deleted files are deleted by the compaction code
and not some other part of the system?




On Wed, Jan 29, 2014 at 12:23 AM, Robert Samuel Newson
<rn...@apache.org> wrote:

>
>
> What version of Erlang is this? There are some to avoid, R14B02 being the
> most notable.
>
> Curious that you have so many of these. Anything odd about the filesystem or
> disk?
>
> All that said, a restart is your only method of freeing these if bigcouch
> (0.4.0 I assume?) is not able to.
>
> B

Re: bigcouch/couchdb file descriptor leak during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.

What version of Erlang is this? There are some to avoid, R14B02 being the most notable.

Curious that you have so many of these. Anything odd about the filesystem or disk?

All that said, a restart is your only method of freeing these if bigcouch (0.4.0 I assume?) is not able to.

B

On 28 Jan 2014, at 21:49, Vladimir Ralev <vl...@gmail.com> wrote:

> Hi guys,
> 
> I am monitoring a huge compaction right now and keeping an eye on the
> system so that it does not fail with an emfile error.
> 
> The number of file descriptors owned by couch is growing very fast. I can do:
> 
> lsof -p CouchPID
> 
> and get tons of these:
> 
> beam.smp 21853 root *152u   REG  254,2        8306 30670917 /opt/bigcouch/var/lib/.delete/b4b3ab2330a9672d7138fb562ebf90dd (deleted)
> beam.smp 21853 root *153u   REG  254,2        8282 30671071 /opt/bigcouch/var/lib/.delete/a218b0088e72278990f848fd8b2de5d9 (deleted)
> beam.smp 21853 root *154u   REG  254,2        8372 30670973 /opt/bigcouch/var/lib/.delete/05a22639d021929b31c982954ef9e99b (deleted)
> beam.smp 21853 root *155u   REG  254,2        8297 30671201 /opt/bigcouch/var/lib/.delete/6669ce0a6c235a977ea46ded37928338 (deleted)
> beam.smp 21853 root *156u   REG  254,2        8294 30670974 /opt/bigcouch/var/lib/.delete/bd2a65a16205529faf9118f0bd6d26b1 (deleted)
> beam.smp 21853 root *157u   REG  254,2        8294 30671159 /opt/bigcouch/var/lib/.delete/6b87bba6fd0b87c1bcaf47a2ba22aee4 (deleted)
> beam.smp 21853 root *158u   REG  254,2        8294 30670975 /opt/bigcouch/var/lib/.delete/392e9bd3825ff953dc808f83f6cba97e (deleted)
> 
> 
> Currently there are 55000 such descriptors and they are never released.
> Note that these files are indeed deleted, but the system doesn't release
> the handles. I can't find a reference to similar problems. Is this a known
> issue I should watch out for?
> 
> 
> I suppose I can restart the system in between compactions to release the
> files, but if you have any other advice it's highly appreciated.