Posted to user@couchdb.apache.org by Vladimir Ralev <vl...@gmail.com> on 2014/01/31 17:09:24 UTC

_cleanup_view is deleting all views

Hi guys,

bigcouch 0.4.2 has the following code that handles view cleanup:

cleanup_index_files(Db) ->
    % load all ddocs
    {ok, DesignDocs} = couch_db:get_design_docs(Db),

    % make unique list of group sigs
    Sigs = lists:map(fun(#doc{id = GroupId}) ->
        {ok, Info} = get_group_info(Db, GroupId),
        ?b2l(couch_util:get_value(signature, Info))
    end, [DD||DD <- DesignDocs, DD#doc.deleted == false]),

    FileList = list_index_files(Db),

    DeleteFiles =
    if length(Sigs) =:= 0 ->
        FileList;
    true ->
        % regex that matches all ddocs
        RegExp = "("++ string:join(Sigs, "|") ++")",
        % filter out the ones in use
        [FilePath || FilePath <- FileList,
            re:run(FilePath, RegExp, [{capture, none}]) =:= nomatch]
    end,

    % delete unused files
    ?LOG_DEBUG("deleting unused view index files: ~p",[DeleteFiles]),
    RootDir = couch_config:get("couchdb", "view_index_dir"),
    [couch_file:delete(RootDir,File,false)||File <- DeleteFiles],
    ok.


From here
https://github.com/cloudant/bigcouch/blob/master/apps/couch/src/couch_view.erl#L84

It's supposed to delete only unused views, but in my case it deletes
everything and then starts building from scratch. Can you help me
understand the condition used here to filter the files that are currently
in use? How is the regex supposed to work?
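The filter asked about here can be sketched in POSIX shell. This is an illustrative re-implementation, not BigCouch code; the function name unused_index_files and the example paths are hypothetical. It mirrors the three steps of cleanup_index_files: with no live signatures every index file is returned; otherwise the signatures are joined into one alternation, (sig1|sig2|...), and every path that matches none of them is reported as unused.

```shell
#!/bin/sh
# Illustrative re-implementation of the keep/delete filter in
# cleanup_index_files; not BigCouch code. Names and paths are hypothetical.
#
# unused_index_files SIGS FILES
#   SIGS:  space-separated signatures of live (non-deleted) design docs
#   FILES: newline-separated index file paths
# Prints the paths that the cleanup would select for deletion.
unused_index_files() {
  _sigs=$1
  _files=$2
  if [ -z "$_sigs" ]; then
    # Like `if length(Sigs) =:= 0 -> FileList`: nothing is kept.
    printf '%s\n' "$_files"
    return 0
  fi
  # Like RegExp = "(" ++ string:join(Sigs, "|") ++ ")".
  _re="($(printf '%s' "$_sigs" | tr ' ' '|'))"
  # Like the re:run(...) =:= nomatch filter: print paths matching no signature.
  printf '%s\n' "$_files" | grep -Ev "$_re"
  return 0
}

# Example: only 3f2a9c belongs to a live design doc.
files='/views/.mydb_design/3f2a9c.view
/views/.mydb_design/77e0b1.view'
unused_index_files "3f2a9c" "$files"   # prints the 77e0b1 path
unused_index_files "" "$files"         # prints both paths
```

The sketch shows that the match is an unanchored substring match on the whole path, so a file survives only if some live design doc's signature appears in its name. If the signature list comes back empty for the database being cleaned (plausibly the case when, as the reply about remote design documents suggests, the design doc is not local to that database), every index file is selected for deletion.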

Re: _cleanup_view is deleting all views

Posted by Robert Samuel Newson <rn...@apache.org>.
1. It is designed behavior (true for couchdb also).

2. Again, designed behavior. They’ll be closed if the database closes, which happens when you exceed the size of the LRU.

3. Oooh. That’s a mistake on your part, and it explains it all. You must call _view_cleanup on :5984. The clustering code has to be invoked, because the design document could be on a remote node and because it holds the logic to calculate which shard filenames to retain, etc.

B.
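Point 3 can be made concrete with a small shell sketch that only builds the two URLs and sends nothing. The helper names are hypothetical; the host, database name, shard range and suffix are the ones quoted in this thread. The clustered URL on port 5984 goes through the clustering code, while the node-local URL on port 5986 addresses a single shard whose name, shards/<range>/<db>.<suffix>, has its slashes encoded as %2F.

```shell
#!/bin/sh
# Hypothetical helpers that only build URLs; nothing is sent.

# Clustered endpoint (port 5984): the request goes through the clustering
# code, which can find design docs that live on other nodes.
clustered_cleanup_url() {
  # $1 = host, $2 = database name
  printf 'http://%s:5984/%s/_view_cleanup\n' "$1" "$2"
}

# Node-local endpoint (port 5986) for one shard. The shard name is
# shards/<range>/<db>.<suffix>, with each "/" encoded as %2F.
shard_cleanup_url() {
  # $1 = host, $2 = database name, $3 = shard range, $4 = suffix
  _shard=$(printf 'shards/%s/%s.%s' "$3" "$2" "$4" | sed 's|/|%2F|g')
  printf 'http://%s:5986/%s/_view_cleanup\n' "$1" "$_shard"
}

clustered_cleanup_url host aea8b710ab5f0
# -> http://host:5984/aea8b710ab5f0/_view_cleanup
shard_cleanup_url host aea8b710ab5f0 00000000-1fffffff 1385154105
# -> http://host:5986/shards%2F00000000-1fffffff%2Faea8b710ab5f0.1385154105/_view_cleanup

# To actually run the clustered cleanup (assumes CouchDB 1.x semantics:
# POST with a JSON content type; admin credentials may be required):
#   curl -X POST -H 'Content-Type: application/json' \
#        "$(clustered_cleanup_url host aea8b710ab5f0)"
```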

On 3 Feb 2014, at 14:39, Vladimir Ralev <vl...@gmail.com> wrote:

> Hello again.
> 
> I did some more testing and here are some observations. I am analyzing this
> from the perspective of running 1000 databases, with 30 views each.
> Bigcouch will partition the databases into smaller DBs, about 10000
> databases in total per machine. Each of these will have 30 views, so 300K+
> files in the directory structure in total, per machine.
> 
> What I see in a smaller scale test is the following:
> 1. Initially the views are not generated; only when you access the view
> http://host:5984/aea8b710ab5f0/desgn/etc.. is the view file built
> from scratch.
> 2. Once you access the view file this way, the file handles to this file
> are kept open forever by the beam.smp process. They are never closed until
> bigcouch is restarted. The couchjs process terminates and releases the
> handle while indexing.
> 3. If you run
> http://host:5986/shards%2F00000000-1fffffff%2Faea8b710ab5f0.1385154105/_view_cleanup
> the views are deleted, always.
> 4. If you run http://host:5984/aea8b710ab5f0/_view_cleanup the views are
> NOT deleted; I guess that's the correct cleanup I should use.
> 5. If you restart bigcouch to force the file handles to close, and make no
> read request to that view (to open the file handle), bigcouch will
> slowly start to open files and never close them again until the next restart.
> 6. When you delete files with
> http://host:5986/shards%2F00000000-1fffffff%2Faea8b710ab5f0.1385154105/_view_cleanup
> Erlang's file:delete is used, which doesn't care about open file handles; it
> just deletes by name. The deleted files therefore remain referenced and the
> handles stay open, as seen in lsof. The cycle of deleting and
> rebuilding these files never stops, and the descriptors leak.
> 
> Do these observations make sense?
> 
> I think 300K+ handles is manageable as long as they don't recycle
> constantly, but I need to understand the correct _view_cleanup REST API to
> use. Is http://host:5984 sufficient?
> 
> I added some logs on file close and so on, and it's mostly called on db
> files. I couldn't trace it to any point that releases a view file handle; if
> you can point me to the code which may release it, I can check.
> 
> Thanks a lot for any feedback.


Re: _cleanup_view is deleting all views

Posted by Vladimir Ralev <vl...@gmail.com>.
Hello again.

I did some more testing and here are some observations. I am analyzing this
from the perspective of running 1000 databases, with 30 views each.
Bigcouch will partition the databases into smaller DBs, about 10000
databases in total per machine. Each of these will have 30 views, so 300K+
files in the directory structure in total, per machine.

What I see in a smaller scale test is the following:
1. Initially the views are not generated; only when you access the view
http://host:5984/aea8b710ab5f0/desgn/etc.. is the view file built
from scratch.
2. Once you access the view file this way, the file handles to this file
are kept open forever by the beam.smp process. They are never closed until
bigcouch is restarted. The couchjs process terminates and releases the
handle while indexing.
3. If you run
http://host:5986/shards%2F00000000-1fffffff%2Faea8b710ab5f0.1385154105/_view_cleanup
the views are deleted, always.
4. If you run http://host:5984/aea8b710ab5f0/_view_cleanup the views are
NOT deleted; I guess that's the correct cleanup I should use.
5. If you restart bigcouch to force the file handles to close, and make no
read request to that view (to open the file handle), bigcouch will
slowly start to open files and never close them again until the next restart.
6. When you delete files with
http://host:5986/shards%2F00000000-1fffffff%2Faea8b710ab5f0.1385154105/_view_cleanup
Erlang's file:delete is used, which doesn't care about open file handles; it
just deletes by name. The deleted files therefore remain referenced and the
handles stay open, as seen in lsof. The cycle of deleting and
rebuilding these files never stops, and the descriptors leak.

Do these observations make sense?

I think 300K+ handles is manageable as long as they don't recycle
constantly, but I need to understand the correct _view_cleanup REST API to
use. Is http://host:5984 sufficient?

I added some logs on file close and so on, and it's mostly called on db
files. I couldn't trace it to any point that releases a view file handle; if
you can point me to the code which may release it, I can check.

Thanks a lot for any feedback.
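Observations 2 and 6 can also be checked without lsof on Linux: /proc/<pid>/fd holds one symlink per open descriptor, and the link target of a file that was unlinked while still open ends in " (deleted)". A minimal sketch, assuming Linux procfs; the function name is hypothetical, and lsof -p <pid> should report the same entries.

```shell
#!/bin/sh
# count_deleted_open_files PID
# Counts descriptors of PID that point at files already unlinked from disk.
# Linux-only: relies on /proc/PID/fd. Reading another user's /proc entries
# may require running as that user or as root.
count_deleted_open_files() {
  # `ls -l` shows each fd as "N -> /path/to/file (deleted)" once unlinked.
  # grep -c prints 0 when nothing matches.
  ls -l "/proc/$1/fd" 2>/dev/null | grep -c ' (deleted)$'
}

# Example against the BigCouch VM (assumes a single beam.smp process):
#   count_deleted_open_files "$(pgrep -f beam.smp | head -n 1)"
```

Running it repeatedly while cleanups happen would show whether the count keeps growing, which is the descriptor leak described in observation 6.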


On Sat, Feb 1, 2014 at 6:41 AM, Vladimir Ralev <vl...@gmail.com>wrote:

> Not sure at all. I don't know how to check precisely whether a live design doc
> is pointing to a particular file. I was basing my statement on the fact
> that I have my views declared and they were available pre-indexed before
> compaction (but they were not physically opened as file handles by couch;
> they were opened on demand). Once I finish my current script, I will
> test everything again and spend some time tracing the code.

Re: _cleanup_view is deleting all views

Posted by Vladimir Ralev <vl...@gmail.com>.
Not sure at all. I don't know how to check precisely whether a live design doc
is pointing to a particular file. I was basing my statement on the fact
that I have my views declared and they were available pre-indexed before
compaction (but they were not physically opened as file handles by couch;
they were opened on demand). Once I finish my current script, I will
test everything again and spend some time tracing the code.


On Fri, Jan 31, 2014 at 6:52 PM, Robert Samuel Newson <rn...@apache.org>wrote:

>
> Ownership is interesting. Would the bigcouch user have the right to delete
> the file but not open it for reading?
>
> There's definitely an issue in bigcouch (fixed long since in couchdb)
> where any failure to open a view file makes us delete it.
>
> OS/fs all check out fine. You see the filename that should be retained in
> that log output? You're 100% sure? You do have a live design doc pointing
> to it?
>
> B.

Re: _cleanup_view is deleting all views

Posted by Robert Samuel Newson <rn...@apache.org>.
Ownership is interesting. Would the bigcouch user have the right to delete the file but not open it for reading?

There’s definitely an issue in bigcouch (fixed long since in couchdb) where any failure to open a view file makes us delete it.

OS/fs all check out fine. You see the filename that should be retained in that log output? You’re 100% sure? You do have a live design doc pointing to it?

B.

On 31 Jan 2014, at 16:39, Vladimir Ralev <vl...@gmail.com> wrote:

> Thanks a lot. The database was moved from older machines, so some other file
> system metadata might be scrambled. But I don't see what can cause a
> problem like this.
> 
> Yes, the debug output "deleting unused view index files:" is printed, and it
> deletes every view in every database, little doubt about it. It doesn't
> delete the fresh views that are fully regenerated afterwards, though. I think
> the original views somehow got corrupted, but I need to figure out why and
> maybe fix it manually with a script.
> 
> The OS is 64-bit Debian and the file system is ext4. File ownership is a
> little scrambled: some directories are owned by the old bigcouch user, others
> by root, so that's one thing I am investigating. I reset the ownership, but
> will have to repeat it for my next tests.


Re: _cleanup_view is deleting all views

Posted by Vladimir Ralev <vl...@gmail.com>.
Thanks a lot. The database was moved from older machines, so some other file
system metadata might be scrambled. But I don't see what can cause a
problem like this.

Yes, the debug output "deleting unused view index files:" is printed, and it
deletes every view in every database, little doubt about it. It doesn't
delete the fresh views that are fully regenerated afterwards, though. I think
the original views somehow got corrupted, but I need to figure out why and
maybe fix it manually with a script.

The OS is 64-bit Debian and the file system is ext4. File ownership is a
little scrambled: some directories are owned by the old bigcouch user, others
by root, so that's one thing I am investigating. I reset the ownership, but
will have to repeat it for my next tests.
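The ownership mix-up can be listed before and after resetting it. A minimal sketch using find's -user test (which also accepts a numeric UID); the function name is hypothetical, and the directory and account names in the example are assumptions: substitute the view_index_dir value from the config and the account BigCouch actually runs as. On Unix, deleting a file requires write permission on its directory rather than on the file itself, which is how a process can be able to delete a file it cannot open for reading.

```shell
#!/bin/sh
# list_foreign_owned DIR USER
# Prints every file or directory under DIR that is not owned by USER
# (USER may be a name or a numeric UID).
list_foreign_owned() {
  find "$1" ! -user "$2" -print
}

# Example (path and account name are assumptions):
#   list_foreign_owned /opt/bigcouch/var/lib bigcouch
# After checking the output, ownership could be reset with e.g.:
#   chown -R bigcouch:bigcouch /opt/bigcouch/var/lib
```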




On Fri, Jan 31, 2014 at 6:21 PM, Robert Samuel Newson <rn...@apache.org>wrote:

> and details of OS, filesystem, anything you think might be relevant.
>
> B.

Re: _cleanup_view is deleting all views

Posted by Robert Samuel Newson <rn...@apache.org>.
and details of OS, filesystem, anything you think might be relevant.

B.

On 31 Jan 2014, at 16:20, Robert Samuel Newson <rn...@apache.org> wrote:

> First thing to note is that bigcouch development is over, but we can at least confirm this:
> 
> This function fetches all the design docs of the database, grabs all the signatures from each (you’ll have noticed view filenames look uuid/randomy, that’s a 'sig'), and then sweeps the dir where all views for the given database should be and deletes those not in the 'keep' list.
> 
> Can you enable debug level logging (curl localhost:5984/_config/log/level -X PUT -d '"debug"' to *all* bigcouch nodes) and tell us if
> 
> ?LOG_DEBUG("deleting unused view index files: ~p",[DeleteFiles]),
> 
> actually gets printed?
> 
> B.


Re: _cleanup_view is deleting all views

Posted by Robert Samuel Newson <rn...@apache.org>.
First thing to note is that bigcouch development is over, but we can at least confirm this:

This function fetches all the design docs of the database, grabs all the signatures from each (you’ll have noticed view filenames look uuid/randomy, that’s a 'sig'), and then sweeps the dir where all views for the given database should be and deletes those not in the 'keep' list.

Can you enable debug level logging (curl localhost:5984/_config/log/level -X PUT -d '"debug"' to *all* bigcouch nodes) and tell us if

?LOG_DEBUG("deleting unused view index files: ~p",[DeleteFiles]),

actually gets printed?

B.
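Because the log level has to be raised on every node, a small loop helps. A minimal sketch: the function name and node names are hypothetical, the request itself is the curl command given above, and CURL can be overridden (for example CURL=echo) to print the commands instead of sending them. The level would presumably need to be set back afterwards (CouchDB's default is "info").

```shell
#!/bin/sh
# Command used to send requests; set CURL=echo for a dry run.
CURL=${CURL:-curl}

# enable_debug_logging NODE...
# Sends the _config/log/level PUT from the thread to each node in turn.
enable_debug_logging() {
  for _node in "$@"; do
    "$CURL" -X PUT "http://$_node:5984/_config/log/level" -d '"debug"'
  done
}

# Example (node names are placeholders):
#   enable_debug_logging node1.example.com node2.example.com node3.example.com
```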

On 31 Jan 2014, at 16:09, Vladimir Ralev <vl...@gmail.com> wrote:

> Hi guys,
> 
> bigcouch 0.4.2 has the following code that handles view cleanup:
> 
> cleanup_index_files(Db) ->
>     % load all ddocs
>     {ok, DesignDocs} = couch_db:get_design_docs(Db),
> 
>     % make unique list of group sigs
>     Sigs = lists:map(fun(#doc{id = GroupId}) ->
>         {ok, Info} = get_group_info(Db, GroupId),
>         ?b2l(couch_util:get_value(signature, Info))
>     end, [DD||DD <- DesignDocs, DD#doc.deleted == false]),
> 
>     FileList = list_index_files(Db),
> 
>     DeleteFiles =
>     if length(Sigs) =:= 0 ->
>         FileList;
>     true ->
>         % regex that matches all ddocs
>         RegExp = "("++ string:join(Sigs, "|") ++")",
>         % filter out the ones in use
>         [FilePath || FilePath <- FileList,
>             re:run(FilePath, RegExp, [{capture, none}]) =:= nomatch]
>     end,
> 
>     % delete unused files
>     ?LOG_DEBUG("deleting unused view index files: ~p",[DeleteFiles]),
>     RootDir = couch_config:get("couchdb", "view_index_dir"),
>     [couch_file:delete(RootDir,File,false)||File <- DeleteFiles],
>     ok.
> 
> From here
> https://github.com/cloudant/bigcouch/blob/master/apps/couch/src/couch_view.erl#L84
> 
> It's supposed to delete only unused views, but in my case it deletes
> everything and then starts building from scratch. Can you help me
> understand the condition used here to filter the files that are currently
> in use? How is the regex supposed to work?