Posted to user@couchdb.apache.org by Glenn Rempe <gl...@rempe.us> on 2009/10/03 18:10:41 UTC

Timeout Error when trying to access views + Indexing problems

Hello all,
I am looking for some guidance on how I can eliminate an error I am seeing
when trying to access views, and for help getting through indexing a large
design document.

Yesterday I upgraded to a trunk install of CouchDB (0.11.0b) in an attempt
to resolve my second problem (see below).  I have a DB that currently has
about 16 million records in it, and I am in the midst of importing more, up
to a total of about 26 million.  Yesterday, when I would try to access one of
my map/reduce views, I would see the indexing process kick off in the Futon
status page and I would see the couchjs process in 'top'.  But today, any
view I request returns the following error from CouchDB within about 3
seconds:

http://pastie.org/640511

The first few lines of it are:

Error: timeout{gen_server,call,
    [couch_view,
     {get_group_server,<<"searchlight_production">>,
         {group,
             <<95,25,15,251,46,213,137,116,110,135,150,210,66,56,105,172>>,
             nil,nil,<<"_design/SearchDocument">>,<<"javascript">>,[],
             [{view,0,


I have tried restarting CouchDB several times, without success.

Any thoughts as to what might be happening here and how I might prevent it?

Related to this is my second problem.  Whenever I have tried to index a view
of this large DB, the indexing process seems to silently die out after a
while, and it never gets through indexing the whole DB.  I have seen it get
through tens of thousands up to a few million docs before dying (out of
millions).  Questions:

- Is there a recommended method to figure out what is happening in the
internals of the indexing that may be causing it to fail?
- If indexing fails before having gone through the entire result set at
least once, does it continue where it left off at the last crash?  Or does it
need to start the whole indexing process over from scratch?
- How can I best ensure that my large DB gets fully indexed?

Thank you for the help.

Glenn

-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Brian Candler <B....@pobox.com>.
On Sun, Oct 04, 2009 at 11:29:29AM -0700, Glenn Rempe wrote:
>    Thanks for the reply Brian.  This particular part of the issue (slow
>    bulk post of new records) was resolved earlier this morning with a
>    suggestion from Chris Anderson in this thread who suggested the
>    following ini change which I made in futon:
> 
>    [uuids]
>    algorithm = sequential
> 
>    This totally resolved the slow bulk updates for me (instantly!).

OK, cool.

I've been generating sequential uuids client-side for a long time for other
reasons (basically so that items with equal keys in a view are shown in
insertion order).

However I'm surprised it makes a factor of 30 difference. I've done
benchmarking tests with inserting 1000 documents at a time, where I didn't
provide uuids client-side, and performance was just fine.
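
For a quick sanity check on your end, timing a raw _bulk_docs POST against a
scratch database should show whether your install is in the same ballpark
(a sketch; 'benchdb' is a made-up database name):

$ curl -X PUT http://127.0.0.1:5984/benchdb
$ time curl -s -X POST http://127.0.0.1:5984/benchdb/_bulk_docs \
    -H 'Content-Type: application/json' \
    -d '{"docs": [{"n": 1}, {"n": 2}, {"n": 3}]}'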

>    on a quad core 15GB RAM EC2 X-Large server storing the data on
>    an EBS volume.  Huge difference.

Ah. Possibly EBS is the key factor here. With local disk, lots of round
trips to the local VFS cache are cheap, followed by a single flush to disk.
But perhaps EBS bypasses the VFS cache? That is, maybe each block is being
sent and received over the wire multiple times, e.g. if it's being read,
updated and rewritten repeatedly?

A tcpdump might shed some light on this.
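
Failing that, iostat (from the sysstat package) would show the block-device
round trips directly while a bulk insert runs:

$ iostat -dxk 5    # extended per-device stats in kB, every 5 seconds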

Regards,

Brian.

Re: Timeout Error when trying to access views + Indexing problems

Posted by Brian Candler <B....@pobox.com>.
On Sat, Oct 03, 2009 at 11:27:54PM -0700, Glenn Rempe wrote:
> And just another data point.  I was seeing timeouts from my CouchRest client
> when trying to do bulk saves to this large DB from my migration script.
> You can see that I find a batch of 1000 records to migrate and bulk save
> them to CouchDB.  I write back a timestamp to MySQL for each of those
> records.  So you can see it takes less than 10 seconds to update 1000 docs
> in mysql, and almost a minute for the http request to return 201 success for
> the bulk insert of that data into couchdb.  Avg record size is a few hundred
> bytes.

That sounds wrong. I use ancient hardware (1.2GHz P4 laptop) and a bulk
insert of 1200 small documents takes around 2 seconds.

Note that this can't be anything to do with indexing, if you're just talking
about the time from HTTP POST to HTTP response, because indexes are only
updated when you read a view (unless you have some other process in the
background making separate accesses to views?)
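
Since indexes only update on read, one cheap check is to compare a normal
view request against one with stale=ok, which returns whatever index already
exists; the difference is your index-build time (db and view names below are
placeholders):

$ curl 'http://127.0.0.1:5984/mydb/_design/SearchDocument/_view/by_engine?limit=1'
$ curl 'http://127.0.0.1:5984/mydb/_design/SearchDocument/_view/by_engine?limit=1&stale=ok'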

Unfortunately, I can't give you any clue as to what's wrong on your
platform.

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Thanks Chris.  I'll try that.
Right now I am migrating data from MySQL; while migrating, I have been
letting CouchDB choose its own UUIDs.

My intention for new records coming in, once I switch the code to start
writing to CouchDB, was to set the document ID myself from the UUID of the
incoming SQS (Simple Queue Service) message.  The reason is that this would
help prevent receiving and processing duplicate SQS messages, which can
sometimes happen since SQS only guarantees 'at least once' delivery.

Is this approach inadvisable?  Should I instead stick to sequential UUIDs
from CouchDB as the more performant solution?
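
In other words, something like this, with a hypothetical $SQS_MESSAGE_ID;
PUTting to an explicit _id means a redelivered message shows up as an
update/conflict rather than a new duplicate doc:

$ curl -X PUT http://127.0.0.1:5984/searchlight_production/$SQS_MESSAGE_ID \
    -H 'Content-Type: application/json' \
    -d '{"couchrest-type": "SearchDocument"}'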

PS - I am still digging into the indexing problems I was having against the
large db.  I'll report more later.

Thanks,

Glenn


On Sat, Oct 3, 2009 at 11:40 PM, Chris Anderson <jc...@apache.org> wrote:

> On Sat, Oct 3, 2009 at 11:27 PM, Glenn Rempe <gl...@rempe.us> wrote:
> > And just another data point.  I was seeing timeouts from my CouchRest client
> > when trying to do bulk saves to this large DB from my migration script.
> > You can see that I find a batch of 1000 records to migrate and bulk save
> > them to CouchDB.  I write back a timestamp to MySQL for each of those
> > records.  So you can see it takes less than 10 seconds to update 1000 docs
> > in mysql, and almost a minute for the http request to return 201 success for
> > the bulk insert of that data into couchdb.  Avg record size is a few hundred
> > bytes.
> >
> > I ended up having to fork mattetti CouchRest so that I could set an explicit
> > timeout for the embedded RestClient to 120 seconds (!) in order for it to
> > not barf...  Is it expected that larger DB's will not take constant time to
> > add a number of records on in a bulk insert?
>
> The next remedy here is to try one of the new uuid formats available in
> 0.10
>
> [uuids]
> algorithm = sequential
>
> in your local ini should do it.
>
> this will help the btree to require fewer disk seeks to load the path
> to each updated node, as sequential ids are more likely to be in FS
> cache.
>
> I'm assuming your bulk saves are mostly new docs.
>
> Chris
>
>
> >
> > Here is the output of some crude logging.
> >
> > 2009-10-04 06:16:55.854220 sync-mysql-to-couchdb(4241) [INFO]
> > sync-mysql-to-couchdb-daemon.rb:20: Found 1000 records to try and migrate.
> >
> > [ I updated each mysql record's timestamp individually in this gap.  About 9
> > seconds. ]
> >
> > 2009-10-04 06:17:04.876943 sync-mysql-to-couchdb(4241) [INFO]
> > sync-mysql-to-couchdb-daemon.rb:92: Now trying to save 1000 records to
> > CouchDB out of 1000 found in MySQL.
> >
> > [This is the bulk post to CouchDB in this gap.  57 seconds!]
> >
> > 2009-10-04 06:18:01.989428 sync-mysql-to-couchdb(4241) [INFO]
> > sync-mysql-to-couchdb-daemon.rb:94: Bulk save of 1000 of records to CouchDB
> > complete.
> >
> > Would you suspect couchdb?  Or possibly IO probs with Elastic Block Store
> > (EBS)?
> >
> > G
> >
> > On Sat, Oct 3, 2009 at 11:07 PM, Glenn Rempe <gl...@rempe.us> wrote:
> >
> >> Thanks for the reply Paul.  Some comments below.
> >>
> >> Also, just for full disclosure, the CouchDB I am working on was moved out
> >> of another couchdb and it was originally created using CDB 0.9.1.  I show a
> >> dir listing below that indicates exactly what was moved.
> >>
> >> On Sat, Oct 3, 2009 at 6:46 PM, Paul Joseph Davis <
> >> paul.joseph.davis@gmail.com> wrote:
> >>
> >>> Glenn,
> >>>
> >>> This sounds like your map function is timing out which causes the error.
> >>> You could try upping the os process timeout setting in the config.
> >>>
> >>>
> >> When I go into futon and select one of my views in my design doc it
> >> *always* consistently pops up a javascript alert with the error text at ~5
> >> seconds after selecting the view.  It doesn't seem to matter what else I do.
> >> It also didn't vary when I changed the os_process_timeout value in futon as
> >> you suggested from 5000 to 25000.  Can you explain exactly what this
> >> particular param is doing?  I assume the value is milliseconds?
> >>
> >>
> >>> To see what's going on you can increase to debug logging or use the log
> >>> function in your maps. There's also the status page in futon which I think
> >>> you said you were looking at.
> >>>
> >>>
> >> Yes, I was previously looking at the status page.  But now since I've
> >> upgraded to trunk I never see any indexing activity happening in the status
> >> page no matter what I do.
> >>
> >>
> >>> If indexing crashes it should just pick up where it left off when you
> >>> retrigger. Use the status page to verify. If it's not then let us know.
> >>>
> >>>
> >> Can you clarify, is this also the case when no index has ever successfully
> >> run?  I was wondering if I first need to get through at least one index
> >> session (maybe with a smaller amount of records) prior to incremental
> >> indexing working as expected.
> >>
> >> Is there any way to determine what percentage of the total records have
> >> been added to the index?
> >>
> >> For your info, here are the contents of the DB dir.  You can see the main
> >> DB is 42GB now (~17 million records).
> >>
> >> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
> >> ls -la /vol/couchdb/var/lib/couchdb
> >> total 41674956
> >> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 .
> >> drwxr-xr-x 3 couchdb root             20 2009-10-03 05:02 ..
> >> -rw-r--r-- 1 couchdb couchdb 42675073133 2009-10-04 02:13
> >> searchlight_production.couch
> >> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02
> >> .searchlight_production_design
> >>
> >> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
> >> ls -la /vol/couchdb/var/lib/couchdb/.searchlight_production_design/
> >> total 33700196
> >> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02 .
> >> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 ..
> >> -rw-r--r-- 1 couchdb couchdb  9819347287 2009-10-03 08:04
> >> 5f190ffb2ed589746e8796d2423869ac.view
> >> -rw-r--r-- 1 couchdb couchdb    91402872 2009-10-03 06:03
> >> b127a58306fb8e7858cd1a92f8398511.view
> >> -rw-r--r-- 1 couchdb couchdb 24598236884 2009-10-02 13:00
> >> SearchDocument.view
> >>
> >>
> >>
> >>> If you can't find anything in the debug logs then ping the list and we'll
> >>> get into trying to duplicate.
> >>>
> >>>
> >> I have turned on the 'debug' level in the logs and that provided me with
> >> the info I previously provided.  I'll try to use the log function in the map
> >> and see if that shows anything.
> >>
> >> Thanks for helping.  If it comes to it, I may be able to make a snapshot of
> >> this EBS volume and start a host that you could login to and get your hands
> >> directly on it if that would be helpful.
> >>
> >> Glenn
> >>
> >>
> >>
> >
> >
> > --
> > Glenn Rempe
> >
> > email                 : glenn@rempe.us
> > voice                 : (415) 894-5366 or (415)-89G-LENN
> > twitter                : @grempe
> > contact info        : http://www.rempe.us/contact.html
> > pgp                    : http://www.rempe.us/gnupg.txt
> >
>
>
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>



-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Chris Anderson <jc...@apache.org>.
On Sun, Oct 4, 2009 at 12:31 PM, Paul Davis <pa...@gmail.com> wrote:
> On Sun, Oct 4, 2009 at 1:00 PM, Adam Kocoloski <ko...@apache.org> wrote:
>> On Oct 4, 2009, at 12:26 PM, Glenn Rempe wrote:
>>
>>> BTW.  That was a magic trick Chris.  Thank you.  Bulk saves of 1000 docs
>>> went from almost 1 minute to less than 2 seconds the moment I changed that
>>> value in the futon config interface.
>>> Is there a reason that is not the default setting?
>>
>> I'd say inertia.  The sequential IDs are a newish feature, and the
>> similarity between IDs might be a bit surprising to users.  I'd be in favor
>> of switching the default, though.  Best,
>>
>> Adam
>>
>
> We've had this discussion before and decided that random id's should
> remain default. Though we should probably start writing a wiki page
> about methods for making CouchDB faster so that people have a chance
> to learn about such things.
>
> Paul

I think I'm with Adam about the faster uuids as the default, but we
should take that discussion to dev@

Chris




-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: Timeout Error when trying to access views + Indexing problems

Posted by Paul Davis <pa...@gmail.com>.
On Sun, Oct 4, 2009 at 1:00 PM, Adam Kocoloski <ko...@apache.org> wrote:
> On Oct 4, 2009, at 12:26 PM, Glenn Rempe wrote:
>
>> BTW.  That was a magic trick Chris.  Thank you.  Bulk saves of 1000 docs
>> went from almost 1 minute to less than 2 seconds the moment I changed that
>> value in the futon config interface.
>> Is there a reason that is not the default setting?
>
> I'd say inertia.  The sequential IDs are a newish feature, and the
> similarity between IDs might be a bit surprising to users.  I'd be in favor
> of switching the default, though.  Best,
>
> Adam
>

We've had this discussion before and decided that random id's should
remain default. Though we should probably start writing a wiki page
about methods for making CouchDB faster so that people have a chance
to learn about such things.

Paul

Re: Timeout Error when trying to access views + Indexing problems

Posted by Adam Kocoloski <ko...@apache.org>.
On Oct 4, 2009, at 12:26 PM, Glenn Rempe wrote:

> BTW.  That was a magic trick Chris.  Thank you.  Bulk saves of 1000 docs
> went from almost 1 minute to less than 2 seconds the moment I changed that
> value in the futon config interface.
> Is there a reason that is not the default setting?

I'd say inertia.  The sequential IDs are a newish feature, and the  
similarity between IDs might be a bit surprising to users.  I'd be in  
favor of switching the default, though.  Best,

Adam

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
BTW.  That was a magic trick Chris.  Thank you.  Bulk saves of 1000 docs
went from almost 1 minute to less than 2 seconds the moment I changed that
value in the futon config interface.
Is there a reason that is not the default setting?

Thanks!

Glenn


> The next remedy here is to try one of the new uuid formats available in
> 0.10
>
> [uuids]
> algorithm = sequential
>
> in your local ini should do it.
>
> this will help the btree to require fewer disk seeks to load the path
> to each updated node, as sequential ids are more likely to be in FS
> cache.
>
> I'm assuming your bulk saves are mostly new docs.
>
> Chris
>
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>

Re: Timeout Error when trying to access views + Indexing problems

Posted by Chris Anderson <jc...@apache.org>.
On Sat, Oct 3, 2009 at 11:27 PM, Glenn Rempe <gl...@rempe.us> wrote:
> And just another data point.  I was seeing timeouts from my CouchRest client
> when trying to do bulk saves to this large DB from my migration script.
> You can see that I find a batch of 1000 records to migrate and bulk save
> them to CouchDB.  I write back a timestamp to MySQL for each of those
> records.  So you can see it takes less than 10 seconds to update 1000 docs
> in mysql, and almost a minute for the http request to return 201 success for
> the bulk insert of that data into couchdb.  Avg record size is a few hundred
> bytes.
>
> I ended up having to fork mattetti CouchRest so that I could set an explicit
> timeout for the embedded RestClient to 120 seconds (!) in order for it to
> not barf...  Is it expected that larger DB's will not take constant time to
> add a number of records on in a bulk insert?

The next remedy here is to try one of the new uuid formats available in 0.10

[uuids]
algorithm = sequential

in your local ini should do it.

this will help the btree to require fewer disk seeks to load the path
to each updated node, as sequential ids are more likely to be in FS
cache.

I'm assuming your bulk saves are mostly new docs.
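
You can also flip it at runtime through the config API, which is what the
futon config page talks to (default host/port assumed):

$ curl -X PUT http://127.0.0.1:5984/_config/uuids/algorithm -d '"sequential"'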

Chris


>
> Here is the output of some crude logging.
>
> 2009-10-04 06:16:55.854220 sync-mysql-to-couchdb(4241) [INFO]
> sync-mysql-to-couchdb-daemon.rb:20: Found 1000 records to try and migrate.
>
> [ I updated each mysql record's timestamp individually in this gap.  About 9
> seconds. ]
>
> 2009-10-04 06:17:04.876943 sync-mysql-to-couchdb(4241) [INFO]
> sync-mysql-to-couchdb-daemon.rb:92: Now trying to save 1000 records to
> CouchDB out of 1000 found in MySQL.
>
> [This is the bulk post to CouchDB in this gap.  57 seconds!]
>
> 2009-10-04 06:18:01.989428 sync-mysql-to-couchdb(4241) [INFO]
> sync-mysql-to-couchdb-daemon.rb:94: Bulk save of 1000 of records to CouchDB
> complete.
>
> Would you suspect couchdb?  Or possibly IO probs with Elastic Block Store
> (EBS)?
>
> G
>
> On Sat, Oct 3, 2009 at 11:07 PM, Glenn Rempe <gl...@rempe.us> wrote:
>
>> Thanks for the reply Paul.  Some comments below.
>>
>> Also, just for full disclosure, the CouchDB I am working on was moved out
>> of another couchdb and it was originally created using CDB 0.9.1.  I show a
>> dir listing below that indicates exactly what was moved.
>>
>> On Sat, Oct 3, 2009 at 6:46 PM, Paul Joseph Davis <
>> paul.joseph.davis@gmail.com> wrote:
>>
>>> Glenn,
>>>
>>> This sounds like your map function is timing out which causes the error.
>>> You could try upping the os process timeout setting in the config.
>>>
>>>
>> When I go into futon and select one of my views in my design doc it
>> *always* consistently pops up a javascript alert with the error text at ~5
>> seconds after selecting the view.  It doesn't seem to matter what else I do.
>>  It also didn't vary when I changed the os_process_timeout value in futon as
>> you suggested from 5000 to 25000.  Can you explain exactly what this
>> particular param is doing?  I assume the value is milliseconds?
>>
>>
>>> To see what's going on you can increase to debug logging or use the log
>>> function in your maps. There's also the status page in futon which I think
>>> you said you were looking at.
>>>
>>>
>> Yes, I was previously looking at the status page.  But now since I've
>> upgraded to trunk I never see any indexing activity happening in the status
>> page no matter what I do.
>>
>>
>>> If indexing crashes it should just pick up where it left off when you
>>> retrigger. Use the status page to verify. If it's not then let us know.
>>>
>>>
>> Can you clarify, is this also the case when no index has ever successfully
>> run?  I was wondering if I first need to get through at least one index
>> session (maybe with a smaller amount of records) prior to incremental
>> indexing working as expected.
>>
>> Is there any way to determine what percentage of the total records have
>> been added to the index?
>>
>> For your info, here are the contents of the DB dir.  You can see the main
>> DB is 42GB now (~17 million records).
>>
>> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>> ls -la /vol/couchdb/var/lib/couchdb
>> total 41674956
>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 .
>> drwxr-xr-x 3 couchdb root             20 2009-10-03 05:02 ..
>> -rw-r--r-- 1 couchdb couchdb 42675073133 2009-10-04 02:13
>> searchlight_production.couch
>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02
>> .searchlight_production_design
>>
>> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>> ls -la /vol/couchdb/var/lib/couchdb/.searchlight_production_design/
>> total 33700196
>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02 .
>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 ..
>> -rw-r--r-- 1 couchdb couchdb  9819347287 2009-10-03 08:04
>> 5f190ffb2ed589746e8796d2423869ac.view
>> -rw-r--r-- 1 couchdb couchdb    91402872 2009-10-03 06:03
>> b127a58306fb8e7858cd1a92f8398511.view
>> -rw-r--r-- 1 couchdb couchdb 24598236884 2009-10-02 13:00
>> SearchDocument.view
>>
>>
>>
>>> If you can't find anything in the debug logs then ping the list and we'll
>>> get into trying to duplicate.
>>>
>>>
>> I have turned on the 'debug' level in the logs and that provided me with
>> the info I previously provided.  I'll try to use the log function in the map
>> and see if that shows anything.
>>
>> Thanks for helping.  If it comes to it, I may be able to make a snapshot of
>> this EBS volume and start a host that you could login to and get your hands
>> directly on it if that would be helpful.
>>
>> Glenn
>>
>>
>>
>
>
> --
> Glenn Rempe
>
> email                 : glenn@rempe.us
> voice                 : (415) 894-5366 or (415)-89G-LENN
> twitter                : @grempe
> contact info        : http://www.rempe.us/contact.html
> pgp                    : http://www.rempe.us/gnupg.txt
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
And just another data point.  I was seeing timeouts from my CouchRest client
when trying to do bulk saves to this large DB from my migration script.
You can see that I find a batch of 1000 records to migrate and bulk save
them to CouchDB.  I write back a timestamp to MySQL for each of those
records.  So you can see it takes less than 10 seconds to update 1000 docs
in mysql, and almost a minute for the http request to return 201 success for
the bulk insert of that data into couchdb.  Avg record size is a few hundred
bytes.

I ended up having to fork mattetti CouchRest so that I could set an explicit
timeout for the embedded RestClient to 120 seconds (!) in order for it to
not barf...  Is it expected that larger DB's will not take constant time to
add a number of records on in a bulk insert?

Here is the output of some crude logging.

2009-10-04 06:16:55.854220 sync-mysql-to-couchdb(4241) [INFO]
sync-mysql-to-couchdb-daemon.rb:20: Found 1000 records to try and migrate.

[ I updated each mysql record's timestamp individually in this gap.  About 9
seconds. ]

2009-10-04 06:17:04.876943 sync-mysql-to-couchdb(4241) [INFO]
sync-mysql-to-couchdb-daemon.rb:92: Now trying to save 1000 records to
CouchDB out of 1000 found in MySQL.

[This is the bulk post to CouchDB in this gap.  57 seconds!]

2009-10-04 06:18:01.989428 sync-mysql-to-couchdb(4241) [INFO]
sync-mysql-to-couchdb-daemon.rb:94: Bulk save of 1000 of records to CouchDB
complete.

Would you suspect couchdb?  Or possibly IO probs with Elastic Block Store
(EBS)?

G

On Sat, Oct 3, 2009 at 11:07 PM, Glenn Rempe <gl...@rempe.us> wrote:

> Thanks for the reply Paul.  Some comments below.
>
> Also, just for full disclosure, the CouchDB I am working on was moved out
> of another couchdb and it was originally created using CDB 0.9.1.  I show a
> dir listing below that indicates exactly what was moved.
>
> On Sat, Oct 3, 2009 at 6:46 PM, Paul Joseph Davis <
> paul.joseph.davis@gmail.com> wrote:
>
>> Glenn,
>>
>> This sounds like your map function is timing out which causes the error.
>> You could try upping the os process timeout setting in the config.
>>
>>
> When I go into futon and select one of my views in my design doc it
> *always* consistently pops up a javascript alert with the error text at ~5
> seconds after selecting the view.  It doesn't seem to matter what else I do.
>  It also didn't vary when I changed the os_process_timeout value in futon as
> you suggested from 5000 to 25000.  Can you explain exactly what this
> particular param is doing?  I assume the value is milliseconds?
>
>
>> To see what's going on you can increase to debug logging or use the log
>> function in your maps. There's also the status page in futon which I think
>> you said you were looking at.
>>
>>
> Yes, I was previously looking at the status page.  But now since I've
> upgraded to trunk I never see any indexing activity happening in the status
> page no matter what I do.
>
>
>> If indexing crashes it should just pick up where it left off when you
>> retrigger. Use the status page to verify. If it's not then let us know.
>>
>>
> Can you clarify, is this also the case when no index has ever successfully
> run?  I was wondering if I first need to get through at least one index
> session (maybe with a smaller amount of records) prior to incremental
> indexing working as expected.
>
> Is there any way to determine what percentage of the total records have
> been added to the index?
>
> For your info, here are the contents of the DB dir.  You can see the main
> DB is 42GB now (~17 million records).
>
> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
> ls -la /vol/couchdb/var/lib/couchdb
> total 41674956
> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 .
> drwxr-xr-x 3 couchdb root             20 2009-10-03 05:02 ..
> -rw-r--r-- 1 couchdb couchdb 42675073133 2009-10-04 02:13
> searchlight_production.couch
> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02
> .searchlight_production_design
>
> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
> ls -la /vol/couchdb/var/lib/couchdb/.searchlight_production_design/
> total 33700196
> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02 .
> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 ..
> -rw-r--r-- 1 couchdb couchdb  9819347287 2009-10-03 08:04
> 5f190ffb2ed589746e8796d2423869ac.view
> -rw-r--r-- 1 couchdb couchdb    91402872 2009-10-03 06:03
> b127a58306fb8e7858cd1a92f8398511.view
> -rw-r--r-- 1 couchdb couchdb 24598236884 2009-10-02 13:00
> SearchDocument.view
>
>
>
>> If you can't find anything in the debug logs then ping the list and we'll
>> get into trying to duplicate.
>>
>>
> I have turned on the 'debug' level in the logs and that provided me with
> the info I previously provided.  I'll try to use the log function in the map
> and see if that shows anything.
>
> Thanks for helping.  If it comes to it, I may be able to make a snapshot of
> this EBS volume and start a host that you could login to and get your hands
> directly on it if that would be helpful.
>
> Glenn
>
>
>


-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Thanks Adam.  I will give your suggestion a try later today.  Currently I
have an indexing in progress:
Processed 2149149 of 17774865 changes (12%)

So I want to let that try to run to completion (and I'm not going to touch
it until it does!).  The indexing kicked off when I requested a rails page
(as opposed to trying to access the custom view in Futon) which called that
view through CouchRest, and it's been running OK for a couple of hours now.
 I'll report back when/if it completes, and I will try your test as well to
see what it reports.
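
As an aside, for anyone else watching an index build: the same status Futon
shows can be polled straight over HTTP on trunk (default port assumed):

$ curl http://127.0.0.1:5984/_active_tasks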

Thanks,

Glenn

On Sun, Oct 4, 2009 at 10:19 AM, Adam Kocoloski <ko...@apache.org> wrote:

> Hi Glenn, I saw something like this once, but unfortunately I wasn't able
> to resolve it.  Can you try the following?
>
> 1) Start couchdb with the -i option to get an interactive shell
>
> 2) At the prompt, try to open the view file and read the header
>
> {ok, Fd} =
> couch_file:open("/vol/couchdb/var/lib/couchdb/.searchlight_production_design/5f190ffb2ed589746e8796d2423869ac.view").
> couch_file:read_header(Fd).
>
> In my case, that call timed out, and the stacktrace during normal operation
> was exactly the one you showed earlier in this thread.  Best,
>
> Adam
>
>
> On Oct 4, 2009, at 2:07 AM, Glenn Rempe wrote:
>
>> Thanks for the reply Paul.  Some comments below.
>>
>> Also, just for full disclosure, the CouchDB I am working on was moved out
>> of
>> another couchdb and it was originally created using CDB 0.9.1.  I show a
>> dir
>> listing below that indicates exactly what was moved.
>>
>> On Sat, Oct 3, 2009 at 6:46 PM, Paul Joseph Davis <
>> paul.joseph.davis@gmail.com> wrote:
>>
>>> Glenn,
>>>
>>> This sounds like your map function is timing out which causes the error.
>>> You could try upping the os process timeout setting in the config.
>>>
>>>
>> When I go into futon and select one of my views in my design doc it *always*
>> consistently pops up a javascript alert with the error text at ~5 seconds
>> after selecting the view.  It doesn't seem to matter what else I do.  It
>> also didn't vary when I changed the os_process_timeout value in futon as
>> you
>> suggested from 5000 to 25000.  Can you explain exactly what this
>> particular
>> param is doing?  I assume the value is milliseconds?
>>
>>
>>> To see what's going on you can increase to debug logging or use the log
>>> function in your maps. There's also the status page in futon which I
>>> think
>>> you said you were looking at.
>>>
>>>
>> Yes, I was previously looking at the status page.  But now since I've
>> upgraded to trunk I never see any indexing activity happening in the
>> status
>> page no matter what I do.
>>
>>
>>> If indexing crashes it should just pick up where it left off when you
>>> retrigger. Use the status page to verify. If it's not then let us know.
>>>
>>>
>> Can you clarify, is this also the case when no index has ever successfully
>> run?  I was wondering if I first need to get through at least one index
>> session (maybe with a smaller amount of records) prior to incremental
>> indexing working as expected.
>>
>> Is there any way to determine what percentage of the total records have
>> been
>> added to the index?
>>
>> For your info, here are the contents of the DB dir.  You can see the main
>> DB
>> is 42GB now (~17 million records).
>>
>> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>> ls -la /vol/couchdb/var/lib/couchdb
>> total 41674956
>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 .
>> drwxr-xr-x 3 couchdb root             20 2009-10-03 05:02 ..
>> -rw-r--r-- 1 couchdb couchdb 42675073133 2009-10-04 02:13
>> searchlight_production.couch
>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02
>> .searchlight_production_design
>>
>> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>> ls -la /vol/couchdb/var/lib/couchdb/.searchlight_production_design/
>> total 33700196
>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02 .
>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 ..
>> -rw-r--r-- 1 couchdb couchdb  9819347287 2009-10-03 08:04
>> 5f190ffb2ed589746e8796d2423869ac.view
>> -rw-r--r-- 1 couchdb couchdb    91402872 2009-10-03 06:03
>> b127a58306fb8e7858cd1a92f8398511.view
>> -rw-r--r-- 1 couchdb couchdb 24598236884 2009-10-02 13:00
>> SearchDocument.view
>>
>>
>>
>>> If you can't find anything in the debug logs then ping the list and we'll
>>> get into trying to duplicate.
>>>
>>>
>> I have turned on the 'debug' level in the logs and that provided me with
>> the
>> info I previously provided.  I'll try to use the log function in the map
>> and
>> see if that shows anything.
>>
>> Thanks for helping.  If it comes to it, I may be able to make a snapshot
>> of
>> this EBS volume and start a host that you could login to and get your
>> hands
>> directly on it if that would be helpful.
>>
>> Glenn
>>
>
>


-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Paul Davis <pa...@gmail.com>.
It's 6:15am, so I'll keep this short.

The monit scripts I saw proposed in the other thread worried me a bit,
but I don't know much about monit. You generally don't want to point
monit at the couchdb PID, but at the PID of heart. CouchDB (and Erlang)
are designed to restart as part of their normal (recovery) procedure,
which normally freaks monit out in my experience. But maybe I was
Doing It Wrong.
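
A quick way to see both PIDs, if you want to check what monit is actually
watching (heart should survive an Erlang VM restart; beam.smp won't):

$ ps ax | grep -E 'heart|beam.smp' | grep -v grep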

[Tue, 06 Oct 2009 06:24:09 GMT] [error] [<0.20.0>] {error_report,<0.7.0>,
              {<0.20.0>,std_error,
               "File operation error: eacces. Target:
./couch_doc.beam. Function: get_file. Process: code_server."}}

^^^ We've seen that error before.

http://mail-archives.apache.org/mod_mbox/couchdb-user/200908.mbox/%3cBF2D758D-A311-4592-8BE1-CA8A35A976F1@me.com%3e

Sadly I don't see a resolution. Sounds similar though.

Also, still confuses the shit out of me. Anyone remember if that ever
got fixed? Niket?

Paul J Davis

On Tue, Oct 6, 2009 at 2:46 AM, Glenn Rempe <gl...@rempe.us> wrote:
> Hi Adam (and group!).  So I am still struggling, and need help or advice.
>  It is much appreciated.
> Adam, I tried your suggestion to attempt to open the index file in
> interactive mode.  I was able to start CouchDB in interactive mode, and the
> 'couch_file:open' command return ok instantly, but the
> 'couch_file:read_header(Fd).' provided no output for about 15 min and I
> eventually aborted it.  Should it ever take that long?  What output should I
> expect to see?
>
> Please note that to try and make sure there was nothing wrong with the index
> files, I actually destroyed them and tried a rebuild from scratch.  This
> seemed to be going ok, and the indexing process ran all night.  I got up to
> roughly 20% of the 22 million records in the DB indexed when it mysteriously
> died yet again.  I am noticing that it is at this point that I am always
> seeing this process fail.
>
> When the indexing fails it provides NO information in the couchdb log (with
> log level set to 'debug').  I backtracked in the couchdb log file to find
> where it stopped working, looking for a stack trace.  All I found was the
> point at which it just stopped reporting progress and the couchjs process
> was dead.  Here is that point in the log file:
>
> http://pastie.org/642736
>
> And this is the last line reporting indexing status (line 25 in the pastie):
>
> [Mon, 05 Oct 2009 18:52:42 GMT] [debug] [<0.80.0>] New task status for
> searchlight_production _design/SearchDocument: Processed 4377984 of 22650375
> changes (19%)
>
> After that, just silence in the log.
>
> Here is one hint which suggests that perhaps it is not just the couchjs
> process that is crashing, but perhaps the entire couchdb process?  In the
> /var/log/syslog I see the following three minutes after the last entry
> showing index status in the couch log:
>
> http://pastie.org/642952
>
> This is being logged by monit, which I am using to keep an eye on couchdb.
>  I may turn off monit, and see if the entire couchdb process crashes when
> the indexing stops.
>
> UPDATE :
> Even with monit turned off, it looks like the couchdb pids have changed
> during or after the indexing process failure.  This is the 'ps -ef' output
> after the crash, to compare with the output in the previous link which was
> taken when the indexing was started:
>
> http://pastie.org/643335
> END UPDATE:
>
> If I try and access that view in a rails console session after this failure
> I am getting a 500 error:
>
>>> counts = SearchDocument.by_engine(:reduce => true, :group => true, :stale => 'ok')['rows']
> RestClient::RequestFailed: HTTP status code 500
>
> And I see the following stack trace in the couch log:
>
> [Mon, 05 Oct 2009 21:53:00 GMT] [info] [<0.301.0>] Stacktrace:
> [{gen_server,call,2},
>             {couch_view,get_group_server,2},
>             {couch_view,get_group,3},
>             {couch_view,get_map_view,4},
>             {couch_httpd_view,design_doc_view,5},
>             {couch_httpd_db,do_db_req,2},
>             {couch_httpd,handle_request,5},
>             {mochiweb_http,headers,5}]
>
> Here is the full output:
>
> http://pastie.org/642940
>
> Is it possible that either:
>
> - there is some kind of corruption in the main DB file and when this point
> is reached (or a specific record in the DB?) that it crashes? If so how can
> I best identify this?
> - There is some file system related issue?  I tend to discount this option
> just because it always seems to fail at roughly the same point in the
> process.
>
> What are the recommended next steps for finding the root cause of the issue?
>  Should I insert a log() statement in the view code and try to run the whole
> index with that in place?
>
> UPDATE : I tried to run the index again with monit turned off, since I saw
> some messages in the syslog about the couchdb process.  I was hoping that
> perhaps monit was inadvertently killing and restarting couchdb, which was
> causing the index failure.  Alas, this was not the case.  It just died on me
> again after 3.6 million records.  And there are some different error messages
> in the log immediately after it stopped reporting on indexing progress.
>  Please take a look at this pastie to see the logs:
>
> http://pastie.org/643330
>
> (There's a block of error messages starting at line 25)
>
> I am going to try tonight splitting the 7 views I currently have in a single
> design doc in to 4 design docs according to the part of the app they pertain
> to.  I am hoping that this will narrow it down if some design docs finish
> indexing and others don't.
>
> Thanks again for the help (and reading this long message!).
>
> Glenn
>
>
> On Sun, Oct 4, 2009 at 10:19 AM, Adam Kocoloski <ko...@apache.org> wrote:
>
>> Hi Glenn, I saw something like this once, but unfortunately I wasn't able
>> to resolve it.  Can you try the following?
>>
>> 1) Start couchdb with the -i option to get an interactive shell
>>
>> 2) At the prompt, try to open the view file and read the header
>>
>> {ok, Fd} =
>> couch_file:open("/vol/couchdb/var/lib/couchdb/.searchlight_production_design/5f190ffb2ed589746e8796d2423869ac.view").
>> couch_file:read_header(Fd).
>>
>> In my case, that call timed out, and the stacktrace during normal operation
>> was exactly the one you showed earlier in this thread.  Best,
>>
>> Adam
>>
>>
>> On Oct 4, 2009, at 2:07 AM, Glenn Rempe wrote:
>>
>>> Thanks for the reply Paul.  Some comments below.
>>>
>>> Also, just for full disclosure, the CouchDB I am working on was moved out
>>> of
>>> another couchdb and it was originally created using CDB 0.9.1.  I show a
>>> dir
>>> listing below that indicates exactly what was moved.
>>>
>>> On Sat, Oct 3, 2009 at 6:46 PM, Paul Joseph Davis <
>>> paul.joseph.davis@gmail.com> wrote:
>>>
>>>> Glenn,
>>>>
>>>> This sounds like your map function is timing out which causes the error.
>>>> You could try upping the os process timeout setting in the config.
>>>>
>>>>
>>> When I go into futon and select one of my views in my design doc it *always*
>>> consistently pops up a javascript alert with the error text at ~5 seconds
>>> after selecting the view.  It doesn't seem to matter what else I do.  It
>>> also didn't vary when I changed the os_process_timeout value in futon as
>>> you
>>> suggested from 5000 to 25000.  Can you explain exactly what this
>>> particular
>>> param is doing?  I assume the value is milliseconds?
>>>
>>>
>>>> To see what's going on you can increase to debug logging or use the log
>>>> function in your maps. There's also the status page in futon which I
>>>> think
>>>> you said you were looking at.
>>>>
>>>>
>>> Yes, I was previously looking at the status page.  But now since I've
>>> upgraded to trunk I never see any indexing activity happening in the
>>> status
>>> page no matter what I do.
>>>
>>>
>>>> If indexing crashes it should just pick up where it left off when you
>>>> retrigger. Use the status page to verify. If it's not then let us know.
>>>>
>>>>
>>> Can you clarify, is this also the case when no index has ever successfully
>>> run?  I was wondering if I first need to get through at least one index
>>> session (maybe with a smaller amount of records) prior to incremental
>>> indexing working as expected.
>>>
>>> Is there any way to determine what percentage of the total records have
>>> been
>>> added to the index?
>>>
>>> For your info, here are the contents of the DB dir.  You can see the main
>>> DB
>>> is 42GB now (~17 million records).
>>>
>>> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>>> ls -la /vol/couchdb/var/lib/couchdb
>>> total 41674956
>>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 .
>>> drwxr-xr-x 3 couchdb root             20 2009-10-03 05:02 ..
>>> -rw-r--r-- 1 couchdb couchdb 42675073133 2009-10-04 02:13
>>> searchlight_production.couch
>>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02
>>> .searchlight_production_design
>>>
>>> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>>> ls -la /vol/couchdb/var/lib/couchdb/.searchlight_production_design/
>>> total 33700196
>>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02 .
>>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 ..
>>> -rw-r--r-- 1 couchdb couchdb  9819347287 2009-10-03 08:04
>>> 5f190ffb2ed589746e8796d2423869ac.view
>>> -rw-r--r-- 1 couchdb couchdb    91402872 2009-10-03 06:03
>>> b127a58306fb8e7858cd1a92f8398511.view
>>> -rw-r--r-- 1 couchdb couchdb 24598236884 2009-10-02 13:00
>>> SearchDocument.view
>>>
>>>
>>>
>>>> If you can't find anything in the debug logs then ping the list and we'll
>>>> get into trying to duplicate.
>>>>
>>>>
>>> I have turned on the 'debug' level in the logs and that provided me with
>>> the
>>> info I previously provided.  I'll try to use the log function in the map
>>> and
>>> see if that shows anything.
>>>
>>> Thanks for helping.  If it comes to it, I may be able to make a snapshot
>>> of
>>> this EBS volume and start a host that you could login to and get your
>>> hands
>>> directly on it if that would be helpful.
>>>
>>> Glenn
>>>
>>
>>
>
>
> --
> Glenn Rempe
>
> email                 : glenn@rempe.us
> voice                 : (415) 894-5366 or (415)-89G-LENN
> twitter                : @grempe
> contact info        : http://www.rempe.us/contact.html
> pgp                    : http://www.rempe.us/gnupg.txt
>

Re: Timeout Error when trying to access views + Indexing problems

Posted by Robert Dionne <di...@dionne-associates.com>.
Have you tried running with +d to get a crash dump? It might provide
some clues.
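
Something like this should work, since erl picks up extra emulator flags
from the ERL_FLAGS environment variable (assuming the couchdb wrapper script
ends up exec'ing erl; the dump lands in erl_crash.dump by default):

$ ERL_FLAGS="+d" couchdb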



On Oct 6, 2009, at 3:39 PM, Paul Davis wrote:

> Ignore it?
>
> I'd focus on writing a script that checks memory usage while there's
> an indexer running.
>
> Paul
>
> On Tue, Oct 6, 2009 at 3:35 PM, Glenn Rempe <gl...@rempe.us> wrote:
>> Running compaction but it is *slow*.  Running at a pace of about
>> ~1,000 records every 30 seconds according to the log.
>>
>> At that pace I think it will take 233 hours (!) to compact.  Is
>> there anything I can tweak to get that running faster?  I can't wait
>> 10 days... :-(
>>
>> G
>>
>> On Tue, Oct 6, 2009 at 11:28 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>>>> Also, I just went through and re-read the entire discussion. After
>>>>> your 0.9.1 -> trunk upgrade did you compact the database? I can't
>>>>> think of anything that'd cause an issue there but it might be
>>>>> something to try (there is a conversion process during  
>>>>> compaction).
>>>>>
>>>>>
>>>> I did not do a compaction.  I can try that.  Unfortunately that  
>>>> probably
>>>> kills another day compacting my 50GB 28mm record DB.  ;-)  But,  
>>>> hey, if it
>>>> helps... :-)
>>>>
>>>
>>> Its a possibility is all. Theoretically this is more incremental, so
>>> even if you kick it off and it dies it'll restart part way through
>>> even without a complete run. (Very theoretically as I haven't  
>>> tried it
>>> yet). Also it'll run just fine in the background.
>>


Re: Timeout Error when trying to access views + Indexing problems

Posted by Paul Davis <pa...@gmail.com>.
Ignore it?

I'd focus on writing a script that checks memory usage while there's
an indexer running.
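
A rough sketch of what I mean; it polls once a second and prints the peak
RSS after the process goes away:

#!/bin/sh
PID=$(pgrep -o beam.smp)        # oldest beam.smp; adjust if you run several
PEAK=0
while kill -0 "$PID" 2>/dev/null; do
  RSS=$(ps -o rss= -p "$PID")   # resident set size in kB
  [ "$RSS" -gt "$PEAK" ] && PEAK=$RSS
  sleep 1
done
echo "peak RSS: ${PEAK} kB"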

Paul

On Tue, Oct 6, 2009 at 3:35 PM, Glenn Rempe <gl...@rempe.us> wrote:
> Running compaction but it is *slow*.  Running at a pace of about
> ~1,000 records every 30 seconds according to the log.
>
> At that pace I think it will take 233 hours (!) to compact.  Is
> there anything I can tweak to get that running faster?  I can't wait
> 10 days... :-(
>
> G
>
> On Tue, Oct 6, 2009 at 11:28 AM, Paul Davis <pa...@gmail.com> wrote:
>>>> Also, I just went through and re-read the entire discussion. After
>>>> your 0.9.1 -> trunk upgrade did you compact the database? I can't
>>>> think of anything that'd cause an issue there but it might be
>>>> something to try (there is a conversion process during compaction).
>>>>
>>>>
>>> I did not do a compaction.  I can try that.  Unfortunately that probably
>>> kills another day compacting my 50GB 28mm record DB.  ;-)  But, hey, if it
>>> helps... :-)
>>>
>>
>> Its a possibility is all. Theoretically this is more incremental, so
>> even if you kick it off and it dies it'll restart part way through
>> even without a complete run. (Very theoretically as I haven't tried it
>> yet). Also it'll run just fine in the background.
>

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Running compaction but it is *slow*.  Running at a pace of about
~1,000 records every 30 seconds according to the log.

At that pace I think it will take 233 hours (!) to compact.  Is
there anything I can tweak to get that running faster?  I can't wait
10 days... :-(

G

On Tue, Oct 6, 2009 at 11:28 AM, Paul Davis <pa...@gmail.com> wrote:
>>> Also, I just went through and re-read the entire discussion. After
>>> your 0.9.1 -> trunk upgrade did you compact the database? I can't
>>> think of anything that'd cause an issue there but it might be
>>> something to try (there is a conversion process during compaction).
>>>
>>>
>> I did not do a compaction.  I can try that.  Unfortunately that probably
>> kills another day compacting my 50GB 28mm record DB.  ;-)  But, hey, if it
>> helps... :-)
>>
>
> Its a possibility is all. Theoretically this is more incremental, so
> even if you kick it off and it dies it'll restart part way through
> even without a complete run. (Very theoretically as I haven't tried it
> yet). Also it'll run just fine in the background.

Re: Timeout Error when trying to access views + Indexing problems

Posted by Adam Kocoloski <ko...@apache.org>.
On Oct 6, 2009, at 3:13 PM, Glenn Rempe wrote:

> On Tue, Oct 6, 2009 at 11:28 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>> So there is no way to turn on an additional level of debugging in  
>>> the view
>>> generation process with the current code?  I noticed that there is  
>>> a 'tmi'
>>> logging level in the erlang couchdb code (which I just turned  
>>> on).  Will
>>> this help?
>>
>> A TMI log level is news to me. I've never seen a log macro that  
>> uses it.
>
> LEVEL_TMI is defined in couch_log.erl (I assume it stands for 'Too
> Much Information' :-)
>
> But it looks like its not really used in the code...
>
> glenn@macbook-pro ~/src/git/couchdb[master*]$ git grep LOG_DEBUG | wc -l
>      56
> glenn@macbook-pro ~/src/git/couchdb[master*]$ git grep LOG_TMI | wc -l
>       0
> glenn@macbook-pro ~/src/git/couchdb[master*]$ git grep LOG_INFO | wc -l
>      41
> glenn@macbook-pro ~/src/git/couchdb[master*]$ git grep LOG_ERROR | wc -l
>      30
>
> Actually, if I am reading this correctly there are only 127 logging
> statements in the entirety of the couchdb erlang code.  That doesn't
> sound like much...  (coming from a guy starved for logging info :-)
>
> Glenn

Heh, VM death is about the only time Erlang is quiet.  Any other error  
generates massive amounts of logging.  Many of the LOG_ERROR messages  
actually catch a larger error and reformat it into something shorter!   
I know this bit of information doesn't help you at all :-/

Adam

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
On Tue, Oct 6, 2009 at 11:28 AM, Paul Davis <pa...@gmail.com> wrote:
> > So there is no way to turn on an additional level of debugging in the view
> > generation process with the current code?  I noticed that there is a 'tmi'
> > logging level in the erlang couchdb code (which I just turned on).  Will
> > this help?
>
> A TMI log level is news to me. I've never seen a log macro that uses it.

LEVEL_TMI is defined in couch_log.erl (I assume it stands for 'Too
Much Information' :-)

But it looks like its not really used in the code...

glenn@macbook-pro ~/src/git/couchdb[master*]$ git grep LOG_DEBUG | wc -l
      56
glenn@macbook-pro ~/src/git/couchdb[master*]$ git grep LOG_TMI | wc -l
       0
glenn@macbook-pro ~/src/git/couchdb[master*]$ git grep LOG_INFO | wc -l
      41
glenn@macbook-pro ~/src/git/couchdb[master*]$ git grep LOG_ERROR | wc -l
      30

Actually, if I am reading this correctly there are only 127 logging
statements in the entirety of the couchdb erlang code.  That doesn't
sound like much...  (coming from a guy starved for logging info :-)

Glenn

Re: Timeout Error when trying to access views + Indexing problems

Posted by Paul Davis <pa...@gmail.com>.
On Tue, Oct 6, 2009 at 2:21 PM, Glenn Rempe <gl...@rempe.us> wrote:
> Thanks Paul.  Comments below.
>
> On Tue, Oct 6, 2009 at 11:01 AM, Paul Davis <pa...@gmail.com> wrote:
>
>>
>> Glenn,
>>
>> The quickest way to check if you have a bad document in your DB would
>> probably be something like:
>>
>> $ ps ax | grep beam.smp
>> $ curl http://127.0.0.1:5984/db_name/_all_docs?include_docs=true >
>> /dev/null
>> $ ps ax | grep beam.smp
>>
>> You only need to trigger the doc to exit through the JSON serializer
>> to trigger the badness.
>>
>>
> I am running this now.
>
>
>> If its being restarted by heart, then its most likely a complete VM
>> death. The fact the PID is changing suggests that you're hitting VM
>> death. And on complete VM death there is nothing CouchDB can do to
>> help. VM deaths are instant and dramatic. Have you tried checking
>> memory allocated the beam.smp process as it gets further along? A
>> common cause of instant VM deaths is when malloc returns NULL.
>>
>>
> I have kept an eye on the overall system memory usage.  The EC2 XLarge
> instance I am running on has 15GB RAM, and I have never seen the RAM usage
> go over 4-5GB since I switched to XLarge.  Is there a specific command you
> suggest for tracking memory explicitly assigned to the beam?
>

I'm not very high tech here: top and free, generally, just to get an
idea. Memory reporting is kinda wonky, so I only do order-of-magnitude
checks. Though the next time you start an indexing run, a small script
that spins and records the high-water-mark memory allocation of that
PID could prove useful, if it's a major spike that causes VM death.

>
>
>> Also, I just went through and re-read the entire discussion. After
>> your 0.9.1 -> trunk upgrade did you compact the database? I can't
>> think of anything that'd cause an issue there but it might be
>> something to try (there is a conversion process during compaction).
>>
>>
> I did not do a compaction.  I can try that.  Unfortunately that probably
> kills another day compacting my 50GB 28mm record DB.  ;-)  But, hey, if it
> helps... :-)
>

It's a possibility, is all. Theoretically this is more incremental, so
even if you kick it off and it dies it'll restart part way through
even without a complete run. (Very theoretically, as I haven't tried it
yet.) Also, it'll run just fine in the background.
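
Kicking it off and checking on it is just (db name from your earlier
listing; the compact_running flag in the info doc tells you if it's still
going):

$ curl -X POST http://127.0.0.1:5984/searchlight_production/_compact
$ curl http://127.0.0.1:5984/searchlight_production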

>
>> If the db dump and compaction don't show anything then we'll take a
>> look at writing some scripts to go through and check docs and add some
>> reporting to the view generation process to try and get a handle on
>> what's going on.
>>
>> Paul Davis
>>
>
> So there is no way to turn on an additional level of debugging in the view
> generation process with the current code?  I noticed that there is a 'tmi'
> logging level in the erlang couchdb code (which I just turned on).  Will
> this help?

A TMI log level is news to me. I've never seen a log macro that uses it.

> Again, thanks.  I know this is my problem, but knowing that there are some
> people willing to lend a hand, and maybe write some code to help identify /
> resolve this is whats keeping me going.  :-)  Much appreciated.  And
> hopefully couchdb will be the better for it in the end.
>
> Glenn
>

Don't worry. I quite dislike not figuring out the cause of anything
that sounds even remotely like a bug in CouchDB.

Paul Davis

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Thanks Paul.  Comments below.

On Tue, Oct 6, 2009 at 11:01 AM, Paul Davis <pa...@gmail.com> wrote:

>
> Glenn,
>
> The quickest way to check if you have a bad document in your DB would
> probably be something like:
>
> $ ps ax | grep beam.smp
> $ curl http://127.0.0.1:5984/db_name/_all_docs?include_docs=true >
> /dev/null
> $ ps ax | grep beam.smp
>
> You only need to trigger the doc to exit through the JSON serializer
> to trigger the badness.
>
>
I am running this now.


> If its being restarted by heart, then its most likely a complete VM
> death. The fact the PID is changing suggests that you're hitting VM
> death. And on complete VM death there is nothing CouchDB can do to
> help. VM deaths are instant and dramatic. Have you tried checking
> memory allocated to the beam.smp process as it gets further along? A
> common cause of instant VM deaths is when malloc returns NULL.
>
>
I have kept an eye on the overall system memory usage.  The EC2 XLarge
instance I am running on has 15GB RAM, and I have never seen the RAM usage
go over 4-5GB since I switched to XLarge.  Is there a specific command you
suggest for tracking memory explicitly assigned to the beam?



> Also, I just went through and re-read the entire discussion. After
> your 0.9.1 -> trunk upgrade did you compact the database? I can't
> think of anything that'd cause an issue there but it might be
> something to try (there is a conversion process during compaction).
>
>
I did not do a compaction.  I can try that.  Unfortunately that probably
kills another day compacting my 50GB 28mm record DB.  ;-)  But, hey, if it
helps... :-)


> If the db dump and compaction don't show anything then we'll take a
> look at writing some scripts to go through and check docs and add some
> reporting to the view generation process to try and get a handle on
> what's going on.
>
> Paul Davis
>

So there is no way to turn on an additional level of debugging in the view
generation process with the current code?  I noticed that there is a 'tmi'
logging level in the erlang couchdb code (which I just turned on).  Will
this help?

Again, thanks.  I know this is my problem, but knowing that there are some
people willing to lend a hand, and maybe write some code to help identify /
resolve this is whats keeping me going.  :-)  Much appreciated.  And
hopefully couchdb will be the better for it in the end.

Glenn

Re: Timeout Error when trying to access views + Indexing problems

Posted by Paul Davis <pa...@gmail.com>.
On Tue, Oct 6, 2009 at 12:28 PM, Glenn Rempe <gl...@rempe.us> wrote:
> I spoke too soon.  The indexing just died after running all night and
> getting up to ~6mm records indexed.  :-(
> It *almost* feels like there's a correlation between my accessing the system
> through futon to check on status, and its failing silently...
>
> Look at this log from this AM.  It's been running all night, and then all
> of a sudden there's a 'CouchDB has restarted' message while I am accessing
> futon.  Is it possible that something related to futon slowed down or died,
> forcing a restart of *everything* and killing my indexing process? See:
>
> http://pastie.org/643915
>
> There has *got* to be a way to bump up the logging level on all of these
> processes.  Is that single line in the log about restarting CouchDB really
> all I get?  With no indication at ALL of WHY it restarted (and apparently
> killed my index processing along the way, or is it that the indexing process
> died, forcing a restart?  The logs just don't seem to want to give that
> info up.)  Am I missing something?  I NEED logging data to find out what
> the hell is going on here.  The silent death treatment is driving me nuts.
>  Sorry, I am frustrated, and this indexing is literally the last step to
> bringing a production system online for its tests.  If I can't get these
> indexes built, couchdb will have been a complete failure for me after weeks
> of dev to convert a system to use it.
>
> <help?>
>
> Glenn
>
> On Tue, Oct 6, 2009 at 8:53 AM, Glenn Rempe <gl...@rempe.us> wrote:
>
>> Would replicating the DB to the same host perform those checks?  Also, if I
>> set up the auto-index every X # of records script shown on the wiki, would
>> that be run on indexing?  These two combined might allow me to essentially
>> scan and check the records as migrated, and build indexes incrementally from
>> the get-go.
>> Is there another way to run a scan for 'invalid' records across the whole
>> db?  Could I write a script to loop through all records?  And if I did, what
>> would I be looking for?  That the JSON parses?  What else?
>>
>> Also, a new data point.  Last night before going to bed I split up my 7
>> views which were all in one design doc into 4 docs (1 x 1view, 3 x 2views).
>>  I started indexing one of them last night and this morning it's still
>> running.  It's a simple view with a map and reduce:
>>
>> function(doc) {
>>   if( (doc['couchrest-type'] == 'SearchDocument') && doc.engine) {
>>     emit(doc.engine, 1);
>>   }
>> }
>>
>> function(keys, values, rereduce) {
>>   return sum(values);
>> }
>>
>> Processed 6570762 of 28249510 changes (23%)
>>
>> 6 million records is higher than I'd gotten on previous attempts, which
>> seemed to bork at around ~4mm.
>>
>> Strange.
>>
>> G
>>
>> On Tue, Oct 6, 2009 at 6:29 AM, Curt Arnold <ca...@apache.org> wrote:
>>
>>>
>>> On Oct 6, 2009, at 1:46 AM, Glenn Rempe wrote:
>>>
>>>>
>>>> - there is some kind of corruption in the main DB file and when this
>>>> point
>>>> is reached (or a specific record in the DB?) that it crashes? If so how
>>>> can
>>>> I best identify this?
>>>>
>>>
>>> Inserting mal-encoded documents into CouchDB could interfere with document
>>> retrieval and indexing, see
>>> https://issues.apache.org/jira/browse/COUCHDB-345.  Possibly one of those
>>> got into your database and now it is stopping the rebuilding of views.  A
>>> patch recently got added to prevent mal-encoded documents from being
>>> accepted, but it does not fix the problem on an existing database that has
>>> been corrupted.   I do not know if the symptoms are the same as what you are
>>> observing, but I think it would be a likely culprit.
>>>
>>
>>
>>
>> --
>> Glenn Rempe
>>
>> email                 : glenn@rempe.us
>> voice                 : (415) 894-5366 or (415)-89G-LENN
>> twitter                : @grempe
>> contact info        : http://www.rempe.us/contact.html
>> pgp                    : http://www.rempe.us/gnupg.txt
>>
>>
>
>
> --
> Glenn Rempe
>
> email                 : glenn@rempe.us
> voice                 : (415) 894-5366 or (415)-89G-LENN
> twitter                : @grempe
> contact info        : http://www.rempe.us/contact.html
> pgp                    : http://www.rempe.us/gnupg.txt
>

Glenn,

The quickest way to check if you have a bad document in your DB would
probably be something like:

$ ps ax | grep beam.smp
$ curl http://127.0.0.1:5984/db_name/_all_docs?include_docs=true > /dev/null
$ ps ax | grep beam.smp

You only need to trigger the doc to exit through the JSON serializer
to trigger the badness.

If it's being restarted by heart, then it's most likely a complete VM
death. The fact that the PID is changing suggests that you're hitting VM
death. And on complete VM death there is nothing CouchDB can do to
help. VM deaths are instant and dramatic. Have you tried checking the
memory allocated to the beam.smp process as it gets further along? A
common cause of instant VM deaths is when malloc returns NULL.

Also, I just went through and re-read the entire discussion. After
your 0.9.1 -> trunk upgrade did you compact the database? I can't
think of anything that'd cause an issue there but it might be
something to try (there is a conversion process during compaction).

If the db dump and compaction don't show anything then we'll take a
look at writing some scripts to go through and check docs and add some
reporting to the view generation process to try and get a handle on
what's going on.

Paul Davis

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
I spoke too soon.  The indexing just died after running all night and
getting up to ~6mm records indexed.  :-(
It *almost* feels like there's a correlation between my accessing the system
through futon to check on status, and its failing silently...

Look at this log from this AM.  It's been running all night, and then all
of a sudden there's a 'CouchDB has restarted' message while I am accessing
futon.  Is it possible that something related to futon slowed down or died,
forcing a restart of *everything* and killing my indexing process? See:

http://pastie.org/643915

There has *got* to be a way to bump up the logging level on all of these
processes.  Is that single line in the log about restarting CouchDB really
all I get?  With no indication at ALL of WHY it restarted (and apparently
killed my index processing along the way, or is it that the indexing process
died, forcing a restart?  The logs just don't seem to want to give that
info up.)  Am I missing something?  I NEED logging data to find out what
the hell is going on here.  The silent death treatment is driving me nuts.
 Sorry, I am frustrated, and this indexing is literally the last step to
bringing a production system online for its tests.  If I can't get these
indexes built, couchdb will have been a complete failure for me after weeks
of dev to convert a system to use it.

<help?>

Glenn

On Tue, Oct 6, 2009 at 8:53 AM, Glenn Rempe <gl...@rempe.us> wrote:

> Would replicating the DB to the same host perform those checks?  Also, if I
> set up the auto-index every X # of records script shown on the wiki, would
> that be run on indexing?  These two combined might allow me to essentially
> scan and check the records as migrated, and build indexes incrementally from
> the get-go.
> Is there another way to run a scan for 'invalid' records across the whole
> db?  Could I write a script to loop through all records?  And if I did, what
> would I be looking for?  That the JSON parses?  What else?
>
> Also, a new data point.  Last night before going to bed I split up my 7
> views which were all in one design doc into 4 docs (1 x 1view, 3 x 2views).
>  I started indexing one of them last night and this morning it's still
> running.  It's a simple view with a map and reduce:
>
> function(doc) {
>   if( (doc['couchrest-type'] == 'SearchDocument') && doc.engine) {
>     emit(doc.engine, 1);
>   }
> }
>
> function(keys, values, rereduce) {
>   return sum(values);
> }
>
> Processed 6570762 of 28249510 changes (23%)
>
> 6 million records is higher than I'd gotten on previous attempts, which
> seemed to bork at around ~4mm.
>
> Strange.
>
> G
>
> On Tue, Oct 6, 2009 at 6:29 AM, Curt Arnold <ca...@apache.org> wrote:
>
>>
>> On Oct 6, 2009, at 1:46 AM, Glenn Rempe wrote:
>>
>>>
>>> - there is some kind of corruption in the main DB file and when this
>>> point
>>> is reached (or a specific record in the DB?) that it crashes? If so how
>>> can
>>> I best identify this?
>>>
>>
>> Inserting mal-encoded documents into CouchDB could interfere with document
>> retrieval and indexing, see
>> https://issues.apache.org/jira/browse/COUCHDB-345.  Possibly one of those
>> got into your database and now it is stopping the rebuilding of views.  A
>> patch recently got added to prevent mal-encoded documents from being
>> accepted, but it does not fix the problem on an existing database that has
>> been corrupted.   I do not know if the symptoms are the same as what you are
>> observing, but I think it would be a likely culprit.
>>
>
>
>
> --
> Glenn Rempe
>
> email                 : glenn@rempe.us
> voice                 : (415) 894-5366 or (415)-89G-LENN
> twitter                : @grempe
> contact info        : http://www.rempe.us/contact.html
> pgp                    : http://www.rempe.us/gnupg.txt
>
>


-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Sorry, I think one of my questions was not clear.  I meant:  If I
'replicated' the DB from its current location to a new DB name on the same
host, would that do the scan for malformed records (which wasn't run at the
beginning of this DB's life, since it was started on 0.9.1)?  And would
indexing happen incrementally during 'replication' if I set up that ruby
index script, so I can avoid these huge indexing events which take a
god-awful long time before failing?

On Tue, Oct 6, 2009 at 8:53 AM, Glenn Rempe <gl...@rempe.us> wrote:

> Would replicating the DB to the same host perform those checks?  Also, if I
> set up the auto-index every X # of records script shown on the wiki, would
> that be run on indexing?  These two combined might allow me to essentially
> scan and check the records as migrated, and build indexes incrementally from
> the get-go.
> Is there another way to run a scan for 'invalid' records across the whole
> db?  Could I write a script to loop through all records?  And if I did, what
> would I be looking for?  That the JSON parses?  What else?
>
>

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Would replicating the DB to the same host perform those checks?  Also, if I
set up the auto-index every X # of records script shown on the wiki, would
that be run on indexing?  These two combined might allow me to essentially
scan and check the records as migrated, and build indexes incrementally from
the get-go.
Is there another way to run a scan for 'invalid' records across the whole
db?  Could I write a script to loop through all records?  And if I did, what
would I be looking for?  That the JSON parses?  What else?
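
Something like this, perhaps (a minimal, untested sketch -- the batch size
and the UTF-8 validity check are just guesses on my part):

require 'net/http'
require 'json'
require 'uri'

DB    = 'http://127.0.0.1:5984/searchlight_production'
BATCH = 1000

startkey = nil
loop do
  # Page through _all_docs WITHOUT include_docs, so one bad doc can't
  # break the whole listing.
  q = { 'limit' => (BATCH + 1).to_s }
  q['startkey'] = startkey.to_json if startkey
  url  = URI("#{DB}/_all_docs?#{URI.encode_www_form(q)}")
  rows = JSON.parse(Net::HTTP.get(url))['rows']
  rows.first(BATCH).each do |row|
    # Fetch each doc individually; one that errors out, or whose body is
    # not valid UTF-8, is a candidate for the mal-encoding problem.
    res = Net::HTTP.get_response(URI("#{DB}/#{URI.encode_www_form_component(row['id'])}"))
    if res.code != '200' || !res.body.dup.force_encoding('UTF-8').valid_encoding?
      puts "suspect doc: #{row['id']} (HTTP #{res.code})"
    end
  end
  break if rows.size <= BATCH
  startkey = rows.last['id']  # the overlap row starts the next page
end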

Also, a new data point.  Last night before going to bed I split up my 7
views, which were all in one design doc, into 4 docs (1 x 1 view, 3 x 2
views).  I started indexing one of them last night and this morning it's
still running.  It's a simple view with a map and reduce:

function(doc) {
  // map: one row per SearchDocument that has an engine set
  if( (doc['couchrest-type'] == 'SearchDocument') && doc.engine) {
    emit(doc.engine, 1);
  }
}

function(keys, values, rereduce) {
  // reduce: sum the 1s (and, on rereduce, the partial sums) to count docs
  return sum(values);
}
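
(Once this is built, I should be able to pull the per-engine counts straight
from the index -- a sketch, with the view path inferred from my couchrest
setup:)

require 'net/http'
require 'json'
require 'uri'

# group=true collapses the reduce to one row (doc count) per engine key
uri = URI('http://127.0.0.1:5984/searchlight_production' \
          '/_design/SearchDocument/_view/by_engine?group=true')
JSON.parse(Net::HTTP.get(uri))['rows'].each do |row|
  puts "#{row['key']}: #{row['value']}"
end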

Processed 6570762 of 28249510 changes (23%)

6 million records is higher than I'd gotten on previous attempts, which
seemed to bork at around ~4mm.

Strange.

G

On Tue, Oct 6, 2009 at 6:29 AM, Curt Arnold <ca...@apache.org> wrote:

>
> On Oct 6, 2009, at 1:46 AM, Glenn Rempe wrote:
>
>>
>> - there is some kind of corruption in the main DB file and when this point
>> is reached (or a specific record in the DB?) that it crashes? If so how
>> can
>> I best identify this?
>>
>
> Inserting mal-encoded documents into CouchDB could interfere with document
> retrieval and indexing, see
> https://issues.apache.org/jira/browse/COUCHDB-345.  Possibly one of those
> got into your database and now it is stopping the rebuilding of views.  A
> patch recently got added to prevent mal-encoded documents from being
> accepted, but it does not fix the problem on an existing database that has
> been corrupted.   I do not know if the symptoms are the same as what you are
> observing, but I think it would be a likely culprit.
>



-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Curt Arnold <ca...@apache.org>.
On Oct 6, 2009, at 1:46 AM, Glenn Rempe wrote:
>
> - there is some kind of corruption in the main DB file and when this  
> point
> is reached (or a specific record in the DB?) that it crashes? If so  
> how can
> I best identify this?

Inserting mal-encoded documents into CouchDB could interfere with
document retrieval and indexing, see
https://issues.apache.org/jira/browse/COUCHDB-345.  Possibly one of those
got into your database and now it is stopping the rebuilding of views.
A patch recently got added to prevent mal-encoded documents from being
accepted, but it does not fix the problem on an existing database that
has been corrupted.  I do not know if the symptoms are the same as what
you are observing, but I think it would be a likely culprit.

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Hi Paul.  No script attached...  Want to give it to me in a gist or pastie?

Thx

Glenn

On Tue, Oct 6, 2009 at 1:13 PM, Paul Davis <pa...@gmail.com> wrote:
> On Tue, Oct 6, 2009 at 3:42 PM, Glenn Rempe <gl...@rempe.us> wrote:
>> Is there some way we can instrument and log how much memory the VM
>> thinks it has somewhere in the critical path piece of erlang code?  Or
>> is there another way I can track that externally?
>>
>> G
>>
>> On Tue, Oct 6, 2009 at 12:13 PM, Adam Kocoloski <ko...@apache.org> wrote:
>>> On Oct 6, 2009, at 2:46 AM, Glenn Rempe wrote:
>>>
>>> TMI logging doesn't really exist, no one uses that level internally.  I
>>> agree with Paul here, the lack of error messages indicates an instant VM
>>> death.  The most common way to cause that is by running out of memory, but
>>> view indexing is not supposed to use a large amount of memory at all.
>>>
>>> Adam
>>
>
> Here's a quick and dirty script that should work-ish, I think, probably.
> I only tested it minimally. As is, no syntax errors, and it prints the
> first VM size as expected.
>
> Just run that in a terminal while the indexer runs. It should die at
> the same time the Erlang process dies.
>



-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Paul wrote a little script to help me out with tracking mem.  Alas, he
didn't realize it wasn't attached until he was away from his computer.

I whipped up a little ruby version if its useful to anyone.

Just pass it a PID as a param and it will print out the newest
highwater memory mark for that process and a timestamp.

http://gist.github.com/203540
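
Roughly, the idea is this (a from-memory sketch rather than the gist itself;
it assumes Linux's /proc/<pid>/status and its VmHWM peak-RSS field, and the
10-second poll interval is arbitrary):

pid  = ARGV[0] or abort "usage: #{$0} PID"
last = nil
loop do
  status = File.read("/proc/#{pid}/status") rescue abort("process #{pid} is gone")
  hwm = status[/^VmHWM:\s+(\d+ kB)/, 1]  # peak resident set size
  if hwm != last
    puts "#{Time.now}  VmHWM: #{hwm}"
    last = hwm
  end
  sleep 10
end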

Cheers.

G

On Tue, Oct 6, 2009 at 1:13 PM, Paul Davis <pa...@gmail.com> wrote:
> On Tue, Oct 6, 2009 at 3:42 PM, Glenn Rempe <gl...@rempe.us> wrote:
>> Is there some way we can instrument and log how much memory the VM
>> thinks it has somewhere in the critical path piece of erlang code?  Or
>> is there another way I can track that externally?
>>
>> G
>>
>> On Tue, Oct 6, 2009 at 12:13 PM, Adam Kocoloski <ko...@apache.org> wrote:
>>> On Oct 6, 2009, at 2:46 AM, Glenn Rempe wrote:
>>>
>>> TMI logging doesn't really exist, no one uses that level internally.  I
>>> agree with Paul here, the lack of error messages indicates an instant VM
>>> death.  The most common way to cause that is by running out of memory, but
>>> view indexing is not supposed to use a large amount of memory at all.
>>>
>>> Adam
>>
>
> Here's a quick and dirty script that should work-ish, I think, probably.
> I only tested it minimally. As is, no syntax errors, and it prints the
> first VM size as expected.
>
> Just run that in a terminal while the indexer runs. It should die at
> the same time the Erlang process dies.
>



-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Hi Seggy,

Yes, I already have the 50+ GB data file on a 500GB EBS volume.  And
trust me, I have a snapshot.  :-)  I just want to avoid spinning up
lots of X-Large instances unless we have something that would really
benefit from trying concurrently.

Gets pricey.  :-)

Cheers,

G

On Tue, Oct 6, 2009 at 1:26 PM, Seggy Umboh <se...@gmail.com> wrote:
> Glenn,
> Since you are running on EC2, may I also suggest you put the couchdb data
> directory on an EBS volume and snapshot it if you have not already; that way
> you can spin up EC2 instances to try each of the suggestions being given on
> the list on separate copies of the data.
>
> Good luck!
>
>
> On Tue, Oct 6, 2009 at 1:13 PM, Paul Davis <pa...@gmail.com>wrote:
>
>> On Tue, Oct 6, 2009 at 3:42 PM, Glenn Rempe <gl...@rempe.us> wrote:
>> > Is there some way we can instrument and log how much memory the VM
>> > thinks it has somewhere in the critical path piece of erlang code?  Or
>> > is there another way I can track that externally?
>> >
>> > G
>> >
>> > On Tue, Oct 6, 2009 at 12:13 PM, Adam Kocoloski <ko...@apache.org>
>> wrote:
>> >> On Oct 6, 2009, at 2:46 AM, Glenn Rempe wrote:
>> >>
>> >> TMI logging doesn't really exist, no one uses that level internally.  I
>> >> agree with Paul here, the lack of error messages indicates an instant VM
>> >> death.  The most common way to cause that is by running out of memory,
>> but
>> >> view indexing is not supposed to use a large amount of memory at all.
>> >>
>> >> Adam
>> >
>>
>> Here's a quick and dirty script that should work-ish, I think, probably.
>> I only tested it minimally. As is, no syntax errors, and it prints the
>> first VM size as expected.
>>
>> Just run that in a terminal while the indexer runs. It should die at
>> the same time the Erlang process dies.
>>
>



-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Seggy Umboh <se...@gmail.com>.
Glenn,
Since you are running on EC2, may I also suggest you put the couchdb data
directory on an EBS volume and snapshot it if you have not already; that way
you can spin up EC2 instances to try each of the suggestions being given on
the list on separate copies of the data.

Good luck!


On Tue, Oct 6, 2009 at 1:13 PM, Paul Davis <pa...@gmail.com>wrote:

> On Tue, Oct 6, 2009 at 3:42 PM, Glenn Rempe <gl...@rempe.us> wrote:
> > Is there some way we can instrument and log how much memory the VM
> > thinks it has somewhere in the critical path piece of erlang code?  Or
> > is there another way I can track that externally?
> >
> > G
> >
> > On Tue, Oct 6, 2009 at 12:13 PM, Adam Kocoloski <ko...@apache.org>
> wrote:
> >> On Oct 6, 2009, at 2:46 AM, Glenn Rempe wrote:
> >>
> >> TMI logging doesn't really exist, no one uses that level internally.  I
> >> agree with Paul here, the lack of error messages indicates an instant VM
> >> death.  The most common way to cause that is by running out of memory,
> but
> >> view indexing is not supposed to use a large amount of memory at all.
> >>
> >> Adam
> >
>
> Here's a quick and dirty script that should work-ish, I think, probably.
> I only tested it minimally. As is, no syntax errors, and it prints the
> first VM size as expected.
>
> Just run that in a terminal while the indexer runs. It should die at
> the same time the Erlang process dies.
>

Re: Timeout Error when trying to access views + Indexing problems

Posted by Paul Davis <pa...@gmail.com>.
On Tue, Oct 6, 2009 at 3:42 PM, Glenn Rempe <gl...@rempe.us> wrote:
> Is there some way we can instrument and log how much memory the VM
> thinks it has somewhere in the critical path piece of erlang code?  Or
> is there another way I can track that externally?
>
> G
>
> On Tue, Oct 6, 2009 at 12:13 PM, Adam Kocoloski <ko...@apache.org> wrote:
>> On Oct 6, 2009, at 2:46 AM, Glenn Rempe wrote:
>>
>> TMI logging doesn't really exist, no one uses that level internally.  I
>> agree with Paul here, the lack of error messages indicates an instant VM
>> death.  The most common way to cause that is by running out of memory, but
>> view indexing is not supposed to use a large amount of memory at all.
>>
>> Adam
>

Here's a quick and dirty script that should work-ish, I think, probably.
I only tested it minimally. As is, no syntax errors, and it prints the
first VM size as expected.

Just run that in a terminal while the indexer runs. It should die at
the same time the Erlang process dies.

Re: Timeout Error when trying to access views + Indexing problems

Posted by Robert Dionne <di...@dionne-associates.com>.
erlang:processes() will give you the list of pids. The function self()
just returns the current pid, so you can call erlang:process_info(Pid,
memory) for any process. However, you'd really need to know which pid
is causing the problem. I just thought the +d might provide a crash
dump that might help determine if in fact the VM is blowing memory.




On Oct 6, 2009, at 5:51 PM, Glenn Rempe wrote:

> Hi Bob,
>
> I have the CouchDB running in interactive mode and I added the +d to a
> copy of the couchdb startup script immediately after the 'erl' call.
>
> I was wondering, is it possible, from within the erl shell to find
> specific processes and manually issue the 'erlang:process_info (self
> (), memory)' call from within the shell?
>
> Of course that would presume I know which process to monitor.  Which I
> don't really (And doing a ctrl-C in the shell and choosing the (p)
> options gives me a *huge* dump of info.)
>
> G
>
> On Tue, Oct 6, 2009 at 2:05 PM, Robert Dionne
> <di...@dionne-associates.com> wrote:
>> sorry I was jumping in and hadn't read this entire thread yet. In  
>> your
>> startup script look for the erlang command "erl ...." and add a +d
>>
>> or prefix your script invocation with ERL_FLAGS=+d .....
>>
>> I usually run out of trunk and would use a command like ERL_FLAGS=+d
>> ./utils/run -i      where the -i makes it interactive so one can  
>> execute
>> commands in an erlang shell. To use process_info you'd need to know  
>> where to
>> insert statements in the code and rebuild. So your best bet for
>> now is to
>> try the other suggestions first.
>>
>> If you're new to couchdb there's an IRC channel, #couchdb that  
>> usually has a
>> few devs in there who can provide quicker turnaround on ideas, good  
>> and bad
>> :)
>>
>> Cheers,
>>
>> Bob
>>
>>
>>
>>
>> On Oct 6, 2009, at 4:01 PM, Glenn Rempe wrote:
>>
>>> Thanks Robert.  No I have not tried the +d option and I don't see  
>>> that
>>> as one of the options on the 'couchdb' starter script.  (Frankly I
>>> don't know how to do that having only just begun to dabble in  
>>> Erlang.)
>>> Can you give me more information on how I would provide that option
>>> exactly?
>>>
>>> Regarding the instrumentation.  That sounds great.  But again, I am
>>> not familiar enough with the internals of the CouchDB code to know
>>> just where I should put that.  Is there someone that could help me
>>> with that and we can create a custom build of CouchDB that I can  
>>> run?
>>>
>>> Thanks for chiming into the conversation.  Much appreciated.
>>>
>>> Glenn
>>>
>>> On Tue, Oct 6, 2009 at 12:48 PM, Robert Dionne
>>> <di...@dionne-associates.com> wrote:
>>>>
>>>> Internally you can put some erlang:process_info (self (), memory)
>>>> statements
>>>> in.
>>
>>
>
>
>
> -- 
> Glenn Rempe
>
> email                 : glenn@rempe.us
> voice                 : (415) 894-5366 or (415)-89G-LENN
> twitter                : @grempe
> contact info        : http://www.rempe.us/contact.html
> pgp                    : http://www.rempe.us/gnupg.txt


Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Hi Bob,

I have the CouchDB running in interactive mode and I added the +d to a
copy of the couchdb startup script immediately after the 'erl' call.

I was wondering: is it possible, from within the erl shell, to find
specific processes and manually issue the 'erlang:process_info(self(),
memory)' call?

Of course that would presume I know which process to monitor, which I
don't, really.  (And doing a ctrl-C in the shell and choosing the (p)
option gives me a *huge* dump of info.)

G

On Tue, Oct 6, 2009 at 2:05 PM, Robert Dionne
<di...@dionne-associates.com> wrote:
> sorry I was jumping in and hadn't read this entire thread yet. In your
> startup script look for the erlang command "erl ...." and add a +d
>
> or prefix your script invocation with ERL_FLAGS=+d .....
>
> I usually run out of trunk and would use a command like ERL_FLAGS=+d
> ./utils/run -i      where the -i makes it interactive so one can execute
> commands in an erlang shell. To use process_info you'd need to know where to
> insert statements in the code and rebuild. So your best bet for now is to
> try the other suggestions first.
>
> If you're new to couchdb there's an IRC channel, #couchdb that usually has a
> few devs in there who can provide quicker turnaround on ideas, good and bad
> :)
>
> Cheers,
>
> Bob
>
>
>
>
> On Oct 6, 2009, at 4:01 PM, Glenn Rempe wrote:
>
>> Thanks Robert.  No I have not tried the +d option and I don't see that
>> as one of the options on the 'couchdb' starter script.  (Frankly I
>> don't know how to do that having only just begun to dabble in Erlang.)
>> Can you give me more information on how I would provide that option
>> exactly?
>>
>> Regarding the instrumentation.  That sounds great.  But again, I am
>> not familiar enough with the internals of the CouchDB code to know
>> just where I should put that.  Is there someone that could help me
>> with that and we can create a custom build of CouchDB that I can run?
>>
>> Thanks for chiming into the conversation.  Much appreciated.
>>
>> Glenn
>>
>> On Tue, Oct 6, 2009 at 12:48 PM, Robert Dionne
>> <di...@dionne-associates.com> wrote:
>>>
>>> Internally you can put some erlang:process_info (self (), memory)
>>> statements
>>> in.
>
>



-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Robert Dionne <di...@dionne-associates.com>.
Sorry, I was jumping in and hadn't read this entire thread yet. In your
startup script look for the erlang command "erl ...." and add a +d

or prefix your script invocation with ERL_FLAGS=+d .....

I usually run out of trunk and would use a command like ERL_FLAGS=+d
./utils/run -i      where the -i makes it interactive so one can execute
commands in an erlang shell. To use process_info you'd need to know
where to insert statements in the code and rebuild. So your best bet
for now is to try the other suggestions first.

If you're new to couchdb there's an IRC channel, #couchdb that usually  
has a few devs in there who can provide quicker turnaround on ideas,  
good and bad :)

Cheers,

Bob




On Oct 6, 2009, at 4:01 PM, Glenn Rempe wrote:

> Thanks Robert.  No I have not tried the +d option and I don't see that
> as one of the options on the 'couchdb' starter script.  (Frankly I
> don't know how to do that having only just begun to dabble in Erlang.)
> Can you give me more information on how I would provide that option
> exactly?
>
> Regarding the instrumentation.  That sounds great.  But again, I am
> not familiar enough with the internals of the CouchDB code to know
> just where I should put that.  Is there someone that could help me
> with that and we can create a custom build of CouchDB that I can run?
>
> Thanks for chiming into the conversation.  Much appreciated.
>
> Glenn
>
> On Tue, Oct 6, 2009 at 12:48 PM, Robert Dionne
> <di...@dionne-associates.com> wrote:
>> Internally you can put some erlang:process_info (self (), memory)  
>> statements
>> in.


Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Hi Brian,

Here is the output of ulimit -a on my system:

root@ip-10-250-55-239:/tmp/collectl-3.3.6# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

It doesn't seem to me like I am limited.  But if someone who knows
better about these configuration items sees something that might be
relevant, please chime in (the max locked memory and stack size limits
might be relevant?)!

Regarding the process size, Paul and I worked offline to talk through
lots of specifics.  We did run a little highwater monitoring script,
which I posted about earlier, and it did not show any significant
growth of the beam process.  Ultimately we never did get to an
understanding of what was causing the sudden VM death.  So, since I was
never able to get through a full indexing run, in the name of
expediency I am going to take a different approach.  The good news,
though, is that there did not appear to be an obvious leak.

I'll be starting my migration over again to build a new CouchDB.
This time I will be using bulk inserts and sequential IDs from the
get-go (I started about halfway through last time).  In addition I am
going to bulk insert 1000 docs at a time, and every million docs (I
have 28mm) I will request the view to force an indexing pass, as well as
doing periodic DB and view compaction.  Hopefully this will build up
the DB and views concurrently and avoid the problems we were having
building an index from scratch against a 58GB DB, which I was never able
to complete without VM death.  Unfortunately we didn't get to the
bottom of this to understand the root cause.
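
In outline, the loop will look something like this (a sketch, not the final
script -- fetch_batch stands in for my MySQL reader and is hypothetical; the
DB, design doc, and view names are the real ones from this thread):

require 'net/http'
require 'json'
require 'uri'

DB = 'http://127.0.0.1:5984/searchlight_production'

def post_json(url, payload)
  Net::HTTP.post(URI(url), payload.to_json, 'Content-Type' => 'application/json')
end

total = 0
while (docs = fetch_batch(1000)) && !docs.empty?  # fetch_batch: hypothetical MySQL reader
  post_json("#{DB}/_bulk_docs", 'docs' => docs)   # bulk insert 1000 docs at a time
  total += docs.size
  next unless total % 1_000_000 == 0
  # Every million docs: touch the view (forcing an incremental index pass)
  # and kick off compaction.
  Net::HTTP.get(URI("#{DB}/_design/SearchDocument/_view/by_engine?limit=1"))
  post_json("#{DB}/_compact", {})
end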

Some additional steps I will take which will hopefully help are:

- I will install couchdb itself on the 'local' ephemeral disk instead
of on the EBS volume.
- I will create a new EBS volume which I will dedicate for CouchDB
which will hopefully provide more IO throughput so I can read from
MySQL on one EBS volume and write to CouchDB DB/View on another
volume.
- I will write 000's to the entirety of the EBS volume before first
use.  This apparently alleviates a 5x write penalty on the *first* write
to any block on an EBS volume.

I'll post an update once I am running and if all is looking well.  I
can also potentially share my Ruby migration script if anyone is
interested.

I do very much appreciate the tremendous amount of help that this list
has provided, and I especially want to thank Paul Davis for his
interest in getting to the heart of this and the several hours he
spent with me screen sharing to help co-diagnose the root cause.

Cheers,

Glenn

On Wed, Oct 7, 2009 at 12:39 AM, Brian Candler <B....@pobox.com> wrote:
> On Tue, Oct 06, 2009 at 01:36:13PM -0700, Glenn Rempe wrote:
>> No.  I have not noticed any correlation with the time.  Sometimes I
>> have seen it run during the day and die as well.  I've seen it die
>> lots of times...  ;-)  It seems like it is always dying though
>> somewhere between 2 and 6 million records processed out of 28 mm
>> (which might support the theory of memory starvation of some kind if
>> it is holding some of those records in memory unintentionally, even
>> though top reports nothing more than 4GB out of 15GB being used).
>
> You might have a per-process memory limit of some sort: either ulimit (see
> "ulimit -a"); or a hard-coded limitation in your O/S which limits a single
> process to 4GB, for example; or conceivably the erlang VM could have a 4GB
> limit.
>
> [I do vaguely remember something about people saying you should build erlang
> in 32-bit mode even under a 64-bit OS, but I could well have that wrong]
>
> Either way, if your process memory usage is continually growing and also
> approaching 4GB, I would be concerned. I don't see any reason why building a
> view should take an increasing amount of memory. It sounds like a leak.
>
> Regards,
>
> Brian.
>



-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Hi,

I have not seen CouchDB consume a large amount of memory, so
unfortunately I cannot provide insight into your issue (btw, I bulk
save docs 5000 at a time).  I would suggest starting a new discussion
thread (instead of using this one) with specific details of how you
are using couchdb, how many and what type of docs you are trying to
bulk save, what exactly you are seeing for memory usage, and which
processes specifically are spiking in memory.  The more details you
provide, the better the CouchDB team and this group might be able to
provide you with some help.

Cheers,

Glenn

On Thu, Oct 15, 2009 at 5:27 AM, venkata subbarayudu
<av...@gmail.com> wrote:
> Hi Glenn,
>           I am new to couchdb, and doing some load testing on
> couchdb (0.10).  I am using python (2.6.2 with couchdb-python-0.6) to insert
> documents (bulk-save) into couchdb, and couchdb is eating up all the
> available memory, but I'm not able to figure out what exactly is causing
> couchdb to consume so much memory (I have a 32GB machine).  Please give me
> any insights on how to debug this issue and find the root cause.
>
> Thanks in advance for all your help,
> Subbarayudu.
>
> On Wed, Oct 7, 2009 at 1:09 PM, Brian Candler <B....@pobox.com> wrote:
>
>> On Tue, Oct 06, 2009 at 01:36:13PM -0700, Glenn Rempe wrote:
>> > No.  I have not noticed any correlation with the time.  Sometimes I
>> > have seen it run during the day and die as well.  I've seen it die
>> > lots of times...  ;-)  It seems like it is always dying though
>> > somewhere between 2 and 6 million records processed out of 28 mm
>> > (which might support the theory of memory starvation of some kind if
>> > it is holding some of those records in memory unintentionally, even
>> > though top reports nothing more than 4GB out of 15GB being used).
>>
>> You might have a per-process memory limit of some sort: either ulimit (see
>> "ulimit -a"); or a hard-coded limitation in your O/S which limits a single
>> process to 4GB, for example; or conceivably the erlang VM could have a 4GB
>> limit.
>>
>> [I do vaguely remember something about people saying you should build
>> erlang
>> in 32-bit mode even under a 64-bit OS, but I could well have that wrong]
>>
>> Either way, if your process memory usage is continually growing and also
>> approaching 4GB, I would be concerned. I don't see any reason why building
>> a
>> view should take an increasing amount of memory. It sounds like a leak.
>>
>> Regards,
>>
>> Brian.
>>
>



-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by venkata subbarayudu <av...@gmail.com>.
Hi Glenn,
           I am new to couchdb, and doing some load testing on
couchdb (0.10).  I am using python (2.6.2 with couchdb-python-0.6) to insert
documents (bulk-save) into couchdb, and couchdb is eating up all the
available memory, but I'm not able to figure out what exactly is causing
couchdb to consume so much memory (I have a 32GB machine).  Please give me
any insights on how to debug this issue and find the root cause.

Thanks in advance for all your help,
Subbarayudu.

On Wed, Oct 7, 2009 at 1:09 PM, Brian Candler <B....@pobox.com> wrote:

> On Tue, Oct 06, 2009 at 01:36:13PM -0700, Glenn Rempe wrote:
> > No.  I have not noticed any correlation with the time.  Sometimes I
> > have seen it run during the day and die as well.  I've seen it die
> > lots of times...  ;-)  It seems like it is always dying though
> > somewhere between 2 and 6 million records processed out of 28 mm
> > (which might support the theory of memory starvation of some kind if
> > it is holding some of those records in memory unintentionally, even
> > though top reports nothing more than 4GB out of 15GB being used).
>
> You might have a per-process memory limit of some sort: either ulimit (see
> "ulimit -a"); or a hard-coded limitation in your O/S which limits a single
> process to 4GB, for example; or conceivably the erlang VM could have a 4GB
> limit.
>
> [I do vaguely remember something about people saying you should build
> erlang
> in 32-bit mode even under a 64-bit OS, but I could well have that wrong]
>
> Either way, if your process memory usage is continually growing and also
> approaching 4GB, I would be concerned. I don't see any reason why building
> a
> view should take an increasing amount of memory. It sounds like a leak.
>
> Regards,
>
> Brian.
>

Re: Timeout Error when trying to access views + Indexing problems

Posted by Brian Candler <B....@pobox.com>.
On Tue, Oct 06, 2009 at 01:36:13PM -0700, Glenn Rempe wrote:
> No.  I have not noticed any correlation with the time.  Sometimes I
> have seen it run during the day and die as well.  I've seen it die
> lots of times...  ;-)  It seems like it is always dying though
> somewhere between 2 and 6 million records processed out of 28 mm
> (which might support the theory of memory starvation of some kind if
> it is holding some of those records in memory unintentionally, even
> though top reports nothing more than 4GB out of 15GB being used).

You might have a per-process memory limit of some sort: either ulimit (see
"ulimit -a"); or a hard-coded limitation in your O/S which limits a single
process to 4GB, for example; or conceivably the erlang VM could have a 4GB
limit.

[I do vaguely remember something about people saying you should build erlang
in 32-bit mode even under a 64-bit OS, but I could well have that wrong]

Either way, if your process memory usage is continually growing and also
approaching 4GB, I would be concerned. I don't see any reason why building a
view should take an increasing amount of memory. It sounds like a leak.

Regards,

Brian.

Re: Timeout Error when trying to access views + Indexing problems

Posted by Francisco Viramontes <pa...@freshout.us>.
I have a database transfer script going for a week now.  CouchDB dies
on me like 5 times a day, and I have monit in place to restart the thing.

It bulk inserts 1000 records; I also have an update notification hook that
updates all my views every time the database has an update.
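
(For reference, that kind of hook hangs off the ini's [update_notification]
section, and the handler just reads one JSON line per update from stdin.  A
minimal sketch -- the script path, the view being touched, and the
every-10000-updates throttle are all assumptions:)

# local.ini:
#   [update_notification]
#   view_updater = /usr/local/bin/view_updater.rb   <- path is an assumption
require 'net/http'
require 'json'
require 'uri'

VIEW = URI('http://127.0.0.1:5984/searchlight_production' \
           '/_design/SearchDocument/_view/by_engine?limit=1')
seen = 0
STDIN.each_line do |line|
  note = JSON.parse(line) rescue next        # one JSON object per line
  next unless note['type'] == 'updated'      # ignore created/deleted events
  seen += 1
  Net::HTTP.get(VIEW) if seen % 10_000 == 0  # touch the view to reindex
end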

Also there is another script running in parallel that queries a view
and inserts data into another doc.

The only error I can find is an OS process timeout.

Hope this helps the devs.

Paco

On Oct 6, 2009, at 3:36 PM, Glenn Rempe wrote:

> not noticed any correlation with the time.  Sometimes I
> have seen it run during the day and die as well.  I've seen it die
> lots of times...  ;-)  It seems like it is always dying though
> somewhere between 2 and 6 million records processed out of 28 mm
> (which might support the theory of memory starvation of some kind if
> it is holding some of those records in memory unintentionally, even
> though top reports nothing more than 4GB out of 15GB being used).
>
> G


Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
No.  I have not noticed any correlation with the time.  Sometimes I
have seen it run during the day and die as well.  I've seen it die
lots of times...  ;-)  It seems like it is always dying though
somewhere between 2 and 6 million records processed out of 28 mm
(which might support the theory of memory starvation of some kind if
it is holding some of those records in memory unintentionally, even
though top reports nothing more than 4GB out of 15GB being used).

G

On Tue, Oct 6, 2009 at 1:31 PM, Paul Davis <pa...@gmail.com> wrote:
> Also, one other thing, in reference to the link to the other thread:
>
> You've said a couple of times that you run this overnight and it dies the
> next day. Is it by chance dying at the same time each day?
>
> Paul
>
> On Tue, Oct 6, 2009 at 4:01 PM, Glenn Rempe <gl...@rempe.us> wrote:
>> Thanks Robert.  No I have not tried the +d option and I don't see that
>> as one of the options on the 'couchdb' starter script.  (Frankly I
>> don't know how to do that having only just begun to dabble in Erlang.)
>>  Can you give me more information on how I would provide that option
>> exactly?
>>
>> Regarding the instrumentation.  That sounds great.  But again, I am
>> not familiar enough with the internals of the CouchDB code to know
>> just where I should put that.  Is there someone that could help me
>> with that and we can create a custom build of CouchDB that I can run?
>>
>> Thanks for chiming into the conversation.  Much appreciated.
>>
>> Glenn
>>
>> On Tue, Oct 6, 2009 at 12:48 PM, Robert Dionne
>> <di...@dionne-associates.com> wrote:
>>> Internally you can put some erlang:process_info (self (), memory) statements
>>> in.
>>
>



-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Paul Davis <pa...@gmail.com>.
Also, one other thing, in reference to the link to the other thread:

You've said a couple of times that you run this overnight and it dies the
next day. Is it by chance dying at the same time each day?

Paul

On Tue, Oct 6, 2009 at 4:01 PM, Glenn Rempe <gl...@rempe.us> wrote:
> Thanks Robert.  No I have not tried the +d option and I don't see that
> as one of the options on the 'couchdb' starter script.  (Frankly I
> don't know how to do that having only just begun to dabble in Erlang.)
>  Can you give me more information on how I would provide that option
> exactly?
>
> Regarding the instrumentation.  That sounds great.  But again, I am
> not familiar enough with the internals of the CouchDB code to know
> just where I should put that.  Is there someone that could help me
> with that and we can create a custom build of CouchDB that I can run?
>
> Thanks for chiming into the conversation.  Much appreciated.
>
> Glenn
>
> On Tue, Oct 6, 2009 at 12:48 PM, Robert Dionne
> <di...@dionne-associates.com> wrote:
>> Internally you can put some erlang:process_info (self (), memory) statements
>> in.
>

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Thanks Robert.  No, I have not tried the +d option, and I don't see it
among the options on the 'couchdb' starter script.  (Frankly I don't
know how to do that, having only just begun to dabble in Erlang.)
Can you give me more information on how exactly I would provide that
option?

Regarding the instrumentation: that sounds great.  But again, I am
not familiar enough with the internals of the CouchDB code to know
just where I should put that.  Is there someone who could help me
with that so we can create a custom build of CouchDB that I can run?

Thanks for chiming into the conversation.  Much appreciated.

Glenn

On Tue, Oct 6, 2009 at 12:48 PM, Robert Dionne
<di...@dionne-associates.com> wrote:
> Internally you can put some erlang:process_info (self (), memory) statements
> in.

Re: Timeout Error when trying to access views + Indexing problems

Posted by Robert Dionne <di...@dionne-associates.com>.
Internally you can put some erlang:process_info(self(), memory)
statements in.






On Oct 6, 2009, at 3:42 PM, Glenn Rempe wrote:

> Is there some way we can instrument and log how much memory the VM
> thinks it has somewhere in the critical path piece of erlang code?  Or
> is there another way I can track that externally?
>
> G
>
> On Tue, Oct 6, 2009 at 12:13 PM, Adam Kocoloski  
> <ko...@apache.org> wrote:
>> On Oct 6, 2009, at 2:46 AM, Glenn Rempe wrote:
>>
>> TMI logging doesn't really exist, no one uses that level  
>> internally.  I
>> agree with Paul here, the lack of error messages indicates an  
>> instant VM
>> death.  The most common way to cause that is by running out of  
>> memory, but
>> view indexing is not supposed to use a large amount of memory at all.
>>
>> Adam


Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Is there some way we can instrument the critical-path piece of Erlang code
to log how much memory the VM thinks it has?  Or is there another way I can
track that externally?

G

On Tue, Oct 6, 2009 at 12:13 PM, Adam Kocoloski <ko...@apache.org> wrote:
> On Oct 6, 2009, at 2:46 AM, Glenn Rempe wrote:
>
> TMI logging doesn't really exist, no one uses that level internally.  I
> agree with Paul here, the lack of error messages indicates an instant VM
> death.  The most common way to cause that is by running out of memory, but
> view indexing is not supposed to use a large amount of memory at all.
>
> Adam

Re: Timeout Error when trying to access views + Indexing problems

Posted by Adam Kocoloski <ko...@apache.org>.
On Oct 6, 2009, at 2:46 AM, Glenn Rempe wrote:

> Adam, I tried your suggestion to attempt to open the index file in
> interactive mode.  I was able to start CouchDB in interactive mode,  
> and the
> 'couch_file:open' command returned ok instantly, but the
> 'couch_file:read_header(Fd).' provided no output for about 15 min  
> and I
> eventually aborted it.  Should it ever take that long?  What output  
> should I
> expect to see?

That's exactly the problem behavior I saw.  I don't remember what the  
expected output of couch_file:read_header(Fd) is, but it should return  
almost instantaneously.  This worries me.

On Oct 6, 2009, at 12:09 PM, Glenn Rempe wrote:

> Sorry, I think one of my questions was not clear.  I meant:  If I
> 'replicated' the DB from its current location to a new DB name on  
> the same
> host would that do the scan for malformed records (which wasn't run  
> on the
> beginning of this DB since it was started on 0.9.1) and would
> indexing
> happen incrementally during 'replication' if I setup that ruby index  
> script
> so I can avoid these huge indexing events which take a god-awful  
> long time
> before failing?

Replicating to a new DB would do the check only if the source or  
target is "remote", e.g.

{"source":"http://127.0.0.1:5984/foo", "target":"bar"}

I'm guessing a mal-formed document isn't the source of your problem,
but if it is, this replication would show you exactly which document is
bad.
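
For example (a sketch -- the target DB name here is made up, and the target
must exist before you kick off the replication):

require 'net/http'
require 'json'
require 'uri'

couch = URI('http://127.0.0.1:5984')
Net::HTTP.start(couch.host, couch.port) do |http|
  http.put('/searchlight_production_checked', '')  # create the target first
  # Naming the source by full URL makes it "remote", which pushes every doc
  # through the HTTP/JSON layer and surfaces any mal-encoded ones.
  body = { 'source' => 'http://127.0.0.1:5984/searchlight_production',
           'target' => 'searchlight_production_checked' }.to_json
  puts http.post('/_replicate', body, 'Content-Type' => 'application/json').body
end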


> There has *got* to be a way to bump up the logging level on all of  
> these
> processes.  Is that single line in the log about restarting CouchDB  
> really
> all I get?  With no indication at ALL of WHY its restarted (and  
> apparently
> killed my index processing in the process, or is that the indexing  
> process
> has died forcing a restart?  The logs just don't seem to want to  
> give that
> info up.) ?  Am I missing something?  I NEED logging data to find  
> out what
> the hell is going on here.  The silent death treatment is driving me  
> nuts.

TMI logging doesn't really exist, no one uses that level internally.   
I agree with Paul here, the lack of error messages indicates an  
instant VM death.  The most common way to cause that is by running out  
of memory, but view indexing is not supposed to use a large amount of  
memory at all.

Adam

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Hi Adam (and group!).  So I am still struggling, and need help or advice.
 It is much appreciated.
Adam, I tried your suggestion to attempt to open the index file in
interactive mode.  I was able to start CouchDB in interactive mode, and the
'couch_file:open' command returned ok instantly, but
'couch_file:read_header(Fd).' produced no output for about 15 min and I
eventually aborted it.  Should it ever take that long?  What output should I
expect to see?

Please note that to try and make sure there was nothing wrong with the index
files, I actually destroyed them and tried a rebuild from scratch.  This
seemed to be going ok, and the indexing process ran all night.  I got up to
roughly 20% of the 22 million records in the DB indexed when it mysteriously
died yet again.  I am noticing that it is at this point that I am always
seeing this process fail.

When the indexing fails it provides NO information in the couchdb log (with
log level set to 'debug').  I backtracked in the couchdb log file to find
where it stopped working, looking for a stack trace.  All I found was the
point at which it just stopped reporting progress and the couchjs process
was dead.  Here is that point in the log file:

http://pastie.org/642736

And this is the last line reporting indexing status (line 25 in the pastie):

[Mon, 05 Oct 2009 18:52:42 GMT] [debug] [<0.80.0>] New task status for
searchlight_production _design/SearchDocument: Processed 4377984 of 22650375
changes (19%)

After that, just silence in the log.

Here is one hint which suggests that it may not be just the couchjs
process that is crashing, but perhaps the entire couchdb process.  In the
/var/log/syslog I see the following three minutes after the last entry
showing index status in the couch log:

http://pastie.org/642952

This is being logged by monit, which I am using to keep an eye on couchdb.
 I may turn off monit, and see if the entire couchdb process crashes when
the indexing stops.

UPDATE :
Even with monit turned off, it looks like the couchdb pids have changed
during or after the indexing process failure.  This is the 'ps -ef' output
after the crash, to compare with the output in the previous link which was
taken when the indexing was started:

http://pastie.org/643335
END UPDATE:

If I try and access that view in a rails console session after this failure
I am getting a 500 error:

>> counts = SearchDocument.by_engine(:reduce => true, :group => true, :stale
=> 'ok')['rows']
RestClient::RequestFailed: HTTP status code 500

And I see the following stack trace in the couch log:

[Mon, 05 Oct 2009 21:53:00 GMT] [info] [<0.301.0>] Stacktrace:
[{gen_server,call,2},
             {couch_view,get_group_server,2},
             {couch_view,get_group,3},
             {couch_view,get_map_view,4},
             {couch_httpd_view,design_doc_view,5},
             {couch_httpd_db,do_db_req,2},
             {couch_httpd,handle_request,5},
             {mochiweb_http,headers,5}]

Here is the full output:

http://pastie.org/642940

Is it possible that either:

- there is some kind of corruption in the main DB file, and it crashes when
this point (or a specific record in the DB?) is reached? If so, how can I
best identify it?
- there is some file-system-related issue?  I tend to discount this option
just because it always seems to fail at roughly the same point in the
process.

What are the recommended next steps for finding the root cause of the issue?
 Should I insert a log() statement in the view code and try to run the whole
index with that in place?

UPDATE : I tried to run the index again with monit turned off, since I saw
some messages in the syslog about the couchdb process.  I was hoping that
perhaps monit was inadvertently killing and restarting couchdb, which was
causing the index failure.  Alas, this was not the case.  It just died on me
again after 3.6 million records.  And there are some different error messages
in the log immediately after it stopped reporting on indexing progress.
 Please take a look at this pastie to see the logs:

http://pastie.org/643330

(There's a block of error messages starting at line 25)

I am going to try tonight splitting the 7 views I currently have in a single
design doc into 4 design docs, according to the part of the app they pertain
to.  I am hoping that this will narrow it down if some design docs finish
indexing and others don't.

Thanks again for the help (and reading this long message!).

Glenn


On Sun, Oct 4, 2009 at 10:19 AM, Adam Kocoloski <ko...@apache.org> wrote:

> Hi Glenn, I saw something like this once, but unfortunately I wasn't able
> to resolve it.  Can you try the following?
>
> 1) Start couchdb with the -i option to get an interactive shell
>
> 2) At the prompt, try to open the view file and read the header
>
> {ok, Fd} =
> couch_file:open("/vol/couchdb/var/lib/couchdb/.searchlight_production_design/5f190ffb2ed589746e8796d2423869ac.view").
> couch_file:read_header(Fd).
>
> In my case, that call timed out, and the stacktrace during normal operation
> was exactly the one you showed earlier in this thread.  Best,
>
> Adam
>
>
> On Oct 4, 2009, at 2:07 AM, Glenn Rempe wrote:
>
>  Thanks for the reply Paul.  Some comments below.
>>
>> Also, just for full disclosure, the CouchDB I am working on was moved out
>> of
>> another couchdb and it was originally created using CDB 0.9.1.  I show a
>> dir
>> listing below that indicates exactly what was moved.
>>
>> On Sat, Oct 3, 2009 at 6:46 PM, Paul Joseph Davis <
>> paul.joseph.davis@gmail.com> wrote:
>>
>>  Glenn,
>>>
>>> This sounds like your map function is timing out which causes the error.
>>> You could try upping the os process timeout setting in the config.
>>>
>>>
>>>  When I go into futon and select one of my views in my design doc it
>> *always*
>> consistently pops up a javascript alert with the error text at ~5 seconds
>> after selecting the view.  It doesn't seem to matter what else I do.  It
>> also didn't vary when I changed the os_process_timeout value in futon as
>> you
>> suggested from 5000 to 25000.  Can you explain exactly what this
>> particular
>> param is doing?  I assume the value is milliseconds?
>>
>>
>>  To see what's going on you can increase to debug logging or use the log
>>> function in your maps. There's also the status page in futon which I
>>> think
>>> you said you were looking at.
>>>
>>>
>>>  Yes, I was previously looking at the status page.  But now since I've
>> upgraded to trunk I never see any indexing activity happening in the
>> status
>> page no matter what I do.
>>
>>
>>  If indexing crashes it should just pick up where it left off when you
>>> retrigger. Use the status page to verify. If it's not then let us know.
>>>
>>>
>>>  Can you clarify, is this also the case when no index has ever
>> successfully
>> run?  I was wondering if I first need to get through at least one index
>> session (maybe with a smaller amount of records) prior to incremental
>> indexing working as expected.
>>
>> Is there any way to determine what percentage of the total records have
>> been
>> added to the index?
>>
>> For your info, here are the contents of the DB dir.  You can see the main
>> DB
>> is 42GB now (~17 million records).
>>
>> root@ip-10-250-55-239
>> :/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>> ls -la /vol/couchdb/var/lib/couchdb
>> total 41674956
>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 .
>> drwxr-xr-x 3 couchdb root             20 2009-10-03 05:02 ..
>> -rw-r--r-- 1 couchdb couchdb 42675073133 2009-10-04 02:13
>> searchlight_production.couch
>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02
>> .searchlight_production_design
>>
>> root@ip-10-250-55-239
>> :/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>> ls -la /vol/couchdb/var/lib/couchdb/.searchlight_production_design/
>> total 33700196
>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02 .
>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 ..
>> -rw-r--r-- 1 couchdb couchdb  9819347287 2009-10-03 08:04
>> 5f190ffb2ed589746e8796d2423869ac.view
>> -rw-r--r-- 1 couchdb couchdb    91402872 2009-10-03 06:03
>> b127a58306fb8e7858cd1a92f8398511.view
>> -rw-r--r-- 1 couchdb couchdb 24598236884 2009-10-02 13:00
>> SearchDocument.view
>>
>>
>>
>>  If you can't find anything in the debug logs then ping the lust and we'll
>>> get into trying to duplicate.
>>>
>>>
>>>  I have turned on the 'debug' level in the logs and that provided me with
>> the
>> info I previously provided.  I'll try to use the log function in the map
>> and
>> see if that shows anything.
>>
>> Thanks for helping.  If it comes to it, I may be able to make a snapshot
>> of
>> this EBS volume and start a host that you could login to and get your
>> hands
>> directly on it if that would be helpful.
>>
>> Glenn
>>
>
>


-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Paul Davis <pa...@gmail.com>.
On Sun, Oct 4, 2009 at 1:19 PM, Adam Kocoloski <ko...@apache.org> wrote:
> Hi Glenn, I saw something like this once, but unfortunately I wasn't able to
> resolve it.  Can you try the following?
>
> 1) Start couchdb with the -i option to get an interactive shell
>
> 2) At the prompt, try to open the view file and read the header
>
> {ok, Fd} =
> couch_file:open("/vol/couchdb/var/lib/couchdb/.searchlight_production_design/5f190ffb2ed589746e8796d2423869ac.view").
> couch_file:read_header(Fd).
>
> In my case, that call timed out, and the stacktrace during normal operation
> was exactly the one you showed earlier in this thread.  Best,
>
> Adam
>

Sounds like the header was borked, and the timeout is because it's seeking
back through the index file to find the previous one. Given a long enough
indexing run, that scan could theoretically take longer than 5 seconds and
cause a timeout in the gen_server. At least, according to my brain debugger.

> On Oct 4, 2009, at 2:07 AM, Glenn Rempe wrote:
>
>> Thanks for the reply Paul.  Some comments below.
>>
>> Also, just for full disclosure, the CouchDB I am working on was moved out
>> of
>> another couchdb and it was originally created using CDB 0.9.1.  I show a
>> dir
>> listing below that indicates exactly what was moved.
>>
>> On Sat, Oct 3, 2009 at 6:46 PM, Paul Joseph Davis <
>> paul.joseph.davis@gmail.com> wrote:
>>
>>> Glenn,
>>>
>>> This sounds like your map function is timing out which causes the error.
>>> You could try upping the os process timeout setting in the config.
>>>
>>>
>> When I go into futon and select one of my views in my design doc it
>> *always*
>> consistently pops up a javascript alert with the error text at ~5 seconds
>> after selecting the view.  It doesn't seem to matter what else I do.  It
>> also didn't vary when I changed the os_process_timeout value in futon as
>> you
>> suggested from 5000 to 25000.  Can you explain exactly what this
>> particular
>> param is doing?  I assume the value is milliseconds?
>>
>>
>>> To see what's going on you can increase to debug logging or use the log
>>> function in your maps. There's also the status page in futon which I
>>> think
>>> you said you were looking at.
>>>
>>>
>> Yes, I was previously looking at the status page.  But now since I've
>> upgraded to trunk I never see any indexing activity happening in the
>> status
>> page no matter what I do.
>>
>>
>>> If indexing crashes it should just pick up where it left off when you
>>> retrigger. Use the status page to verify. If it's not then let us know.
>>>
>>>
>> Can you clarify, is this also the case when no index has ever successfully
>> run?  I was wondering if I first need to get through at least one index
>> session (maybe with a smaller amount of records) prior to incremental
>> indexing working as expected.
>>
>> Is there any way to determine what percentage of the total records have
>> been
>> added to the index?
>>
>> For your info, here are the contents of the DB dir.  You can see the main
>> DB
>> is 42GB now (~17 million records).
>>
>>
>> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>> ls -la /vol/couchdb/var/lib/couchdb
>> total 41674956
>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 .
>> drwxr-xr-x 3 couchdb root             20 2009-10-03 05:02 ..
>> -rw-r--r-- 1 couchdb couchdb 42675073133 2009-10-04 02:13
>> searchlight_production.couch
>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02
>> .searchlight_production_design
>>
>>
>> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
>> ls -la /vol/couchdb/var/lib/couchdb/.searchlight_production_design/
>> total 33700196
>> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02 .
>> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 ..
>> -rw-r--r-- 1 couchdb couchdb  9819347287 2009-10-03 08:04
>> 5f190ffb2ed589746e8796d2423869ac.view
>> -rw-r--r-- 1 couchdb couchdb    91402872 2009-10-03 06:03
>> b127a58306fb8e7858cd1a92f8398511.view
>> -rw-r--r-- 1 couchdb couchdb 24598236884 2009-10-02 13:00
>> SearchDocument.view
>>
>>
>>
>>> If you can't find anything in the debug logs then ping the lust and we'll
>>> get into trying to duplicate.
>>>
>>>
>> I have turned on the 'debug' level in the logs and that provided me with
>> the
>> info I previously provided.  I'll try to use the log function in the map
>> and
>> see if that shows anything.
>>
>> Thanks for helping.  If it comes to it, I may be able to make a snapshot
>> of
>> this EBS volume and start a host that you could login to and get your
>> hands
>> directly on it if that would be helpful.
>>
>> Glenn
>
>

Re: Timeout Error when trying to access views + Indexing problems

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Glenn, I saw something like this once, but unfortunately I wasn't  
able to resolve it.  Can you try the following?

1) Start couchdb with the -i option to get an interactive shell

2) At the prompt, try to open the view file and read the header

{ok, Fd} = couch_file:open("/vol/couchdb/var/lib/couchdb/.searchlight_production_design/5f190ffb2ed589746e8796d2423869ac.view").
couch_file:read_header(Fd).
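
If the file and its header are healthy, I'd expect read_header to return
{ok, ...} more or less instantly.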

In my case, that call timed out, and the stacktrace during normal  
operation was exactly the one you showed earlier in this thread.  Best,

Adam

On Oct 4, 2009, at 2:07 AM, Glenn Rempe wrote:

> Thanks for the reply Paul.  Some comments below.
>
> Also, just for full disclosure, the CouchDB I am working on was  
> moved out of
> another couchdb and it was originally created using CDB 0.9.1.  I  
> show a dir
> listing below that indicates exactly what was moved.
>
> On Sat, Oct 3, 2009 at 6:46 PM, Paul Joseph Davis <
> paul.joseph.davis@gmail.com> wrote:
>
>> Glenn,
>>
>> This sounds like your map function is timing out which causes the  
>> error.
>> You could try upping the os process timeout setting in the config.
>>
>>
> When I go into futon and select one of my views in my design doc it  
> *always*
> consistently pops up a javascript alert with the error text at ~5  
> seconds
> after selecting the view.  It doesn't seem to matter what else I  
> do.  It
> also didn't vary when I changed the os_process_timeout value in  
> futon as you
> suggested from 5000 to 25000.  Can you explain exactly what this  
> particular
> param is doing?  I assume the value is milliseconds?
>
>
>> To see what's going on you can increase to debug logging or use the  
>> log
>> function in your maps. There's also the status page in futon which  
>> I think
>> you said you were looking at.
>>
>>
> Yes, I was previously looking at the status page.  But now since I've
> upgraded to trunk I never see any indexing activity happening in the  
> status
> page no matter what I do.
>
>
>> If indexing crashes it should just pick up where it left off when you
>> retrigger. Use the status page to verify. If it's not then let us  
>> know.
>>
>>
> Can you clarify, is this also the case when no index has ever  
> successfully
> run?  I was wondering if I first need to get through at least one  
> index
> session (maybe with a smaller amount of records) prior to incremental
> indexing working as expected.
>
> Is there any way to determine what percentage of the total records  
> have been
> added to the index?
>
> For your info, here are the contents of the DB dir.  You can see the  
> main DB
> is 42GB now (~17 million records).
>
> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/ 
> shared/log#
> ls -la /vol/couchdb/var/lib/couchdb
> total 41674956
> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 .
> drwxr-xr-x 3 couchdb root             20 2009-10-03 05:02 ..
> -rw-r--r-- 1 couchdb couchdb 42675073133 2009-10-04 02:13
> searchlight_production.couch
> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02
> .searchlight_production_design
>
> root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/ 
> shared/log#
> ls -la /vol/couchdb/var/lib/couchdb/.searchlight_production_design/
> total 33700196
> drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02 .
> drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 ..
> -rw-r--r-- 1 couchdb couchdb  9819347287 2009-10-03 08:04
> 5f190ffb2ed589746e8796d2423869ac.view
> -rw-r--r-- 1 couchdb couchdb    91402872 2009-10-03 06:03
> b127a58306fb8e7858cd1a92f8398511.view
> -rw-r--r-- 1 couchdb couchdb 24598236884 2009-10-02 13:00
> SearchDocument.view
>
>
>
>> If you can't find anything in the debug logs then ping the lust and  
>> we'll
>> get into trying to duplicate.
>>
>>
> I have turned on the 'debug' level in the logs and that provided me  
> with the
> info I previously provided.  I'll try to use the log function in the  
> map and
> see if that shows anything.
>
> Thanks for helping.  If it comes to it, I may be able to make a  
> snapshot of
> this EBS volume and start a host that you could login to and get  
> your hands
> directly on it if that would be helpful.
>
> Glenn


Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Thanks for the reply Paul.  Some comments below.

Also, just for full disclosure, the CouchDB I am working on was moved out of
another couchdb and it was originally created using CDB 0.9.1.  I show a dir
listing below that indicates exactly what was moved.

On Sat, Oct 3, 2009 at 6:46 PM, Paul Joseph Davis <
paul.joseph.davis@gmail.com> wrote:

> Glenn,
>
> This sounds like your map function is timing out which causes the error.
> You could try upping the os process timeout setting in the config.
>
>
When I go into futon and select one of my views in my design doc it *always*
consistently pops up a javascript alert with the error text at ~5 seconds
after selecting the view.  It doesn't seem to matter what else I do.  It
also didn't vary when I changed the os_process_timeout value in futon as you
suggested from 5000 to 25000.  Can you explain exactly what this particular
param is doing?  I assume the value is milliseconds?


> To see what's going on you can increase to debug logging or use the log
> function in your maps. There's also the status page in futon which I think
> you said you were looking at.
>
>
Yes, I was previously looking at the status page.  But now since I've
upgraded to trunk I never see any indexing activity happening in the status
page no matter what I do.


> If indexing crashes it should just pick up where it left off when you
> retrigger. Use the status page to verify. If it's not then let us know.
>
>
Can you clarify, is this also the case when no index has ever successfully
run?  I was wondering if I first need to get through at least one index
session (maybe with a smaller amount of records) prior to incremental
indexing working as expected.

Is there any way to determine what percentage of the total records have been
added to the index?
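
For example, is there something I could poll directly, such as _active_tasks,
which I believe is what backs the Futon status page?

curl http://127.0.0.1:5984/_active_tasks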

For your info, here are the contents of the DB dir.  You can see the main DB
is 42GB now (~17 million records).

root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
ls -la /vol/couchdb/var/lib/couchdb
total 41674956
drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 .
drwxr-xr-x 3 couchdb root             20 2009-10-03 05:02 ..
-rw-r--r-- 1 couchdb couchdb 42675073133 2009-10-04 02:13
searchlight_production.couch
drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02
.searchlight_production_design

root@ip-10-250-55-239:/home/rails/underscore-sync-mysql-to-couchdb/shared/log#
ls -la /vol/couchdb/var/lib/couchdb/.searchlight_production_design/
total 33700196
drwxr-xr-x 2 couchdb couchdb         120 2009-10-03 06:02 .
drwxr-xr-x 3 couchdb root             78 2009-10-04 00:46 ..
-rw-r--r-- 1 couchdb couchdb  9819347287 2009-10-03 08:04
5f190ffb2ed589746e8796d2423869ac.view
-rw-r--r-- 1 couchdb couchdb    91402872 2009-10-03 06:03
b127a58306fb8e7858cd1a92f8398511.view
-rw-r--r-- 1 couchdb couchdb 24598236884 2009-10-02 13:00
SearchDocument.view



> If you can't find anything in the debug logs then ping the lust and we'll
> get into trying to duplicate.
>
>
I have turned on the 'debug' level in the logs and that provided me with the
info I previously provided.  I'll try to use the log function in the map and
see if that shows anything.
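
For reference, that is just this in my local.ini:

[log]
level = debug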

Thanks for helping.  If it comes to it, I may be able to make a snapshot of
this EBS volume and start a host that you could login to and get your hands
directly on it if that would be helpful.

Glenn

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Yes, I love CouchDB, but 'lust' may be too strong a word.  ;-)


Phone fail. Ping the *list* rather.
>
> Paul
>
>

Re: Timeout Error when trying to access views + Indexing problems

Posted by Paul Davis <pa...@gmail.com>.
On Sat, Oct 3, 2009 at 9:46 PM, Paul Joseph Davis
<pa...@gmail.com> wrote:
> Glenn,
>
> This sounds like your map function is timing out which causes the error. You
> could try upping the os process timeout setting in the config.
>
> To see what's going on you can increase to debug logging or use the log
> function in your maps. There's also the status page in futon which I think
> you said you were looking at.
>
> If indexing crashes it should just pick up where it left off when you
> retrigger. Use the status page to verify. If it's not then let us know.
>
> If you can't find anything in the debug logs then ping the lust and we'll
> get into trying to duplicate.
>

Phone fail. Ping the *list* rather.

Paul

> Paul Davis
>
> On Oct 3, 2009, at 9:24 PM, Glenn Rempes <gl...@rempe.us> wrote:
>
>> Slightly more info on this.  I see the following stack trace when this
>> happens:
>> [Sun, 04 Oct 2009 01:18:41 GMT] [info] [<0.3343.0>] Stacktrace:
>> [{gen_server,call,2},
>>            {couch_view,get_group_server,2},
>>            {couch_view,get_group,3},
>>            {couch_view,get_map_view,4},
>>            {couch_httpd_view,design_doc_view,5},
>>            {couch_httpd_db,do_db_req,2},
>>            {couch_httpd,handle_request,5},
>>            {mochiweb_http,headers,5}]
>>
>>
>> And I was suspecting that perhaps it was related to low ram or cpu on the
>> EC2 instance I am running on (with the couchdb on an EBS volume) and
>> upgraded to an extra large with 15GB RAM, and four cores.
>>
>> No difference at all.  I get this error now almost instantly whenever I
>> select any of the views you see in the pastie below in the single design
>> doc.
>>
>> Help!?  :-)
>>
>> Thanks.
>>
>> Glenn
>>
>> On Sat, Oct 3, 2009 at 9:10 AM, Glenn Rempe <gl...@rempe.us> wrote:
>>
>>> Hello all,
>>> I am looking for some guidance on how I can eliminate an error I am
>>> seeing
>>> when trying to access views, and help with getting through indexing a
>>> large
>>> design document.
>>>
>>> Yesterday I upgraded to a trunk install of CouchDB (0.11.0b) in an
>>> attempt
>>> to resolve my second problem (see below). I have a DB that currently has
>>> about 16 million records in it and I am in the midst of importing more up
>>> to
>>> a total of about 26 million.  Yesterday when I would try to access one of
>>> my
>>> map/reduce views I would see the indexing process kick off in the Futon
>>> status page and I would see the couchjs process in 'top'.  But today, if
>>> I
>>> try to access any view I see the following error from CouchDB within
>>> about 3
>>> seconds from requesting any view:
>>>
>>> http://pastie.org/640511
>>>
>>> The first few lines of it are:
>>>
>>> Error: timeout{gen_server,call,
>>>   [couch_view,
>>>    {get_group_server,<<"searchlight_production">>,
>>>        {group,
>>>            <<95,25,15,251,46,213,137,116,110,135,150,210,66,56,105,172>>,
>>>            nil,nil,<<"_design/SearchDocument">>,<<"javascript">>,[],
>>>            [{view,0,
>>>
>>>
>>> I have tried without success restarting the CouchDB several times.
>>>
>>> Any thoughts as to what might be happening here and how I might prevent
>>> it?
>>>
>>> Related to this is my second problem.  Whenever I have tried to index a
>>> view of this large DB the indexing process seems to silently die out
>>> after a
>>> while and it never get through indexing the whole DB.  I have seen it get
>>> through 10's of thousands up to a few million docs before dying (out of
>>> millions).  Questions:
>>>
>>> - Is there a recommended method to figure out what is happening in the
>>> internals of the indexing that may be causing it to fail?
>>> - If indexing fails before having gone through the entire result set at
>>> least once does it continue where it left off at the last crash?  Or does
>>> it
>>> need to start the whole indexing process over from scratch?
>>> - How can I best ensure that my large DB gets fully indexed?
>>>
>>> Thank you for the help.
>>>
>>> Glenn
>>>
>>> --
>>> Glenn Rempe
>>>
>>> email                 : glenn@rempe.us
>>> voice                 : (415) 894-5366 or (415)-89G-LENN
>>> twitter                : @grempe
>>> contact info        : http://www.rempe.us/contact.html
>>> pgp                    : http://www.rempe.us/gnupg.txt
>>>
>>>
>>
>>
>> --
>> Glenn Rempe
>>
>> email                 : glenn@rempe.us
>> voice                 : (415) 894-5366 or (415)-89G-LENN
>> twitter                : @grempe
>> contact info        : http://www.rempe.us/contact.html
>> pgp                    : http://www.rempe.us/gnupg.txt
>

Re: Timeout Error when trying to access views + Indexing problems

Posted by Paul Joseph Davis <pa...@gmail.com>.
Glenn,

This sounds like your map function is timing out which causes the  
error. You could try upping the os process timeout setting in the  
config.
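
E.g. in local.ini (the value is in milliseconds; 5000 is the default if I
remember right):

[couchdb]
os_process_timeout = 25000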

To see what's going on you can increase logging to the debug level or use the  
log function in your maps. There's also the status page in futon, which I  
think you said you were looking at.

If indexing crashes it should just pick up where it left off when you  
retrigger. Use the status page to verify. If it's not then let us know.

If you can't find anything in the debug logs then ping the lust and  
we'll get into trying to duplicate.

Paul Davis

On Oct 3, 2009, at 9:24 PM, Glenn Rempes <gl...@rempe.us> wrote:

> Slightly more info on this.  I see the following stack trace when this
> happens:
> [Sun, 04 Oct 2009 01:18:41 GMT] [info] [<0.3343.0>] Stacktrace:
> [{gen_server,call,2},
>             {couch_view,get_group_server,2},
>             {couch_view,get_group,3},
>             {couch_view,get_map_view,4},
>             {couch_httpd_view,design_doc_view,5},
>             {couch_httpd_db,do_db_req,2},
>             {couch_httpd,handle_request,5},
>             {mochiweb_http,headers,5}]
>
>
> And I was suspecting that perhaps it was related to low ram or cpu  
> on the
> EC2 instance I am running on (with the couchdb on an EBS volume) and
> upgraded to an extra large with 15GB RAM, and four cores.
>
> No difference at all.  I get this error now almost instantly  
> whenever I
> select any of the views you see in the pastie below in the single  
> design
> doc.
>
> Help!?  :-)
>
> Thanks.
>
> Glenn
>
> On Sat, Oct 3, 2009 at 9:10 AM, Glenn Rempe <gl...@rempe.us> wrote:
>
>> Hello all,
>> I am looking for some guidance on how I can eliminate an error I am  
>> seeing
>> when trying to access views, and help with getting through indexing  
>> a large
>> design document.
>>
>> Yesterday I upgraded to a trunk install of CouchDB (0.11.0b) in an  
>> attempt
>> to resolve my second problem (see below). I have a DB that  
>> currently has
>> about 16 million records in it and I am in the midst of importing  
>> more up to
>> a total of about 26 million.  Yesterday when I would try to access  
>> one of my
>> map/reduce views I would see the indexing process kick off in the  
>> Futon
>> status page and I would see the couchjs process in 'top'.  But  
>> today, if I
>> try to access any view I see the following error from CouchDB  
>> within about 3
>> seconds from requesting any view:
>>
>> http://pastie.org/640511
>>
>> The first few lines of it are:
>>
>> Error: timeout{gen_server,call,
>>    [couch_view,
>>     {get_group_server,<<"searchlight_production">>,
>>         {group,
>>              
>> <<95,25,15,251,46,213,137,116,110,135,150,210,66,56,105,172>>,
>>             nil,nil,<<"_design/SearchDocument">>,<<"javascript">>,[],
>>             [{view,0,
>>
>>
>> I have tried without success restarting the CouchDB several times.
>>
>> Any thoughts as to what might be happening here and how I might  
>> prevent it?
>>
>> Related to this is my second problem.  Whenever I have tried to  
>> index a
>> view of this large DB the indexing process seems to silently die  
>> out after a
>> while and it never get through indexing the whole DB.  I have seen  
>> it get
>> through 10's of thousands up to a few million docs before dying  
>> (out of
>> millions).  Questions:
>>
>> - Is there a recommended method to figure out what is happening in  
>> the
>> internals of the indexing that may be causing it to fail?
>> - If indexing fails before having gone through the entire result  
>> set at
>> least once does it continue where it left off at the last crash?   
>> Or does it
>> need to start the whole indexing process over from scratch?
>> - How can I best ensure that my large DB gets fully indexed?
>>
>> Thank you for the help.
>>
>> Glenn
>>
>> --
>> Glenn Rempe
>>
>> email                 : glenn@rempe.us
>> voice                 : (415) 894-5366 or (415)-89G-LENN
>> twitter                : @grempe
>> contact info        : http://www.rempe.us/contact.html
>> pgp                    : http://www.rempe.us/gnupg.txt
>>
>>
>
>
> -- 
> Glenn Rempe
>
> email                 : glenn@rempe.us
> voice                 : (415) 894-5366 or (415)-89G-LENN
> twitter                : @grempe
> contact info        : http://www.rempe.us/contact.html
> pgp                    : http://www.rempe.us/gnupg.txt

Re: Timeout Error when trying to access views + Indexing problems

Posted by Glenn Rempe <gl...@rempe.us>.
Slightly more info on this.  I see the following stack trace when this
happens:
[Sun, 04 Oct 2009 01:18:41 GMT] [info] [<0.3343.0>] Stacktrace:
[{gen_server,call,2},
             {couch_view,get_group_server,2},
             {couch_view,get_group,3},
             {couch_view,get_map_view,4},
             {couch_httpd_view,design_doc_view,5},
             {couch_httpd_db,do_db_req,2},
             {couch_httpd,handle_request,5},
             {mochiweb_http,headers,5}]


I suspected that it might be related to low RAM or CPU on the EC2 instance I
am running on (with the couchdb data on an EBS volume), so I upgraded to an
extra large with 15GB RAM and four cores.

No difference at all.  I get this error now almost instantly whenever I
select any of the views you see in the pastie below in the single design
doc.

Help!?  :-)

Thanks.

Glenn

On Sat, Oct 3, 2009 at 9:10 AM, Glenn Rempe <gl...@rempe.us> wrote:

> Hello all,
> I am looking for some guidance on how I can eliminate an error I am seeing
> when trying to access views, and help with getting through indexing a large
> design document.
>
> Yesterday I upgraded to a trunk install of CouchDB (0.11.0b) in an attempt
> to resolve my second problem (see below). I have a DB that currently has
> about 16 million records in it and I am in the midst of importing more up to
> a total of about 26 million.  Yesterday when I would try to access one of my
> map/reduce views I would see the indexing process kick off in the Futon
> status page and I would see the couchjs process in 'top'.  But today, if I
> try to access any view I see the following error from CouchDB within about 3
> seconds from requesting any view:
>
> http://pastie.org/640511
>
> The first few lines of it are:
>
> Error: timeout{gen_server,call,
>     [couch_view,
>      {get_group_server,<<"searchlight_production">>,
>          {group,
>              <<95,25,15,251,46,213,137,116,110,135,150,210,66,56,105,172>>,
>              nil,nil,<<"_design/SearchDocument">>,<<"javascript">>,[],
>              [{view,0,
>
>
> I have tried without success restarting the CouchDB several times.
>
> Any thoughts as to what might be happening here and how I might prevent it?
>
> Related to this is my second problem.  Whenever I have tried to index a
> view of this large DB the indexing process seems to silently die out after a
> while and it never get through indexing the whole DB.  I have seen it get
> through 10's of thousands up to a few million docs before dying (out of
> millions).  Questions:
>
> - Is there a recommended method to figure out what is happening in the
> internals of the indexing that may be causing it to fail?
> - If indexing fails before having gone through the entire result set at
> least once does it continue where it left off at the last crash?  Or does it
> need to start the whole indexing process over from scratch?
> - How can I best ensure that my large DB gets fully indexed?
>
> Thank you for the help.
>
> Glenn
>
> --
> Glenn Rempe
>
> email                 : glenn@rempe.us
> voice                 : (415) 894-5366 or (415)-89G-LENN
> twitter                : @grempe
> contact info        : http://www.rempe.us/contact.html
> pgp                    : http://www.rempe.us/gnupg.txt
>
>


-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt