Posted to user@couchdb.apache.org by Mark Anderson <ma...@opscode.com> on 2010/07/15 02:42:30 UTC

Need help diagnosing couchdb view 404s

We're having a problem with CouchDB views 'disappearing', and I've not 
been able to make headway diagnosing it. I'd love to have suggestions on 
how to isolate this.

After some time running, we start seeing errors in the log:
2010-07-12_17:59:04.82351 [info] [<0.8900.246>] 10.192.210.79 - - 'GET' /authorization/_design/objects/_view/by_type?key=%2245c4ef2de7981991a5aaf23cd7fb0bbf%22 404

Running curl
% curl 'localhost:5984//authorization/_design/objects/_view/by_type?limit=0'
{"error":"not_found","reason":"missing_named_view"}
This would normally succeed.

The design documents are present and can be fetched.
% curl 'localhost:5984//authorization/_design/objects'
{"_id":"_design/objects","_rev":"1-3598677772","views":{"by_type":{"map":"function(doc) {emit(doc._id,doc.type)}"}}}

CouchDB never recovers on its own; only restarting it fixes the problem. 
The problem recurs fairly consistently in production, but we've had 
trouble reproducing it cleanly; synthetic CouchDB workloads do not seem 
to trigger it.

The only pattern we've been able to spot is that the failures seem to 
happen right after a series of rapid updates to a document indexed by 
the view in question. We see a bunch of 'PUT' entries, and 'checkpointing 
view update at seq XXX for authorization'. This is happening across all 
of our databases, but seems connected with load.
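
One way to check the correlation with PUT bursts is to scan the log for 
the first view 404 and count the PUT requests logged just before it. The 
following is a minimal sketch, assuming every request line has the same 
shape as the GET line quoted above. The regex, the function name 
first_view_404, and the 50-line window are illustrative choices, not 
anything CouchDB provides.

```python
import re
from collections import deque

# Assumed request-line shape, modelled on the GET line quoted above:
# <timestamp> [info] [<pid>] <ip> - - '<METHOD>' <path> <status>
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \[info\] \[<[\d.]+>\] (?P<ip>\S+) - - "
    r"'(?P<method>[A-Z]+)' (?P<path>\S+) (?P<status>\d{3})\s*$"
)


def first_view_404(lines, window=50):
    """Find the first 404 on a view and count PUTs in the preceding window.

    Returns a dict, or None if no view request returned 404.
    Lines that do not look like request lines are skipped.
    """
    recent = deque(maxlen=window)  # methods of the last `window` requests
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        if m["status"] == "404" and "/_view/" in m["path"]:
            return {
                "timestamp": m["ts"],
                "path": m["path"],
                "puts_before": sum(1 for meth in recent if meth == "PUT"),
                "requests_seen": len(recent),
            }
        recent.append(m["method"])
    return None
```

Passing an open log file to first_view_404 gives the timestamp of the 
first failure and how many of the preceding requests were PUTs, which 
should make it easier to tell whether the update burst is a real 
precursor or a coincidence.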

* Using native Erlang views does not seem to prevent the problem; it 
only defers it a bit.
* There is plenty of disk space; 30GB used in a 100GB partition. None of 
the databases are larger than 1GB, but some of the views get very large 
(12GB or more).

We're running CouchDB 0.11.0; Ubuntu 10.04. I've not yet been able to 
repro the problem in 1.0.0, and will try 0.11.1 as soon as I give up on 
breaking 1.0.0.

I have an EC2 machine in this state, so if anyone has suggestions of 
diagnostics to run or the like, I'd be glad to poke at it a bit.




Re: Need help diagnosing couchdb view 404s

Posted by Mark Anderson <ma...@opscode.com>.
On 7/15/10 12:19 PM, J Chris Anderson wrote:
> Hmm. Disappearing beam... not my area of expertise, but I feel like 
> there has been discussion of similar issues elsewhere in the last few 
> weeks.

I'm confused; beam seems to be still running...
root       458  2.2  2.9 122596 52140 ?        Sl   Jul06 294:26 /usr/local/lib/erlang/erts-5.7.4/bin/beam.smp -Bd -K true -- -root /usr/local/lib/erlang -progname erl -- -home /tmp -- -noshell -noinput -sasl errlog_type error -couch_ini /srv/couchdb/etc/couchdb/default.ini /srv/couchdb/etc/couchdb/local.ini -s couch

And the system is still responding:

% curl localhost:5984/authorization/_design/objects/_view/by_type?limit=0
{"error":"not_found","reason":"missing_named_view"}
% curl localhost:5984/authorization/_design/objects/_info
{"name":"objects","view_index":{"signature":"3248fa31db7fda9bbf6fc1ea172d742d","language":"javascript","disk_size":12723224698.0,"updater_running":false,"compact_running":false,"waiting_commit":false,"waiting_clients":0,"update_seq":3068938,"purge_seq":0}}
% curl localhost:5984/authorization
{"db_name":"authorization","doc_count":578753,"doc_del_count":472914,"update_seq":3068938,"purge_seq":0,"compact_running":false,"disk_size":902602861,"instance_start_time":"1278831651839759","disk_format_version":5}

From the update sequence number, it even looks like the view is up to date.

You can PUT to the db, and the view update sequence increments to match, 
but the view still responds "not found" as above afterwards.
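
The comparison above (db update_seq versus view index update_seq, plus 
the view lookup itself) could be scripted so it can be run repeatedly 
while the machine is in this state. This is a rough sketch only: the 
endpoints are the ones used in the curl calls above, but the helper 
names fetch, classify, and check and the BASE value are assumptions made 
for illustration.

```python
import json
import urllib.error
import urllib.request

BASE = "http://localhost:5984"  # assumption: same host/port as the curl calls


def fetch(path):
    """GET a CouchDB path and return (http_status, parsed_json_body)."""
    try:
        with urllib.request.urlopen(BASE + path) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # The 404 shown above carries a JSON body (missing_named_view)
        return err.code, json.loads(err.read().decode("utf-8"))


def classify(db_info, ddoc_info, view_status, view_body):
    """Summarise the three responses shown above in one line."""
    lag = db_info["update_seq"] - ddoc_info["view_index"]["update_seq"]
    missing = (
        view_status == 404
        and view_body.get("reason") == "missing_named_view"
    )
    if missing and lag == 0:
        return "index current, view lookup failing"
    if missing:
        return "view lookup failing, index behind by %d" % lag
    return "view ok, index behind by %d" % lag


def check(db, ddoc, view):
    """Fetch db info, design doc _info, and the view, then classify them."""
    _, db_info = fetch("/%s" % db)
    _, ddoc_info = fetch("/%s/_design/%s/_info" % (db, ddoc))
    status, body = fetch(
        "/%s/_design/%s/_view/%s?limit=0" % (db, ddoc, view)
    )
    return classify(db_info, ddoc_info, status, body)
```

With the responses shown above, check("authorization", "objects", 
"by_type") would be expected to report that the index is current while 
the view lookup fails, which is the odd combination this thread is about.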






Re: Need help diagnosing couchdb view 404s

Posted by J Chris Anderson <jc...@gmail.com>.
On Jul 15, 2010, at 10:58 AM, Mark Anderson wrote:

> 
> On 7/14/10 5:51 PM, J Chris Anderson wrote:
>> 
>> On Jul 14, 2010, at 5:42 PM, Mark Anderson wrote:
>> 
>>> Running curl
>>> % curl 'localhost:5984//authorization/_design/objects/_view/by_type?limit=0'
>> 
>> are the double slashes after 5984 intentional? does removing them help?
> Accidental, but removing them doesn't affect the result.
>> this could be an issue with being able to open another file-descriptor?
> It seems unlikely, since there are only 25 fd open, and the limit is 65535, and the operating system has plenty available.
> 
> I should mention that there are no erlang stack traces in the log. If there were problems opening a fd wouldn't that throw an error in erlang?
> 


Yes, that sounds like it should log something.

Hmm. Disappearing beam... not my area of expertise, but I feel like there has been discussion of similar issues elsewhere in the last few weeks.

Anyone?

Chris

Re: Need help diagnosing couchdb view 404s

Posted by Mark Anderson <ma...@opscode.com>.
On 7/14/10 5:51 PM, J Chris Anderson wrote:
>
> On Jul 14, 2010, at 5:42 PM, Mark Anderson wrote:
>
>> Running curl
>> % curl 'localhost:5984//authorization/_design/objects/_view/by_type?limit=0'
>
> are the double slashes after 5984 intentional? does removing them help?
Accidental, but removing them doesn't affect the result.
> this could be an issue with being able to open another file-descriptor?
It seems unlikely: there are only 25 fds open, the limit is 65535, and 
the operating system has plenty available.

I should mention that there are no Erlang stack traces in the log. If 
there were problems opening an fd, wouldn't that throw an error in Erlang?
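
For anyone wanting to repeat the fd check, here is a small sketch that 
reads the same two numbers (open fds and the limit) for a given pid. It 
assumes a Linux /proc filesystem with /proc/<pid>/fd and 
/proc/<pid>/limits, and enough permission to read them; the function 
names are made up for illustration.

```python
import os


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd (Linux only, needs read access)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    A value of None means the limit is reported as 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: Max, open, files, <soft>, <hard>, <units>
            soft, hard = line.split()[3:5]
            return tuple(
                None if value == "unlimited" else int(value)
                for value in (soft, hard)
            )
    raise ValueError("no 'Max open files' line found")


def fd_headroom(pid):
    """Return (open_fd_count, soft_limit) for the given process."""
    with open("/proc/%d/limits" % pid) as fh:
        soft, _hard = parse_max_open_files(fh.read())
    return count_open_fds(pid), soft
```

Running fd_headroom with the pid of the beam.smp process, before and 
after the view starts returning 404, would show whether the fd count 
moves at all around the failure.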



Re: Need help diagnosing couchdb view 404s

Posted by J Chris Anderson <jc...@gmail.com>.
On Jul 14, 2010, at 5:42 PM, Mark Anderson wrote:

> We're having a bit of a problem with couchdb views 'disappearing', and I've not been able to make headway diagnosing the problem. I'd love to have suggestions on how to isolate this.
> 
> After some time running, we start seeing errors in the log:
> 2010-07-12_17:59:04.82351 [info] [<0.8900.246>] 10.192.210.79 - - 'GET' /authorization/_design/objects/_view/by_type?key=%2245c4ef2de7981991a5aaf23cd7fb0bbf%22 404
> 
> Running curl
> % curl 'localhost:5984//authorization/_design/objects/_view/by_type?limit=0'

are the double slashes after 5984 intentional? does removing them help?

> {"error":"not_found","reason":"missing_named_view"}
> This would normally succeed.
> 
> The design documents are present and can be fetched.
> % curl 'localhost:5984//authorization/_design/objects'
> {"_id":"_design/objects","_rev":"1-3598677772","views":{"by_type":{"map":"function(doc) {emit(doc._id,doc.type)}"}}}
> 
> Couchdb never recovers. Restarting couchdb fixes the problem. This problem repeats, in the sense that it happens pretty consistently, but we've had trouble reproducing the problem neatly; synthetic couchdb workloads do not seem to trigger this.
> 
> The only pattern we've been able to spot is that the failures seem to happen right after a series of rapid updates to a document indexed by the view in question. We see a bunch of 'PUT' entries, and 'checkpointing view update at seq XXX for authorization'. This is happening across all of our databases, but seems connected with load.
> 

this could be an issue with being able to open another file-descriptor?

> * Using native erlang views does not seem to prevent the problem, just defer it a bit.
> * There is plenty of disk space; 30GB used in a 100GB partition. None of the databases are larger than 1GB, but some of the views get very large (12G or more)
> 
> We're running CouchDB 0.11.0; Ubuntu 10.04. I've not yet been able to repro the problem in 1.0.0, and will try 0.11.1 as soon as I give up on breaking 1.0.0.
> 
> I have an EC2 machine in this state, so if anyone has suggestions of diagnostics to run or the like, I'd be glad to poke at it a bit.
> 
> 
>