Posted to dev@couchdb.apache.org by "Mark Hammond (JIRA)" <ji...@apache.org> on 2009/04/18 07:59:14 UTC
[jira] Created: (COUCHDB-326) Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows
Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows
------------------------------------------------------------------------------------
Key: COUCHDB-326
URL: https://issues.apache.org/jira/browse/COUCHDB-326
Project: CouchDB
Issue Type: Bug
Affects Versions: 0.9
Environment: Windows, couch 0.9, erlang R12B 5.6.5
Reporter: Mark Hammond
On Windows, attempts to delete a database occasionally fail with an error. This shows up as 10-30% of the test suite failing on Windows. If you retry the tests that failed, they will usually pass on subsequent attempts. Running the tests individually causes them to fail roughly 10% of the time.
The log output shown is:
[debug] [<0.18650.6>] httpd 500 error response:
{"error":"error","reason":"eacces"}
[info] [<0.18650.6>] 127.0.0.1 - - 'DELETE' /test_suite_db/ 500
A slightly snipped transcript from IRC:
(2:58:32 PM) markh: I see a number of INFO logs "Shutting down view group server, monitored db is closing." directly before the error. I was guessing the file may be unlink'd before one of those workers actually closes its handle?
(2:58:54 PM) alisdair: yeah, it's probably a race condition
(2:59:13 PM) alisdair: where the delete is tried before the fd is let go
(2:59:26 PM) alisdair: the reader fd that is
(2:59:32 PM) markh: yeah
...
(3:11:47 PM) alisdair: i can't find an obvious deadlock
(3:12:18 PM) alisdair: couch_server:delete explicitly waits for the db process to exit
(3:12:23 PM) alisdair: before deleting it
(3:15:15 PM) alisdair: i think i found the problem
(3:15:23 PM) alisdair: but i need a windows machine to confirm
(3:15:30 PM) alisdair: i'll look into it tomorrow
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (COUCHDB-326) Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows
Posted by "Mark Hammond (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721585#action_12721585 ]
Mark Hammond commented on COUCHDB-326:
--------------------------------------
Just got some clarification from damien:
(10:33:56 AM) damienkatz: it waits to get the killed message, but the point isn't to wait for an orderly shutdown, just to clear the message.
(10:34:16 AM) markh: right - so that receive does do something important...
(10:34:44 AM) damienkatz: yes, if it didn't, the couch_server process would die.
(10:35:08 AM) damienkatz: as it would get the message later and not know where it came from, so it assumes something bad happened.
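To illustrate the point about clearing the message, here is a small standalone Erlang sketch. It is not CouchDB code, and the module and function names are made up for illustration. A process that traps exits and is linked to the killed process gets an 'EXIT' tuple in its mailbox; the receive removes it so it cannot surface later as an unexpected message.

```erlang
-module(exit_msg_sketch).
-export([demo/0]).

%% Hypothetical demo: kill a linked process and consume its 'EXIT' message.
demo() ->
    %% Turn exit signals from linked processes into ordinary messages.
    process_flag(trap_exit, true),
    %% A worker that blocks forever until it is killed.
    Pid = spawn_link(fun() -> receive never -> ok end end),
    %% `kill` cannot be trapped; the worker dies with reason `killed`.
    exit(Pid, kill),
    %% Without this receive, the tuple would stay in the mailbox.
    receive
        {'EXIT', Pid, Reason} -> Reason
    after 1000 ->
        timeout
    end.
```

Calling exit_msg_sketch:demo() returns the atom killed, which is the reason a linked process sees after an untrappable kill.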
[jira] Updated: (COUCHDB-326) Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows
Posted by "Kenneth LeFebvre (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth LeFebvre updated COUCHDB-326:
-------------------------------------
Comment: was deleted
(was: This may not be a permanent solution, but here's a copy of the change I made to make all the unit tests pass.
On my workstation, the sleep wasn't enough to consistently succeed for me, and I didn't want to increase the sleep across the board.
)
[jira] Commented: (COUCHDB-326) Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows
Posted by "michael h (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804331#action_12804331 ]
michael h commented on COUCHDB-326:
-----------------------------------
This issue still exists in the Windows binary installer, version 0.11.0b897093.
[jira] Updated: (COUCHDB-326) Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows
Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Joseph Davis updated COUCHDB-326:
--------------------------------------
Skill Level: Regular Contributors Level (Easy to Medium)
[jira] Updated: (COUCHDB-326) Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows
Posted by "Kenneth LeFebvre (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth LeFebvre updated COUCHDB-326:
-------------------------------------
Attachment: delete-database.patch
This may not be a permanent solution, but here's a copy of the change I made to make all the unit tests pass.
On my workstation, the sleep wasn't enough to consistently succeed for me, and I didn't want to increase the sleep across the board.
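The attached patch is not reproduced in this thread, so the following is only a rough sketch of one way to retry instead of relying on a single fixed sleep. It assumes the eacces comes from a file handle that has not been released yet. The module name, function name, and parameters are hypothetical and are not taken from the patch or from CouchDB.

```erlang
-module(delete_retry_sketch).
-export([delete_with_retry/3]).

%% Hypothetical helper: try to delete Path, retrying up to Retries more
%% times (sleeping DelayMs between attempts) while the OS reports eacces.
%% Any other result, including ok, is returned to the caller unchanged.
delete_with_retry(Path, Retries, DelayMs) ->
    case file:delete(Path) of
        {error, eacces} when Retries > 0 ->
            %% Give the other process time to close its handle.
            timer:sleep(DelayMs),
            delete_with_retry(Path, Retries - 1, DelayMs);
        Result ->
            Result
    end.
```

For example, delete_retry_sketch:delete_with_retry("some_file.couch", 5, 100) would make up to six attempts, 100 ms apart. A retry like this only hides the race; it does not remove it.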
[jira] Commented: (COUCHDB-326) Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows
Posted by "Eric Desgranges (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857501#action_12857501 ]
Eric Desgranges commented on COUCHDB-326:
-----------------------------------------
I'm running into the same issue with CouchDB 0.11 on Windows 7. It definitely looks like a race condition when deleting either databases or documents.
[jira] Commented: (COUCHDB-326) Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows
Posted by "Chris McKee (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872152#action_12872152 ]
Chris McKee commented on COUCHDB-326:
-------------------------------------
The issue is quite well described here: http://osdir.com/ml/couchdb-user/2009-10/msg00185.html
[jira] Commented: (COUCHDB-326) Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows
Posted by "alisdair sullivan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700990#action_12700990 ]
alisdair sullivan commented on COUCHDB-326:
-------------------------------------------
exit(Pid, kill),
receive {'EXIT', Pid, close} -> ok end,
To achieve a clean shutdown of the db and its child processes, you need to send it a signal other than kill and give it a chance to shut down and clean up its processes. The receive here doesn't do anything, as a killed process sends the 'EXIT' msg immediately upon being killed.
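A minimal sketch of the kind of shutdown described above, assuming the caller is linked to the target process and traps exits. This is not CouchDB's code: the module, the function names, and the timeout are hypothetical, and the worker in demo/0 merely stands in for a db process that cleans up after itself.

```erlang
-module(shutdown_sketch).
-export([stop_and_wait/2, demo/0]).

%% Ask Pid to stop with reason `shutdown` and wait for its 'EXIT' message.
%% If Pid has not exited within TimeoutMs, fall back to an untrappable kill.
stop_and_wait(Pid, TimeoutMs) ->
    exit(Pid, shutdown),
    receive
        {'EXIT', Pid, Why} -> {stopped, Why}
    after TimeoutMs ->
        exit(Pid, kill),
        receive
            {'EXIT', Pid, KillWhy} -> {killed, KillWhy}
        end
    end.

%% Hypothetical worker that traps exits so it can clean up before exiting.
demo() ->
    process_flag(trap_exit, true),
    Worker = spawn_link(fun() ->
        process_flag(trap_exit, true),
        receive
            {'EXIT', _From, shutdown} ->
                %% Close file handles and stop child processes here.
                exit(shutdown)
        end
    end),
    stop_and_wait(Worker, 5000).
```

Here shutdown_sketch:demo() returns {stopped, shutdown}. The important difference from exit(Pid, kill) is that the worker gets a chance to run its cleanup before the caller proceeds to delete anything.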
[jira] Updated: (COUCHDB-326) Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows
Posted by "Kenneth LeFebvre (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth LeFebvre updated COUCHDB-326:
-------------------------------------
Attachment: (was: delete-database.patch)