Posted to dev@qpid.apache.org by Manuel Teira <mt...@tid.es> on 2008/06/26 12:12:44 UTC

QPID-1148, r671604, lockf, flock and fcntl

Hello.
I've done some further investigation and testing related to the change in
r671604, which drops the file locking strategy in favour of a flock on the
data dir.

Trying to write similar code using lockf, I hit the issue that the file
must be opened with O_RDWR or O_WRONLY, and that's not allowed for a
directory.
The same happens when using an fcntl call.
And, unexpectedly, the same goes for flock. From the Solaris manual page:

<snip>
     Read permission is required on a file  to  obtain  a  shared
     lock,   and  write  permission  is  required  to  obtain  an
     exclusive lock.
</snip>

But the Linux man page claims:

<snip>
A shared or exclusive lock can be placed on a file regardless of the 
mode in which the file was opened.
</snip>

I've searched the web for some BSD system pages, but they don't say 
anything about the file mode.


On the other hand, the POSIX fcntl specification says, regarding the
causes of failure:

[EBADF]
    The /fildes/ argument is not a valid open file descriptor, or the
    argument /cmd/ is F_SETLK or F_SETLKW, the type of lock, *l_type*,
    is a shared lock (F_RDLCK), and /fildes/ is not a valid file
    descriptor open for reading, or the type of lock *l_type*, is an
    exclusive lock (F_WRLCK), and /fildes/ is not a valid file
    descriptor open for writing. 

The POSIX spec also requires write permission for lockf:
http://www.opengroup.org/onlinepubs/007908799/xsh/lockf.html



This means Solaris cannot lock a directory directly, I'm afraid.
Any ideas?
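
A minimal sketch of the problem, assuming a POSIX system. It tries the two steps described above: opening a directory with write access, and taking an exclusive fcntl lock through a read-only descriptor. The helper names are made up for illustration and are not Qpid code.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not Qpid code): try to open `path` for read/write.
// Returns 0 on success, otherwise the errno reported by open().
int openForWriteErrno(const char* path) {
    int fd = ::open(path, O_RDWR);
    if (fd < 0) return errno;
    ::close(fd);
    return 0;
}

// Hypothetical helper: open `path` read-only, then request a non-blocking
// exclusive (write) lock with fcntl. Returns 0 on success, otherwise errno.
int exclusiveLockOnReadOnlyFdErrno(const char* path) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return errno;
    struct flock fl = {};
    fl.l_type = F_WRLCK;     // exclusive lock
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            // 0 means "to end of file", i.e. the whole file
    int err = (::fcntl(fd, F_SETLK, &fl) == -1) ? errno : 0;
    ::close(fd);
    return err;
}
```

Called on a directory such as ".", the first helper is expected to report EISDIR and the second EBADF, which matches the POSIX text quoted above. That is why lockf and fcntl cannot be pointed at the data dir itself.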


Best regards.
--
Manuel.



Re: QPID-1148, r671604, lockf, flock and fcntl

Posted by Alan Conway <ac...@redhat.com>.
On Fri, 2008-06-27 at 17:14 +0200, Manuel Teira wrote:
> Alan Conway wrote:
> > On Thu, 2008-06-26 at 12:12 +0200, Manuel Teira wrote:
> >   
> >> This means Solaris cannot lock a directory directly, I'm afraid.
> >> Any ideas?
> >>     
> >
> >
> > Yes, we can create (if it doesn't already exist) a lock file in the
> > directory and then use lockf to lock it. There's already code in
> > Daemon.cpp that does exactly this for the PID file. The reason I
> > switched to flock was because crashing or killed brokers were sometimes
> > leaving the lock file behind them, whereas a flock (or lockf)  lock is
> > automatically released when the process exits.
> >
> > We need to
> >  - create a qpid::sys::LockFile class that can be re-implemented on
> > different platforms.
> >  - use the Daemon.cpp code as the posix implementation.
> >  - Replace the locking code in Daemon.cpp and DataDir.cpp with the
> > common sys::LockFile.
> >
> > It's JIRA https://issues.apache.org/jira/browse/QPID-1158
> > Could you take this on, Manuel? I can do it, but it may take a couple of
> > days to get to it.
> >   
> Of course, I will try (I'll try to start on Monday). For the moment I've
> reverted the changes to keep using the old DataDir.cpp code. I was able to
> pass most of the tests on Solaris (more changes to remove bashisms are
> needed, though). I still have to look into some odd messages, but this
> is a dump of a 'make check' session now:
> 
> -bash-3.00$ make check
> make  libshlibtest.la libdlclose_noop.la unit_test  perftest  txtest 
> latencytest client_test  topic_listener topic_publisher  publish consume
> `libshlibtest.la' is up to date.
> `libdlclose_noop.la' is up to date.
> `unit_test' is up to date.
> `perftest' is up to date.
> `txtest' is up to date.
> `latencytest' is up to date.
> `client_test' is up to date.
> `topic_listener' is up to date.
> `topic_publisher' is up to date.
> `publish' is up to date.
> `consume' is up to date.
> make  check-TESTS
> Running 154 test cases...
> 2008-jun-27 17:09:18 error Exception in client dispatch thread: 
> Connection closed by broker
> 
> *** No errors detected
> PASS: unit_test
> PASS: start_broker
> PASS: client_test
> SubscribeThread exception: Sequence error: expected  n==1 but got 0 
> (perftest.cpp:524)
> FAIL: quick_perftest
> PASS: quick_topictest
> sh: objdump: not found
> 
> ----------------------------------------------------------------------
> Ran 110 tests in 88.510s
> 
> OK
> PASS: python_tests
> PASS: stop_broker
> Running federation tests using brokers on ports 45428 45429
> sh: objdump: not found
> test_bridge_create_and_close (federation.FederationTests) ... ok
> test_pull_from_exchange (federation.FederationTests) ... ok
> test_pull_from_queue (federation.FederationTests) ... ok
> test_tracing (federation.FederationTests) ... ok
> 
> ----------------------------------------------------------------------
> Ran 4 tests in 48.880s
> 
> OK
> PASS: run_federation_tests
> ==============================================
> 1 of 8 tests failed
> Please report to qpid-dev@incubator.apache.org
> ==============================================
> 
> 
> 
> 
> Only one test is failing. There's also a weird message during unit_test
> (Exception in client dispatch thread: Connection closed by broker), and

That is not an error; it's coming from a test that deliberately provokes
various error conditions. It's being printed because the broker logs
errors on stderr by default. I can fix the tests to hide this message,
thanks for reminding me.

>  
> also those "sh: objdump: not found" messages. I'm still not sure where
> they're coming from, since at first glance I couldn't find any
> objdump invocation. Other than that, it gives me hope of having a
> working Solaris version soon.

It looks fantastic, definitely ready for a test drive on Linux. Will try
to do this next week.



Re: QPID-1148, r671604, lockf, flock and fcntl

Posted by Manuel Teira <mt...@tid.es>.
Alan Conway wrote:
> On Thu, 2008-06-26 at 12:12 +0200, Manuel Teira wrote:
>   
>> This means Solaris cannot lock a directory directly, I'm afraid.
>> Any ideas?
>>     
>
>
> Yes, we can create (if it doesn't already exist) a lock file in the
> directory and then use lockf to lock it. There's already code in
> Daemon.cpp that does exactly this for the PID file. The reason I
> switched to flock was because crashing or killed brokers were sometimes
> leaving the lock file behind them, whereas a flock (or lockf)  lock is
> automatically released when the process exits.
>
> We need to
>  - create a qpid::sys::LockFile class that can be re-implemented on
> different platforms.
>  - use the Daemon.cpp code as the posix implementation.
>  - Replace the locking code in Daemon.cpp and DataDir.cpp with the
> common sys::LockFile.
>
> It's JIRA https://issues.apache.org/jira/browse/QPID-1158
> Could you take this on, Manuel? I can do it, but it may take a couple of
> days to get to it.
>   
Of course, I will try (I'll try to start on Monday). For the moment I've
reverted the changes to keep using the old DataDir.cpp code. I was able to
pass most of the tests on Solaris (more changes to remove bashisms are
needed, though). I still have to look into some odd messages, but this
is a dump of a 'make check' session now:

-bash-3.00$ make check
make  libshlibtest.la libdlclose_noop.la unit_test  perftest  txtest 
latencytest client_test  topic_listener topic_publisher  publish consume
`libshlibtest.la' is up to date.
`libdlclose_noop.la' is up to date.
`unit_test' is up to date.
`perftest' is up to date.
`txtest' is up to date.
`latencytest' is up to date.
`client_test' is up to date.
`topic_listener' is up to date.
`topic_publisher' is up to date.
`publish' is up to date.
`consume' is up to date.
make  check-TESTS
Running 154 test cases...
2008-jun-27 17:09:18 error Exception in client dispatch thread: 
Connection closed by broker

*** No errors detected
PASS: unit_test
PASS: start_broker
PASS: client_test
SubscribeThread exception: Sequence error: expected  n==1 but got 0 
(perftest.cpp:524)
FAIL: quick_perftest
PASS: quick_topictest
sh: objdump: not found
test_example (tests_0-10.example.ExampleTest) ... ok
test_auto_rollback (tests_0-10.tx.TxTests) ... ok
test_commit (tests_0-10.tx.TxTests) ... ok
test_rollback (tests_0-10.tx.TxTests) ... ok
test_broker_connectivity (tests_0-10.management.ManagementTest) ... ok
test_self_session_id (tests_0-10.management.ManagementTest) ... ok
test_standard_exchanges (tests_0-10.management.ManagementTest) ... ok
test_system_object (tests_0-10.management.ManagementTest) ... ok
test_bad_resume (tests_0-10.dtx.DtxTests) ... ok
test_commit_unknown (tests_0-10.dtx.DtxTests) ... ok
test_end (tests_0-10.dtx.DtxTests) ... ok
test_end_suspend_and_fail (tests_0-10.dtx.DtxTests) ... ok
test_end_unknown_xid (tests_0-10.dtx.DtxTests) ... ok
test_forget_xid_on_completion (tests_0-10.dtx.DtxTests) ... ok
test_get_timeout (tests_0-10.dtx.DtxTests) ... ok
test_get_timeout_unknown (tests_0-10.dtx.DtxTests) ... ok
test_implicit_end (tests_0-10.dtx.DtxTests) ... ok
test_invalid_commit_not_ended (tests_0-10.dtx.DtxTests) ... ok
test_invalid_commit_one_phase_false (tests_0-10.dtx.DtxTests) ... ok
test_invalid_commit_one_phase_true (tests_0-10.dtx.DtxTests) ... ok
test_invalid_prepare_not_ended (tests_0-10.dtx.DtxTests) ... ok
test_invalid_rollback_not_ended (tests_0-10.dtx.DtxTests) ... ok
test_prepare_unknown (tests_0-10.dtx.DtxTests) ... ok
test_recover (tests_0-10.dtx.DtxTests) ... ok
test_rollback_unknown (tests_0-10.dtx.DtxTests) ... ok
test_select_required (tests_0-10.dtx.DtxTests) ... ok
test_set_timeout (tests_0-10.dtx.DtxTests) ... ok
test_simple_commit (tests_0-10.dtx.DtxTests) ... ok
test_simple_prepare_commit (tests_0-10.dtx.DtxTests) ... ok
test_simple_prepare_rollback (tests_0-10.dtx.DtxTests) ... ok
test_simple_rollback (tests_0-10.dtx.DtxTests) ... ok
test_start_already_known (tests_0-10.dtx.DtxTests) ... ok
test_start_join (tests_0-10.dtx.DtxTests) ... ok
test_start_join_and_resume (tests_0-10.dtx.DtxTests) ... ok
test_suspend_resume (tests_0-10.dtx.DtxTests) ... ok
test_suspend_start_end_resume (tests_0-10.dtx.DtxTests) ... ok
test_delete_while_used_by_exchange 
(tests_0-10.alternate_exchange.AlternateExchangeTests) ... ok
test_delete_while_used_by_queue 
(tests_0-10.alternate_exchange.AlternateExchangeTests) ... ok
test_queue_delete (tests_0-10.alternate_exchange.AlternateExchangeTests) 
... ok
test_unroutable (tests_0-10.alternate_exchange.AlternateExchangeTests) 
... ok
test (tests_0-10.exchange.DeclareMethodPassiveFieldNotFoundRuleTests) ... ok
testDefaultExchange (tests_0-10.exchange.DefaultExchangeRuleTests) ... ok
testHeadersBindNoMatchArg (tests_0-10.exchange.ExchangeTests) ... ok
testMatchAll (tests_0-10.exchange.HeadersExchangeTests) ... ok
testMatchAny (tests_0-10.exchange.HeadersExchangeTests) ... ok
testDifferentDeclaredType (tests_0-10.exchange.MiscellaneousErrorsTests) 
... ok
testTypeNotKnown (tests_0-10.exchange.MiscellaneousErrorsTests) ... ok
testDirect (tests_0-10.exchange.RecommendedTypesRuleTests) ... ok
testFanout (tests_0-10.exchange.RecommendedTypesRuleTests) ... ok
testHeaders (tests_0-10.exchange.RecommendedTypesRuleTests) ... ok
testTopic (tests_0-10.exchange.RecommendedTypesRuleTests) ... ok
testAmqDirect (tests_0-10.exchange.RequiredInstancesRuleTests) ... ok
testAmqFanOut (tests_0-10.exchange.RequiredInstancesRuleTests) ... ok
testAmqMatch (tests_0-10.exchange.RequiredInstancesRuleTests) ... ok
testAmqTopic (tests_0-10.exchange.RequiredInstancesRuleTests) ... ok
test_ack_and_no_ack (tests_0-10.broker.BrokerTests) ... ok
test_simple_delivery_immediate (tests_0-10.broker.BrokerTests) ... ok
test_simple_delivery_queued (tests_0-10.broker.BrokerTests) ... ok
test_ack (tests_0-10.message.MessageTests) ... ok
test_acquire (tests_0-10.message.MessageTests) ... ok
test_acquire_with_no_accept_and_credit_flow 
(tests_0-10.message.MessageTests) ... ok
test_cancel (tests_0-10.message.MessageTests) ... ok
test_consume_exclusive (tests_0-10.message.MessageTests) ... ok
test_consume_exclusive2 (tests_0-10.message.MessageTests) ... ok
test_consume_queue_not_found (tests_0-10.message.MessageTests) ... ok
test_consume_queue_not_specified (tests_0-10.message.MessageTests) ... ok
test_consume_unique_consumers (tests_0-10.message.MessageTests) ... ok
test_credit_flow_bytes (tests_0-10.message.MessageTests) ... ok
test_credit_flow_messages (tests_0-10.message.MessageTests) ... ok
test_empty_body (tests_0-10.message.MessageTests) ... ok
test_incoming_start (tests_0-10.message.MessageTests) ... ok
test_no_local (tests_0-10.message.MessageTests) ... ok
test_no_local_awkward (tests_0-10.message.MessageTests) ... ok
test_no_local_exclusive_subscribe (tests_0-10.message.MessageTests) ... ok
test_ranged_ack (tests_0-10.message.MessageTests) ... ok
test_reject (tests_0-10.message.MessageTests) ... ok
test_release (tests_0-10.message.MessageTests) ... ok
test_release_ordering (tests_0-10.message.MessageTests) ... ok
test_release_unacquired (tests_0-10.message.MessageTests) ... ok
test_subscribe_not_acquired (tests_0-10.message.MessageTests) ... ok
test_subscribe_not_acquired_2 (tests_0-10.message.MessageTests) ... ok
test_subscribe_not_acquired_3 (tests_0-10.message.MessageTests) ... ok
test_window_flow_bytes (tests_0-10.message.MessageTests) ... ok
test_window_flow_messages (tests_0-10.message.MessageTests) ... ok
test_ack_message_from_deleted_queue 
(tests_0-10.persistence.PersistenceTests) ... ok
test_delete_queue_after_publish 
(tests_0-10.persistence.PersistenceTests) ... ok
test_queue_deletion (tests_0-10.persistence.PersistenceTests) ... ok
test_autodelete_shared (tests_0-10.queue.QueueTests) ... ok
test_bind (tests_0-10.queue.QueueTests) ... ok
test_bind_queue_existence (tests_0-10.queue.QueueTests) ... ok
test_declare_exclusive (tests_0-10.queue.QueueTests) ... ok
test_declare_passive (tests_0-10.queue.QueueTests) ... ok
test_delete_ifempty (tests_0-10.queue.QueueTests) ... ok
test_delete_ifunused (tests_0-10.queue.QueueTests) ... ok
test_delete_queue_exists (tests_0-10.queue.QueueTests) ... ok
test_delete_simple (tests_0-10.queue.QueueTests) ... ok
test_purge (tests_0-10.queue.QueueTests) ... ok
test_purge_empty_name (tests_0-10.queue.QueueTests) ... ok
test_purge_queue_exists (tests_0-10.queue.QueueTests) ... ok
test_unbind_direct (tests_0-10.queue.QueueTests) ... ok
test_unbind_fanout (tests_0-10.queue.QueueTests) ... ok
test_unbind_headers (tests_0-10.queue.QueueTests) ... ok
test_unbind_topic (tests_0-10.queue.QueueTests) ... ok
test_exchange_bound_direct (tests_0-10.query.QueryTests) ... ok
test_exchange_bound_fanout (tests_0-10.query.QueryTests) ... ok
test_exchange_bound_header (tests_0-10.query.QueryTests) ... ok
test_exchange_bound_topic (tests_0-10.query.QueryTests) ... ok
test_exchange_query (tests_0-10.query.QueryTests) ... ok
test_queue_query (tests_0-10.query.QueryTests) ... ok
test_queue_query_unknown (tests_0-10.query.QueryTests) ... ok

----------------------------------------------------------------------
Ran 110 tests in 88.510s

OK
PASS: python_tests
PASS: stop_broker
Running federation tests using brokers on ports 45428 45429
sh: objdump: not found
test_bridge_create_and_close (federation.FederationTests) ... ok
test_pull_from_exchange (federation.FederationTests) ... ok
test_pull_from_queue (federation.FederationTests) ... ok
test_tracing (federation.FederationTests) ... ok

----------------------------------------------------------------------
Ran 4 tests in 48.880s

OK
PASS: run_federation_tests
==============================================
1 of 8 tests failed
Please report to qpid-dev@incubator.apache.org
==============================================




Only one test is failing. There's also a weird message during unit_test
(Exception in client dispatch thread: Connection closed by broker), and
also those "sh: objdump: not found" messages. I'm still not sure where
they're coming from, since at first glance I couldn't find any
objdump invocation. Other than that, it gives me hope of having a
working Solaris version soon.

Thanks for all your support!

Best regards and happy weekend.
--
Manuel.





Re: QPID-1148, r671604, lockf, flock and fcntl

Posted by Alan Conway <ac...@redhat.com>.
On Thu, 2008-06-26 at 12:12 +0200, Manuel Teira wrote:
> This means Solaris cannot lock a directory directly, I'm afraid.
> Any ideas?


Yes, we can create (if it doesn't already exist) a lock file in the
directory and then use lockf to lock it. There's already code in
Daemon.cpp that does exactly this for the PID file. The reason I
switched to flock was because crashing or killed brokers were sometimes
leaving the lock file behind them, whereas a flock (or lockf)  lock is
automatically released when the process exits. 
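
To illustrate the point about stale lock files, here is a rough sketch, assuming POSIX lockf and fork. A child process takes the lock and dies without cleaning up; the file stays on disk, but the parent can lock it again straight away. The function names and the path used in the note below are hypothetical, not taken from Daemon.cpp.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: create `path` if needed and try a non-blocking
// exclusive lock on it. Returns the descriptor holding the lock, or -1.
int tryLockFile(const char* path) {
    int fd = ::open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) return -1;
    if (::lockf(fd, F_TLOCK, 0) != 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Returns true if the lock can be re-acquired after its owner has died.
bool lockReleasedAfterOwnerDies(const char* path) {
    pid_t child = ::fork();
    if (child < 0) return false;
    if (child == 0) {
        // Child: take the lock, then exit abruptly without unlocking,
        // closing or removing the file (simulates a killed broker).
        int fd = tryLockFile(path);
        ::_exit(fd >= 0 ? 0 : 1);
    }
    int status = 0;
    if (::waitpid(child, &status, 0) != child) return false;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return false;
    // The file is still on disk, but the child's lock went away with it.
    int fd = tryLockFile(path);
    if (fd < 0) return false;
    ::close(fd);
    return true;
}
```

For example, `lockReleasedAfterOwnerDies("/tmp/qpid_stale_lock_demo.lck")` leaves the file behind yet should still succeed, whereas a scheme that treats the mere existence of the file as the lock would be stuck.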

We need to
 - create a qpid::sys::LockFile class that can be re-implemented on
different platforms.
 - use the Daemon.cpp code as the posix implementation.
 - Replace the locking code in Daemon.cpp and DataDir.cpp with the
common sys::LockFile.
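
A rough sketch of what a POSIX version of such a class might look like. This is an assumption for discussion only: the interface, the error handling and the helper otherProcessIsLockedOut are invented here and are not the existing Daemon.cpp code or a final qpid::sys::LockFile.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch only: holds an exclusive lockf lock on a file for its lifetime.
class LockFile {
  public:
    // Creates the file if it doesn't exist and locks it without blocking.
    // Throws std::runtime_error if the file can't be opened or locked.
    explicit LockFile(const std::string& path) : path_(path), fd_(-1) {
        fd_ = ::open(path_.c_str(), O_CREAT | O_WRONLY, 0644);
        if (fd_ < 0)
            throw std::runtime_error("Cannot open " + path_ + ": " +
                                     std::strerror(errno));
        if (::lockf(fd_, F_TLOCK, 0) != 0) {
            int err = errno;
            ::close(fd_);
            throw std::runtime_error("Cannot lock " + path_ + ": " +
                                     std::strerror(err));
        }
    }
    // Closing the descriptor releases the lock; the file itself is kept.
    ~LockFile() { if (fd_ >= 0) ::close(fd_); }
    LockFile(const LockFile&) = delete;
    LockFile& operator=(const LockFile&) = delete;
  private:
    std::string path_;
    int fd_;
};

// Hypothetical check: true if a separate process is refused the lock.
// A second process is needed because lockf locks are owned per process.
bool otherProcessIsLockedOut(const std::string& path) {
    pid_t child = ::fork();
    if (child < 0) return false;
    if (child == 0) {
        try { LockFile second(path); ::_exit(1); }      // got the lock
        catch (const std::exception&) { ::_exit(0); }   // refused
    }
    int status = 0;
    if (::waitpid(child, &status, 0) != child) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

One caveat with this design: lockf locks belong to the process, so two LockFile objects on the same path inside one process would not exclude each other, and closing either descriptor drops the lock. A Windows or other platform version would replace the body while keeping the same interface.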

It's JIRA https://issues.apache.org/jira/browse/QPID-1158
Could you take this on, Manuel? I can do it, but it may take a couple of
days to get to it.