Posted to dev@subversion.apache.org by David Summers <da...@summersoft.fay.ar.us> on 2003/02/19 04:36:42 UTC

Still hang on svn 4951 RedHat 7.3 SMP

Well, I got svn 4951 to compile on RedHat 8.0, but on RedHat 7.3 I am
still seeing the hang in externals_tests.py.

I've got an "lt-svn up" command running, and two lt-svnserve 
processes; here are the gdb traces:

lt-svn up:
==========
#0  0x420e0187 in poll () from /lib/i686/libc.so.6
#1  0x4017ba6c in apr_poll (aprset=0xbfffdfe0, num=1, nsds=0xbfffdfdc,
    timeout=-1) at poll.c:168
#2  0x4017c0c3 in apr_wait_for_io_or_timeout (f=0x0, s=0x8071d00,
    for_read=1) at waitio.c:92
#3  0x40172f39 in apr_socket_recv (sock=0x8071d00,
    buf=0x8089858 "( success ( ) ) ) ) ( ) ) ", len=0xbfffe0b8)
    at sendrecv.c:125
#4  0x4017362c in apr_recv (sock=0x8071d00,
    buf=0x8089858 "( success ( ) ) ) ) ( ) ) ", len=0xbfffe0b8)
    at sendrecv.c:1058
#5  0x4012b132 in readbuf_input (conn=0x8089848,
    data=0x8089858 "( success ( ) ) ) ) ( ) ) ", len=0xbfffe0b8)
    at subversion/libsvn_ra_svn/marshal.c:161
#6  0x4012b221 in readbuf_fill (conn=0x8089848)
    at subversion/libsvn_ra_svn/marshal.c:180
#7  0x4012b267 in readbuf_getchar (conn=0x8089848,
    result=0xbfffe12b "¿\210Ò\022@ Ï\005\b\200¼\b\bháÿ¿\211¾\022@H\230\b\b 
Ï\005\b\\áÿ¿DÅ\022@TË\022@VË\022@xáÿ¿Ø¼\b\b\210Ò\022@ 
Ï\005\bØáÿ¿\026¿\022@H\230\b\b
Ï\005\bDÅ\022@Àáÿ¿Äáÿ¿TË\022@\002") at 
subversion/libsvn_ra_svn/marshal.c:189
#8  0x4012b29e in readbuf_getchar_skip_whitespace (conn=0x8089848,
    result=0xbfffe12b "¿\210Ò\022@ Ï\005\b\200¼\b\bháÿ¿\211¾\022@H\230\b\b 
Ï\005\b\\áÿ¿DÅ\022@TË\022@VË\022@xáÿ¿Ø¼\b\b\210Ò\022@ 
Ï\005\bØáÿ¿\026¿\022@H\230\b\b
#9  0x4012b9ee in svn_ra_svn_read_item (conn=0x8089848, pool=0x805cfa0,
    item=0xbfffe15c) at subversion/libsvn_ra_svn/marshal.c:446
#10 0x4012be89 in svn_ra_svn_read_tuple (conn=0x8089848, pool=0x805cfa0,
    fmt=0x4012c544 "wl") at subversion/libsvn_ra_svn/marshal.c:557
#11 0x4012bf16 in svn_ra_svn_read_cmd_response (conn=0x8089848,
    pool=0x805cfa0, fmt=0x4012c4c2 "")
    at subversion/libsvn_ra_svn/marshal.c:581
#12 0x40128271 in ra_svn_set_path (baton=0x808bc80, path=0x4003fcfb "",
    rev=5, pool=0x805cfa0) at subversion/libsvn_ra_svn/client.c:169
#13 0x4002ce56 in svn_wc_crawl_revisions (path=0x8089538 "exdir_G",
    adm_access=0x8070d80, reporter=0x4012d020, report_baton=0x808bc80,
    restore_files=1, recurse=1, notify_func=0x804bc08 <notify>,
    notify_baton=0x805dca8, traversal_info=0x8089540, pool=0x805cfa0)
    at subversion/libsvn_wc/adm_crawler.c:383
#14 0x40024d05 in svn_client__update_internal (path=0x8089538 "exdir_G",
    revision=0x80893a0, recurse=1, timestamp_sleep=0xbfffe568,
    ctx=0xbfffe790, pool=0x805cfa0)
    at subversion/libsvn_client/update.c:138
#15 0x40020c81 in handle_external_item_change (key=0x8088fc8, klen=7,
    status=svn_hash_diff_key_both, baton=0xbfffe410)
    at subversion/libsvn_client/externals.c:456
#16 0x401375cf in svn_hash_diff (hash_a=0x8088e28, hash_b=0x80891a8,
    diff_func=0x40020a4c <handle_external_item_change>,
    diff_func_baton=0xbfffe410, pool=0x805cfa0)
    at subversion/libsvn_subr/hash.c:300
#17 0x40020db8 in handle_externals_desc_change (key=0x8088960, klen=0,
    status=svn_hash_diff_key_both, baton=0xbfffe4c0)
    at subversion/libsvn_client/externals.c:545
#18 0x401375cf in svn_hash_diff (hash_a=0x805dcd0, hash_b=0x805dd30,
    diff_func=0x40020cf4 <handle_externals_desc_change>,
    diff_func_baton=0xbfffe4c0, pool=0x805cfa0)
    at subversion/libsvn_subr/hash.c:300
#19 0x40020e2f in svn_client__handle_externals (traversal_info=0x805dcc0,
    update_unchanged=1, timestamp_sleep=0xbfffe568, ctx=0xbfffe790,
    pool=0x805cfa0) at subversion/libsvn_client/externals.c:571
#20 0x40024d4f in svn_client__update_internal (path=0x805dc80 "",
    revision=0xbfffe7b0, recurse=1, timestamp_sleep=0x0, ctx=0xbfffe790,
    pool=0x805cfa0) at subversion/libsvn_client/update.c:160
#21 0x40024dae in svn_client_update (path=0x805dc80 "",
    revision=0xbfffe7b0, recurse=1, ctx=0xbfffe790, pool=0x805cfa0)
    at subversion/libsvn_client/update.c:181
#22 0x0804fc62 in svn_cl__update (os=0x805d0c0, baton=0xbfffe630,
    pool=0x805cfa0) at subversion/clients/cmdline/update-cmd.c:70
#23 0x0804dc1e in main (argc=2, argv=0xbfffe8c4)
    at subversion/clients/cmdline/main.c:994
#24 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6




lt-svnserve #1:
===============
#0  0x420e19ee in select () from /lib/i686/libc.so.6
#1  0x400dc77c in __DTOR_END__ () from /usr/lib/libdb-4.0.so
#2  0x400be5e5 in __os_yield () from /usr/lib/libdb-4.0.so
#3  0x4005987d in __db_tas_mutex_lock () from /usr/lib/libdb-4.0.so
#4  0x400b4839 in __log_put_int () from /usr/lib/libdb-4.0.so
#5  0x400b43b1 in __log_put () from /usr/lib/libdb-4.0.so
#6  0x400cc938 in __txn_regop_log () from /usr/lib/libdb-4.0.so
#7  0x400cb742 in __txn_commit () from /usr/lib/libdb-4.0.so
#8  0x400339f7 in commit_trail (trail=0x808c330, fs=0x807e9a0)
    at subversion/libsvn_fs/trail.c:100
#9  0x40033aad in svn_fs__retry_txn (fs=0x807e9a0,
    txn_body=0x400337ec <txn_body_change_txn_prop>, baton=0xbfffe7f0,
    pool=0x808c0c8) at subversion/libsvn_fs/trail.c:136
#10 0x40033882 in svn_fs_change_txn_prop (txn=0x80943f0,
    name=0x4003b2b9 "svn:date", value=0xbfffe830, pool=0x808c0c8)
    at subversion/libsvn_fs/revs-txns.c:567
#11 0x40037ac6 in svn_fs_begin_txn (txn_p=0x808c294, fs=0x807e9a0, rev=5,
    pool=0x808c0c8) at subversion/libsvn_fs/txn.c:149
#12 0x4001b7d8 in svn_repos_fs_begin_txn_for_update (txn_p=0x808c294,
    repos=0x8068808, rev=5, author=0x808c2d0 "anonymous", pool=0x808c0c8)
    at subversion/libsvn_repos/fs-wrap.c:127
#13 0x4001dd63 in svn_repos_set_path (report_baton=0x808c290,
    path=0x808e1b8 "", revision=5, pool=0x808e0d0)
    at subversion/libsvn_repos/reporter.c:173
#14 0x0804a89a in set_path (conn=0x8065898, pool=0x808e0d0,
    params=0x808e190, baton=0xbfffe978) at subversion/svnserve/serve.c:96
#15 0x4010015e in svn_ra_svn_handle_commands (conn=0x8065898,
    pool=0x808c0c8, commands=0x804cda4, baton=0xbfffe978,
    pass_through_errors=0) at subversion/libsvn_ra_svn/marshal.c:637
#16 0x0804ab13 in handle_report (conn=0x8065898, pool=0x808c0c8,
    repos_url=0x8063e78 "svn://localhost/repositories/externals_tests-7.other",
    baton=0x808c290) at subversion/svnserve/serve.c:169
#17 0x0804bba9 in update (conn=0x8065898, pool=0x808c0c8,
    params=0x808c178, baton=0xbfffea70) at subversion/svnserve/serve.c:592
#18 0x4010015e in svn_ra_svn_handle_commands (conn=0x8065898,
    pool=0x8063890, commands=0x804cdec, baton=0xbfffea70,
    pass_through_errors=0) at subversion/libsvn_ra_svn/marshal.c:637
#19 0x0804c88f in serve (conn=0x8065898,
    root=0x80519f8 "/home/david/rpms/build/subversion-0.17.1/subversion/tests/clients/cmdline",
    tunnel=0, read_only=0, pool=0x8063890)
    at subversion/svnserve/serve.c:986
#20 0x0804a7ac in main (argc=4, argv=0xbfffeca4)
    at subversion/svnserve/main.c:201
#21 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6


Then lt-svnserve #2:
===============
#0  0x420e8132 in accept () from /lib/i686/libc.so.6
#1  0x401dc593 in accept () from /lib/i686/libpthread.so.0
#2  0x4014c830 in apr_socket_accept (new=0xbfffeaec, sock=0x8051af8,
    connection_context=0x8063890) at sockets.c:201
#3  0x4014cd48 in apr_accept (new=0xbfffeaec, sock=0x8051af8,
    connection_context=0x8063890) at sockets.c:420
#4  0x0804a696 in main (argc=4, argv=0xbfffeca4)
    at subversion/svnserve/main.c:161
#5  0x42017589 in __libc_start_main () from /lib/i686/libc.so.6


Here is what happens when I look in the log file:
CMD: svnadmin "create" "repositories/externals_tests-6" <TIME = 0.001808>
CMD: svnadmin dump "local_tmp/repos" | svnadmin load "repositories/externals_tests-6" <TIME = 0.003394>
CMD: svn "co" "--username" "jrandom" "--password" "rayjandom" "svn://localhost/repositories/externals_tests-6" "working_copies/externals_tests-6" <TIME = 0.001826>
CMD: svn "checkout" "--username" "jrandom" "--password" "rayjandom" "svn://localhost/repositories/externals_tests-6" "working_copies/externals_tests-6.init" <TIME = 0.001732>
CMD: svn "ci" "-m" "log msg" "--quiet" "working_copies/externals_tests-6.init" <TIME = 0.001788>
CMD: svn "ci" "-m" "log msg" "--quiet" "working_copies/ext


-- 
David Wayne Summers          "Linux: Because reboots are for hardware upgrades!"
david@summersoft.fay.ar.us   PGP Key: http://summersoft.fay.ar.us/~david/pgp.txt
PGP Key fingerprint =  C0 E0 4F 50 DD A9 B6 2B  60 A1 31 7E D2 28 6D A8 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: Still hang on svn 4951 RedHat 7.3 SMP

Posted by Sander Striker <st...@apache.org>.

> From: mark benedetto king [mailto:mbk@boredom.org]
> Sent: Thursday, February 20, 2003 6:34 PM

> Note that ONLY pid 32439 is accessing the repository.
> 
> This is surprising (to me, at least); I think it means that there is no
> deadlock, and that there must be some sort of mutex leakage/clobbering.

The only times we saw deadlocks in one process was when we were nesting
txns IIRC.  And AFAIK we aren't doing that anymore.  I wonder what's going
on here.

Sander



Re: Still hang on svn 4951 RedHat 7.3 SMP

Posted by mark benedetto king <mb...@boredom.org>.

I was a little suspicious of the BDB MUTEX asm, so I switched my
BDB over to fcntl-based mutexes.

I had hoped that this would resolve the problem, but it did not.

However, it did (perhaps) shed some new light on things.

In the past, those who have experienced lockups have seen them
mainly in txn_checkpoint()  (or with at least one process in txn_checkpoint).

IIRC, they have ALWAYS been seen with multiple processes (either hung
httpds or hung svnserves).

With the fcntl change I have a mostly-reproducible hang in a *single*
svnserve (other than the socket-listening-forker).

Here's the backtrace:

#0  0x420e187e in select () from /lib/i686/libc.so.6
#1  0x400ce09c in __DTOR_END__ () from /home/bking/tools/db4/lib/libdb-4.0.so
#2  0x400b047e in __os_yield () from /home/bking/tools/db4/lib/libdb-4.0.so
#3  0x40057097 in __db_fcntl_mutex_lock ()
   from /home/bking/tools/db4/lib/libdb-4.0.so
#4  0x400a6c10 in __log_put_int () from /home/bking/tools/db4/lib/libdb-4.0.so
#5  0x400a6779 in __log_put () from /home/bking/tools/db4/lib/libdb-4.0.so
#6  0x400be788 in __txn_regop_log ()
   from /home/bking/tools/db4/lib/libdb-4.0.so
#7  0x400bd5d6 in __txn_commit () from /home/bking/tools/db4/lib/libdb-4.0.so
#8  0x4003b0c7 in commit_trail (trail=0x8080450, fs=0x8073ad8)
    at subversion/libsvn_fs/trail.c:100
#9  0x4003b1d2 in svn_fs__retry_txn (fs=0x8073ad8, 
    txn_body=0x40040834 <txn_body_begin_txn>, baton=0xbffff700, pool=0x807f9a0)
    at subversion/libsvn_fs/trail.c:136
#10 0x40040938 in svn_fs_begin_txn (txn_p=0x807fe0c, fs=0x8073ad8, rev=4, 
    pool=0x807f9a0) at subversion/libsvn_fs/txn.c:134
#11 0x4001cc57 in svn_repos_fs_begin_txn_for_update (txn_p=0x807fe0c, 
    repos=0x8073888, rev=4, author=0x807fe50 "anonymous", pool=0x807f9a0)
    at subversion/libsvn_repos/fs-wrap.c:127
#12 0x4001ff6b in svn_repos_set_path (report_baton=0x807fe08, 
    path=0x80803d8 "", revision=4, pool=0x8080098)
    at subversion/libsvn_repos/reporter.c:173
#13 0x0804a99c in set_path (conn=0x8060b88, pool=0x8080098, params=0x80803b0, 
    baton=0xbffff840) at subversion/svnserve/serve.c:96
#14 0x40104c2a in svn_ra_svn_handle_commands (conn=0x8060b88, pool=0x807f9a0, 
    commands=0x804dbb4, baton=0xbffff840, pass_through_errors=0)
    at subversion/libsvn_ra_svn/marshal.c:637
#15 0x0804ad07 in handle_report (conn=0x8060b88, pool=0x807f9a0, 
    repos_url=0x807f928 "svn://localhost/repositories/merge_tests-4",
    baton=0x807fe08) at subversion/svnserve/serve.c:169
#16 0x0804c99c in status (conn=0x8060b88, pool=0x807f9a0, params=0x807fca0, 
    baton=0xbffff900) at subversion/svnserve/serve.c:658
#17 0x40104c2a in svn_ra_svn_handle_commands (conn=0x8060b88, pool=0x8060848, 
    commands=0x804dde4, baton=0xbffff900, pass_through_errors=0)
    at subversion/libsvn_ra_svn/marshal.c:637
#18 0x0804d8fa in serve (conn=0x8060b88, 
    root=0x8060698 "/home/bking/projects/svn/subversion/tests/clients/cmdline", tunnel=0, read_only=0, pool=0x8060848) at subversion/svnserve/serve.c:986
#19 0x0804a86d in main (argc=4, argv=0xbffffb64)
    at subversion/svnserve/main.c:201



Note that it is not in txn_checkpoint, but in txn_commit.

Here's another data-point:

$ /usr/sbin/lsof +D /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/
COMMAND     PID  USER   FD   TYPE DEVICE   SIZE   NODE NAME
lt-svnser 32439 bking  mem    REG    3,4   8192 277545 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.001
lt-svnser 32439 bking  mem    REG    3,4  16384 277550 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.005
lt-svnser 32439 bking  mem    REG    3,4 270336 277546 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.002
lt-svnser 32439 bking  mem    REG    3,4 327680 277547 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.003
lt-svnser 32439 bking  mem    REG    3,4 712704 277549 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.004
lt-svnser 32439 bking    5u   REG    3,4   8192 277545 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/__db.001
lt-svnser 32439 bking    6u   REG    3,4   8192 277697 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/nodes
lt-svnser 32439 bking    7u   REG    3,4   8192 277698 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/revisions
lt-svnser 32439 bking    8u   REG    3,4   8192 277830 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/transactions
lt-svnser 32439 bking    9u   REG    3,4   8192 277831 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/copies
lt-svnser 32439 bking   10u   REG    3,4   8192 277832 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/changes
lt-svnser 32439 bking   11u   REG    3,4   8192 277833 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/representations
lt-svnser 32439 bking   12u   REG    3,4   8192 277834 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/strings
lt-svnser 32439 bking   13u   REG    3,4   8192 277835 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/db/uuids
lt-svnser 32439 bking   14r   REG    3,4    460 277536 /home/bking/projects/svn/subversion/tests/clients/cmdline/repositories/merge_tests-4/locks/db.lock


Note that ONLY pid 32439 is accessing the repository.

This is surprising (to me, at least); I think it means that there is no
deadlock, and that there must be some sort of mutex leakage/clobbering.

--ben
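
[Editor's sketch] The lsof check above boils down to collecting the distinct PIDs that have files open under the repository. A minimal Python sketch of that reading follows; the function name and the shortened paths in the sample are illustrative assumptions, not part of the original report.

```python
def pids_with_open_files(lsof_output):
    """Return the distinct PIDs found in `lsof`-style output."""
    pids = set()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Data rows look like: COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
        # The header row is skipped because its second field is not a number.
        if len(fields) >= 2 and fields[1].isdigit():
            pids.add(int(fields[1]))
    return pids


# Three rows taken from the listing above, with the directory prefix
# shortened for readability (the shortening is ours, not lsof's).
SAMPLE = """\
COMMAND     PID  USER   FD   TYPE DEVICE   SIZE   NODE NAME
lt-svnser 32439 bking  mem    REG    3,4   8192 277545 merge_tests-4/db/__db.001
lt-svnser 32439 bking    6u   REG    3,4   8192 277697 merge_tests-4/db/nodes
lt-svnser 32439 bking   14r   REG    3,4    460 277536 merge_tests-4/locks/db.lock
"""

if __name__ == "__main__":
    print(pids_with_open_files(SAMPLE))
```

A single PID in the result is what supports the "no second process to deadlock with" reading; it says nothing by itself about why the mutex is held.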



Re: Still hang on svn 4951 RedHat 7.3 SMP

Posted by Philip Martin <ph...@codematters.co.uk>.

David Summers <da...@summersoft.fay.ar.us> writes:

> On 19 Feb 2003, Philip Martin wrote:
> > You need to run the test manually to avoid truncating the buffered output.
> > 
> > $ cd subversion/tests/clients/cmdline
> > $ ./externals_tests.py 6 BASE_URL=svn://localhost
> 
> OK, Finally figured out how to run it manually.  I've run it 20-30 times 
> manually and frequently (but not always) it hangs.  Here is the last part 
> of the manual run for this one.  In this particular run, I'm hanging with 
> 8 lt-svnserve processes running (Ack!).

Are some of these lt-svnserve processes left over from previous hangs?
When you rerun the regression tests any old repositories get deleted
and new repositories get created.  I'm not sure what would happen if
you have old lt-svnserve processes accessing the old repositories:
would they interfere with the new ones?  Could you try again, but this
time make sure you kill any old lt-svnserve processes and start a new
svnserve daemon after each hang?

> Here are the traces of the lt-svn and
> the 8 lt-svnserve processes and the results of manually running the 
> externals_tests.py:
> 
> RedHat 7.3 non-SMP
> 
> externals_tests.py output (run manually):
> ==========================================
> CMD: svn "up" "working_copies/externals_tests-2.init" <TIME = 0.002711>

Last time it was test 6, now it's test 2.  I assume this means that it
hangs in lots of different places.

> CMD: svnadmin "create" "repositories/externals_tests-2.other" <TIME = 
> 0.002638>
> CMD: svnadmin dump "repositories/externals_tests-2" | svnadmin load 
> "repositories/externals_tests-2.other" <TIME = 0.009731>
> CMD: svn "pset" "-F" "working_copies/externals_tests-2.init/tmp0mU5ZY" 
> "svn:externals" "working_copies/externals_tests-2.init/A/B" <TIME = 
> 0.002714>
> CMD: svn "pset" "-F" "working_copies/externals_tests-2.init/tmp0mU5ZY" 
> "svn:externals" "working_copies/externals_tests-2.init/A/D" <TIME = 
> 0.002633>
> CMD: svn "ci" "-m" "log msg" "working_copies/externals_tests-2.init" <TIME 
> = 0.002701>
> CMD: svn "status" "-v" "-u" "-q" "working_copies/externals_tests-2.init" 
> <TIME = 0.002605>
> CMD: svn "checkout" "--username" "jrandom" "--password" "rayjandom" 
> "svn://localhost/repositories/externals_tests-2" 
> "working_copies/externals_tests-2" <TIME = 0.002673>
> CMD: svn "checkout" "--username" "jrandom" "--password" "rayjandom" 
> "svn://localhost/repositories/externals_tests-2" 
> "working_copies/externals_tests-2.other" <TIME = 0.006195>
> CMD: svn "pset" "-F" "local_tmp/tmpXoZLqH" "svn:externals" 
> "working_copies/externals_tests-2/A/D" <TIME = 0.002666>
> CMD: svn "ci" "-m" "log msg" "--quiet" 
> "working_copies/externals_tests-2/A/D" <TIME = 0.002444>
> CMD: svn "up" "working_copies/externals_tests-2.other" <TIME = 0.002650>
>  
> lt-svn
> ======
> #14 0x40024d15 in svn_client__update_internal (
>     path=0x8095900 "working_copies/externals_tests-2.other/A/B/exdir_G", 
>     revision=0x8095818, recurse=1, timestamp_sleep=0xbfffeb48, 
> ctx=0xbfffed70, 
>     pool=0x805deb0) at subversion/libsvn_client/update.c:138

This is the first svn:externals directory to get updated after updating
the working_copies/externals_tests-2.other directory.  So this process
has just completed an update and is now failing on a second update.  I
don't know what to make of this.

-- 
Philip Martin


Re: Checkpoint less frequently (was Re: Still hang on svn 4951 RedHat 7.3 SMP)

Posted by mark benedetto king <mb...@boredom.org>.

On Wed, Feb 19, 2003 at 03:54:05PM -0500, Garrett Rooney wrote:
> >>  {
> >>-    int db_err = env->txn_checkpoint (env, 0, 0, 0);
> >>+    int db_err = env->txn_checkpoint (env, 8000, 60, 0);
> >
> 
> isn't that just masking whatever the real bug is?  i mean checkpointing 
> more often shouldn't be causing a problem, and if it is, we need to 
> figure out why, not ignore it and hope it goes away.
> 

That's true, but in the meantime, I'd like svncheck to run to completion,
which it doesn't for me, without this patch.

It's possible that all-zeroes tickles a BDB bug, and that the new
values avoid it.

I'll investigate this a little further tonight.

--ben
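
[Editor's sketch] As far as we recall the Berkeley DB documentation, the second and third arguments to txn_checkpoint are thresholds: kilobytes of log written and minutes elapsed since the last checkpoint, with zeroes meaning "always checkpoint". The Python model below is only an illustration of that rule, not BDB code; the function names, the workload numbers, and the exact boundary behaviour (>= versus >) are assumptions.

```python
def should_checkpoint(kb_logged, minutes_elapsed, kbyte, minutes):
    """Model of the threshold rule: checkpoint if either limit is reached."""
    if kbyte == 0 and minutes == 0:
        return True                      # no thresholds: checkpoint every call
    if kbyte and kb_logged >= kbyte:
        return True                      # enough log data written
    if minutes and minutes_elapsed >= minutes:
        return True                      # enough time passed
    return False


def count_checkpoints(commits, kb_per_commit, seconds_between, kbyte, minutes):
    """Count checkpoints for a hypothetical run of trail commits."""
    checkpoints = 0
    kb_logged = 0
    seconds = 0.0
    for _ in range(commits):
        kb_logged += kb_per_commit
        seconds += seconds_between
        if should_checkpoint(kb_logged, seconds / 60.0, kbyte, minutes):
            checkpoints += 1
            kb_logged = 0                # counters restart after a checkpoint
            seconds = 0.0
    return checkpoints


if __name__ == "__main__":
    # Hypothetical workload: 100 commits, 100 KB of log each, 1 second apart.
    print(count_checkpoints(100, 100, 1, 0, 0))        # old arguments
    print(count_checkpoints(100, 100, 1, 8000, 60))    # patched arguments
```

Under this model the patched arguments turn one checkpoint per commit into an occasional one, which is consistent with the hang becoming rarer without the underlying cause being fixed.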



Re: Checkpoint less frequently (was Re: Still hang on svn 4951 RedHat 7.3 SMP)

Posted by "Glenn A. Thompson" <gt...@cdr.net>.

Hey,

Forgive me if I'm out of whack here.  I'm still getting caught up.
Boy have you guys been busy.

Buuuutttt

Karl Fogel wrote:

>Branko Čibej <br...@xbc.nu> writes:
>  
>
>>I think *the* major task for 0.19 is:
>>
>>    * Create a DB monitor that can detect crashed sessions and
>>      automagically unwedge the DB.
>>
If this were to become formalized, say in the fs API, I fear it presumes
too much about the DB backend.  I hope that it can/would be hidden down
in the DB-specific functions.

>>    * Stop creating transactions for read-only requests, and use
>>      ordinary locks instead.
>>
Are you talking about BDB transactions? or Subversion transactions? I'm 
assuming BDB

>>    * Reduce the number of txn_checkpoint calls in our code, or even
>>      eliminate them completely.
>>    
>>
There are only two that I recall.  One in a cleanup function and one in 
the trail commit function.
I have always believed the trail call to be excessive.  But I don't
fully understand BDB recovery, so I have never mentioned it.
Like someone else said, you still have logs to recover from.  Right?
I don't see any SQL impl doing such a thing.  These types of things are 
handled via DB settings on all the SQL DBs I've worked with.

>Could you expand a little on point number 2?  
>
Yes please.

Thanks,
gat

Re: Checkpoint less frequently (was Re: Still hang on svn 4951 RedHat 7.3 SMP)

Posted by Branko Čibej <br...@xbc.nu>.

Karl Fogel wrote:

>Branko Čibej <br...@xbc.nu> writes:
>  
>
>>I think *the* major task for 0.19 is:
>>
>>    * Create a DB monitor that can detect crashed sessions and
>>      automagically unwedge the DB.
>>    * Stop creating transactions for read-only requests, and use
>>      ordinary locks instead.
>>    * Reduce the number of txn_checkpoint calls in our code, or even
>>      eliminate them completely.
>>    
>>
>
>All of these sound like good ideas (though I have some questions about
>the second one), but aren't they independent?
>
Oh, of course they're independent.

>  We can reduce the
>frequency of txn_checkpoint calls without reducing the frequency with
>which we create transactions in the first place, and vice versa.
>
>Oh, I think I see: We can't switch to a locking system without a DB
>monitor to detect a deadlocked database and break the cycles?  (Or am
>I just missing the point?)
>
No, we don't need a monitor for that. Failing to unlock an object is no
worse (or better) than crashing or ^C-ing while the client holds an
uncommitted DB transaction.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: Checkpoint less frequently (was Re: Still hang on svn 4951 RedHat 7.3 SMP)

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Branko Čibej <br...@xbc.nu> writes:
> I think *the* major task for 0.19 is:
> 
>     * Create a DB monitor that can detect crashed sessions and
>       automagically unwedge the DB.
>     * Stop creating transactions for read-only requests, and use
>       ordinary locks instead.
>     * Reduce the number of txn_checkpoint calls in our code, or even
>       eliminate them completely.

All of these sound like good ideas (though I have some questions about
the second one), but aren't they independent?  We can reduce the
frequency of txn_checkpoint calls without reducing the frequency with
which we create transactions in the first place, and vice versa.

Oh, I think I see: We can't switch to a locking system without a DB
monitor to detect a deadlocked database and break the cycles?  (Or am
I just missing the point?)

In any case, the only 0.19 issue affected by these proposals would be
#995, "Large imports and checkouts over DAV can timeout".  And 0.19
will not be the last milestone concentrating on scalability issues,
you can be sure :-).

> Before anyone starts wondering if I'm off my rocker, 

I'm on the same rocker you are.  However, a few questions:

Could you expand a little on point number 2?  I'm not sure exactly how
you're proposing to use locks, and how they're supposed to replace
some of the functionality we get from transactions.  For example, in
Subversion, read-only requests are usually reading from committed
revisions.  So let's say we don't create a BDB transaction.  How would
locking work?

   'revisions':
      Well, only the revprops might be changing.  I guess one wants a
      consistent picture of those.  So we'd lock just the revision
      record we're reading from, for the duration of the read.  During
      that time, someone changing a revprop on that revision would be
      blocked, but that wouldn't be very long, so it's okay.

   'nodes', 'representations', 'strings':
      What do we lock here?  Would the locking interfere with
      deltification?

   'changes':
      No need to lock this for read-only operations, right?

I'm sort of thinking out loud here, but I get the feeling you have a
much more specific plan in mind...
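
[Editor's sketch] The 'revisions' case above (lock one revision record for the duration of a read, so a concurrent revprop change has to wait) can be modelled in a few lines of Python. Everything here is hypothetical: the class and method names are ours, the lock is a plain exclusive lock rather than a shared reader lock, and the writer returns False instead of blocking so the demonstration cannot hang.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class RevisionTable:
    """Toy stand-in for the 'revisions' table with one lock per record."""

    def __init__(self):
        self._guard = threading.Lock()            # protects the lock map itself
        self._locks = defaultdict(threading.Lock)
        self.revprops = defaultdict(dict)

    def _lock_for(self, rev):
        with self._guard:
            return self._locks[rev]

    @contextmanager
    def reading(self, rev):
        """Hold the record lock for the duration of a read."""
        lock = self._lock_for(rev)
        lock.acquire()
        try:
            yield dict(self.revprops[rev])        # snapshot of the revprops
        finally:
            lock.release()

    def try_set_revprop(self, rev, name, value):
        """Change a revprop, or return False if the record is locked."""
        lock = self._lock_for(rev)
        if not lock.acquire(blocking=False):
            return False
        try:
            self.revprops[rev][name] = value
            return True
        finally:
            lock.release()


if __name__ == "__main__":
    table = RevisionTable()
    with table.reading(5):
        print(table.try_set_revprop(5, "svn:log", "edited"))   # record is held
        print(table.try_set_revprop(4, "svn:log", "edited"))   # other record
```

The sketch covers only the per-revision case; it does not answer the open question of what to lock in 'nodes', 'representations' and 'strings'.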


fcntl locks (was: Checkpoint less frequently)

Posted by Greg Stein <gs...@lyra.org>.

On Fri, Feb 21, 2003 at 01:41:36AM -0500, Greg Hudson wrote:
> On Fri, 2003-02-21 at 00:18, Branko Cibej wrote:
> > Justin, we'll need a watcher anyway -- it's the only means we have to
> > automatically unwedge a repository if a client crashes. D'you really
> > think we can release 1.0 without fixing this totally unacceptable bug?
> 
> ("If a client crashes?"  If we're using ra_svn or ra_dav, the server
> should have a chance to clean up.  As I understand it, the issue arises
> when a server process terminates uncleanly--such as when you interrupt
> an svn command using ra_local, since in that case the "client" and
> "server" are in the same process.)

Yah, ra_local or the server process. "Client of the FS" maybe :-)

> On Unix, anyway, it seems like a fcntl-locked guard around the database
> would do the trick without a separate process.  Get a read lock for
> normal operation, or a write lock to recover.  fcntl locks are
> automatically terminated on process exit, so there is no issue of stale
> locks.

Heh. Funny that you should mention that. That is exactly what
REPOS/locks/db.lock is for. Problem is that we don't seem to be using it
properly.

Second, an application gets a read lock, but blocks inside of BDB. Thus, the
recovery process can't get in there to do the recover. The administrator has
to go and kill that blocked client.


Really... it seems that we should solve the fcntl thing, or just rip it out
of the SVN codebase.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
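
[Editor's sketch] The fcntl guard discussed above (shared lock for normal operation, exclusive lock for recovery, locks dropped by the kernel when a process exits) can be demonstrated on a Unix system with Python's standard fcntl module. This is a sketch under assumptions: the helper names and the temporary lock file are ours, and it does not reproduce how Subversion actually uses its db.lock file.

```python
import fcntl
import subprocess
import sys
import tempfile

# Child process: take a shared ("normal operation") lock, report it, then idle.
HOLDER = """
import fcntl, sys, time
f = open(sys.argv[1], "r")
fcntl.lockf(f, fcntl.LOCK_SH)
print("locked", flush=True)
time.sleep(60)
"""


def try_exclusive(path):
    """Try to take the exclusive ("recovery") lock without blocking."""
    with open(path, "r+") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False                  # someone else holds a lock
        fcntl.lockf(f, fcntl.LOCK_UN)
        return True


def demo():
    """Return (blocked_while_held, free_after_crash)."""
    with tempfile.NamedTemporaryFile(suffix=".lock") as tmp:
        child = subprocess.Popen([sys.executable, "-c", HOLDER, tmp.name],
                                 stdout=subprocess.PIPE, text=True)
        child.stdout.readline()           # wait until the child holds the lock
        blocked = not try_exclusive(tmp.name)
        child.kill()                      # simulate an unclean exit (SIGKILL)
        child.wait()
        child.stdout.close()
        free_after_crash = try_exclusive(tmp.name)
    return blocked, free_after_crash


if __name__ == "__main__":
    print(demo())
```

The guard only clears when the holder actually exits. A process that is still alive but blocked inside BDB keeps its shared lock, which is the case described above where the administrator has to kill the blocked client before recovery can run.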


Re: Checkpoint less frequently

Posted by Michael Price <mp...@atl.lmco.com>.

Branko Čibej writes:
 > Greg Hudson wrote:
 > >On Fri, 2003-02-21 at 01:50, Branko Čibej wrote:
 > >>The BDB docs recommend having a server or monitor process that runs
 > >>recovery when necessary.
 > >
 > >I tried hunting down this reference (I've seen it before) and failed. 
 > >If you could find it, I'd appreciate it.
 > 
 > I can't seem to find it right now, either.

http://www.sleepycat.com/docs/ref/transapp/app.html

Found using 'find . -type f -print | xargs grep monitor' in my local
copy.

Michael Price               Member of the Engineering Staff
Distributed Processing Lab; Lockheed Martin Adv. Tech. Labs
A&E 3W; 1 Federal Street; Camden, NJ 08102
856-338-4021, fax 856-338-4144  email: mprice@atl.lmco.com


Re: Checkpoint less frequently

Posted by Branko Čibej <br...@xbc.nu>.

Greg Hudson wrote:

>On Fri, 2003-02-21 at 01:50, Branko Čibej wrote:
>  
>
>>The BDB docs recommend having a server or monitor process that runs
>>recovery when necessary.
>>    
>>
>
>I tried hunting down this reference (I've seen it before) and failed. 
>If you could find it, I'd appreciate it.
>

I can't seem to find it right now, either.

>Honestly, I'm with Justin here.  If it were just me making the
>decisions, I'd say that the point at which we need a monitor process is
>the point at which we should give up on using Berkeley DB, however
>painful that might be at this stage of the game.  (Perhaps more
>realistically, we could try to produce a change to Berkeley DB which
>would make it actually work, and convince Sleepycat to adopt it.)  I'm
>tired of passing our design errors on to the user.
>  
>
I didn't mean that the user would have to start the monitor. The server
(any server) can do that itself, unless the monitor is already started.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/
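
[Editor's sketch] One way to picture a monitor that "runs recovery when necessary" is a parent process that watches a server process and calls a recovery hook when that process ends uncleanly. The Python below is a hypothetical illustration only: run_and_watch and the recover callback are our names, and a real hook would have to run BDB recovery while no other process is using the environment.

```python
import subprocess
import sys


def run_and_watch(argv, recover):
    """Run one server process; call recover() if it ends uncleanly.

    Returns True when recovery was triggered.
    """
    proc = subprocess.run(argv)
    # A negative return code means the process was killed by a signal;
    # any non-zero code is treated as an unclean exit in this sketch.
    crashed = proc.returncode != 0
    if crashed:
        recover()
    return crashed


if __name__ == "__main__":
    # Hypothetical "server" that kills itself, standing in for a crash.
    crashing = [sys.executable, "-c",
                "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    run_and_watch(crashing, lambda: print("recovery would run here"))
```

This only detects processes the monitor started itself; detecting a wedged but still running process is a separate problem.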



Re: Checkpoint less frequently

Posted by "Glenn A. Thompson" <gt...@cdr.net>.

Hey,

>Honestly, I'm with Justin here.  If it were just me making the
>decisions, I'd say that the point at which we need a monitor process is
>the point at which we should give up on using Berkeley DB, however
>painful that might be at this stage of the game.  (Perhaps more
>realistically, we could try to produce a change to Berkeley DB which
>would make it actually work, and convince Sleepycat to adopt it.)  I'm
>tired of passing our design errors on to the user.
>  
>
I'm for looking at BDB.  However, in their defense, it's an embedded DB.
They fully expect the linker to deal with these sorts of things.  The
data layer should be in its own process space.  Gat awaits the arrows :-)

gat



Re: Checkpoint less frequently

Posted by Greg Hudson <gh...@MIT.EDU>.

On Fri, 2003-02-21 at 01:50, Branko Čibej wrote:
> The BDB docs recommend having a server or monitor process that runs
> recovery when necessary.

I tried hunting down this reference (I've seen it before) and failed. 
If you could find it, I'd appreciate it.

Honestly, I'm with Justin here.  If it were just me making the
decisions, I'd say that the point at which we need a monitor process is
the point at which we should give up on using Berkeley DB, however
painful that might be at this stage of the game.  (Perhaps more
realistically, we could try to produce a change to Berkeley DB which
would make it actually work, and convince Sleepycat to adopt it.)  I'm
tired of passing our design errors on to the user.


Re: Checkpoint less frequently

Posted by Branko Čibej <br...@xbc.nu>.

Greg Hudson wrote:

>On Fri, 2003-02-21 at 00:18, Branko Čibej wrote:
>  
>
>>Justin, we'll need a watcher anyway -- it's the only means we have to
>>automatically unwedge a repository if a client crashes. D'you really
>>think we can release 1.0 without fixing this totally unacceptable bug?
>>    
>>
>
>("If a client crashes?"  If we're using ra_svn or ra_dav, the server
>should have a chance to clean up.  As I understand it, the issue arises
>when a server process terminates uncleanly--such as when you interrupt
>an svn command using ra_local, since in that case the "client" and
>"server" are in the same process.)
>
Ah, right -- I meant server, of course.

>On Unix, anyway, it seems like a fcntl-locked guard around the database
>would do the trick without a separate process.  Get a read lock for
>normal operation, or a write lock to recover.  fcntl locks are
>automatically terminated on process exit, so there is no issue of stale
>locks.
>
That doesn't work, unfortunately, because you don't know that you have
to db_recover after an aborted session until you're already blocked on a
stale lock.

>(It seems like Berkeley DB should take care of this under the covers,
>really.)
>  
>
Yes, it should, but unfortunately it doesn't. The BDB docs recommend
having a server or monitor process that runs recovery when necessary.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: Checkpoint less frequently

Posted by Greg Hudson <gh...@MIT.EDU>.

On Fri, 2003-02-21 at 00:18, Branko Čibej wrote:
> Justin, we'll need a watcher anyway -- it's the only means we have to
> automatically unwedge a repository if a client crashes. D'you really
> think we can release 1.0 without fixing this totally unacceptable bug?

("If a client crashes?"  If we're using ra_svn or ra_dav, the server
should have a chance to clean up.  As I understand it, the issue arises
when a server process terminates uncleanly--such as when you interrupt
an svn command using ra_local, since in that case the "client" and
"server" are in the same process.)

On Unix, anyway, it seems like a fcntl-locked guard around the database
would do the trick without a separate process.  Get a read lock for
normal operation, or a write lock to recover.  fcntl locks are
automatically terminated on process exit, so there is no issue of stale
locks.

(It seems like Berkeley DB should take care of this under the covers,
really.)
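
A minimal sketch of the guard-file idea above, in Python rather than C for brevity. The function name `guard` and the guard-file path are illustrative assumptions, not anything in the Subversion tree. It relies only on the documented behaviour of POSIX fcntl locks: a shared lock for normal access, an exclusive lock for recovery, and automatic release when the descriptor is closed or the process exits.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def guard(path, exclusive=False):
    """Hold an fcntl lock on the guard file for the duration of the block.

    `path` is a hypothetical guard file next to the database.
    """
    # O_RDWR so that both shared and exclusive locks are permitted.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        # Shared lock for normal operation, exclusive lock for recovery.
        # lockf() blocks until the lock is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield fd
    finally:
        # Closing the descriptor drops the lock; the kernel does the same
        # if the process dies, so no stale guard lock can be left behind.
        os.close(fd)
```

Normal database access would run inside `with guard(path):` and recovery inside `with guard(path, exclusive=True):`. Note that fcntl locks belong to the process, so this guards against other processes only, not other threads in the same process.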


auto recovery (was: Checkpoint less frequently)

Posted by Greg Stein <gs...@lyra.org>.

On Fri, Feb 21, 2003 at 01:20:55PM -0500, Greg Hudson wrote:
> On Fri, 2003-02-21 at 13:03, Branko Cibej wrote:
> [When the monitor fails to keep a process from hitting a stale lock:]
> > So we wait for a bit, then kill it.
> 
> If the monitor process is started automatically, then it may have been
> started by a different user than the one whose process hung.  So we
> can't kill it.

Not to mention all the other crap that can happen by arbitrarily whacking
processes. In the DAV case, this would be shooting down an Apache process,
and that could imply that you leave a bunch of shared memory stuffs sitting
around. Yes, Apache does try to clean up in cases like that, but let's not
plan to make it work too hard.

Just say "no" to killing processes :-)

> The following discipline would seem to work, without the need for a
> monitor process:
> 
>   * Wrap a guard file around the database, per my earlier idea.
>     (fcntl-locked, read-locked for normal access, write-locked for
>     recovery.)
> 
>   * Set the lock timeout (at db creation time).

Ah! Key item. Yes, this solves the whole ball of wax.

>   * If we time out on a lock, fail the transaction, grab a write lock on
>     the guard file, run recovery, and retry.

Well, we can change this a bit:

    * If we time out on a lock:
      - retry the transaction (maybe there are other reasons for a timeout,
        such as the database is simply *busy*)
      - if we get DB_RUNRECOVERY, then:
        - fail the transaction (well, fail the *trail*, right?)
        - grab a write lock on REPOS/lock/db.lock
        - run recovery
        - unlock the guard
        - retry if we haven't exhausted the retry count
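
The steps above can be sketched roughly as follows. This is Python pseudo-structure, not Subversion code: `LockTimeout` and `RunRecovery` are hypothetical exception classes standing in for a Berkeley DB lock timeout and the DB_RUNRECOVERY error, and `trail`, `recover` and `guard_write_lock` are callables the caller is assumed to supply.

```python
class LockTimeout(Exception):
    """Hypothetical stand-in for a Berkeley DB lock-timeout error."""


class RunRecovery(Exception):
    """Hypothetical stand-in for Berkeley DB's DB_RUNRECOVERY error."""


def run_with_recovery(trail, recover, guard_write_lock, max_retries=3):
    """Run `trail`, retrying on timeouts and recovering when the DB asks for it.

    `guard_write_lock` must return a context manager that holds the
    exclusive lock on the guard file while recovery runs.
    """
    for _attempt in range(max_retries + 1):
        try:
            return trail()
        except LockTimeout:
            # The database may simply be busy, so just try again.
            continue
        except RunRecovery:
            # The trail has failed.  Recover under the exclusive guard lock;
            # leaving the `with` block unlocks the guard before the retry.
            with guard_write_lock():
                recover()
    raise RuntimeError("retry count exhausted")
```

Keeping the timeout case and the recovery case separate is the point of the change: a timeout alone triggers only a retry, and recovery runs only when the database itself reports that it is needed.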

> But it may be inefficient in some cases:
> 
>   * If we erroneously time out on a lock, we will still succeed
>     eventually, but it may take much longer than it would if we had
>     waited.  But that problem should be rare.

Berkeley DB should be able to tell us that we need to run the recovery, so
we can just look for that instead of assuming the need.

>   * If multiple processes hit the stale lock, they will all run
>     recovery.  We could avoid that by putting a timestamp in the guard
>     file saying when recovery was last run, or we could hypothesize that
>     N recoveries doesn't take much longer than one recovery.

The timestamp would be nice. Each process could record when it attempts to
acquire the write lock. When it finally gets the lock, it reads the file,
sees that the recovery finished *after* its acquisition time, and just
releases the write lock and retries the operation.
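
A rough Python sketch of that timestamp check, assuming the guard file itself stores the time the last recovery finished. The function name `recover_once` and the file layout are illustrative assumptions only, and comparing timestamps like this presumes all processes share one clock (that is, they run on the same host).

```python
import fcntl
import os
import time


def recover_once(guard_path, recover, clock=time.time):
    """Run `recover` under the exclusive guard lock unless another process
    finished a recovery while we were waiting for the lock.

    Returns True if this call ran recovery, False if it was skipped.
    """
    asked_at = clock()  # when we started trying to get the write lock
    fd = os.open(guard_path, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks while someone else recovers
        stamp = os.read(fd, 64).decode().strip()
        if stamp and float(stamp) >= asked_at:
            return False  # a recovery completed after we asked; just retry
        recover()
        # Record when this recovery finished, replacing the old timestamp.
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, repr(clock()).encode())
        return True
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

With this in place, N processes that hit the same stale lock queue up on the guard file, the first one recovers, and the rest see a fresh timestamp and go straight back to retrying their operation.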

> I also wonder how many of these problems go away if you instruct
> Berkeley DB to use fcntl locks.  (That's possible, right?)  And what the
> cost is in everyday performance, of course.

Hmm. Interesting, but I think the timeout is key, and should be able to get
us what we need.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: Checkpoint less frequently

Posted by Philip Martin <ph...@codematters.co.uk>.

Greg Hudson <gh...@MIT.EDU> writes:

> On Fri, 2003-02-21 at 13:03, Branko Čibej wrote:
> [When the monitor fails to keep a process from hitting a stale lock:]
> > So we wait for a bit, then kill it.
> 
> If the monitor process is started automatically, then it may have been
> started by a different user than the one whose process hung.  So we
> can't kill it.

Even if you get round that, there are other problems.  Subversion
provides libraries to encourage alternative database clients.  We
can't go blindly killing those, it may do more harm than good.  You
might kill my fancy Subversion-aware editor.  You might kill a process
that is accessing multiple repositories, in which case you may well be
the cause of other repositories hanging.

-- 
Philip Martin


Re: Checkpoint less frequently

Posted by Greg Hudson <gh...@MIT.EDU>.

On Fri, 2003-02-21 at 13:03, Branko Čibej wrote:
[When the monitor fails to keep a process from hitting a stale lock:]
> So we wait for a bit, then kill it.

If the monitor process is started automatically, then it may have been
started by a different user than the one whose process hung.  So we
can't kill it.

The following discipline would seem to work, without the need for a
monitor process:

  * Wrap a guard file around the database, per my earlier idea.
    (fcntl-locked, read-locked for normal access, write-locked for
    recovery.)

  * Set the lock timeout (at db creation time).

  * If we time out on a lock, fail the transaction, grab a write lock on
    the guard file, run recovery, and retry.

But it may be inefficient in some cases:

  * If we erroneously time out on a lock, we will still succeed
    eventually, but it may take much longer than it would if we had
    waited.  But that problem should be rare.

  * If multiple processes hit the stale lock, they will all run
    recovery.  We could avoid that by putting a timestamp in the guard
    file saying when recovery was last run, or we could hypothesize that
    N recoveries doesn't take much longer than one recovery.

I also wonder how many of these problems go away if you instruct
Berkeley DB to use fcntl locks.  (That's possible, right?)  And what the
cost is in everyday performance, of course.



Re: Checkpoint less frequently

Posted by Branko Čibej <br...@xbc.nu>.

Greg Hudson wrote:

>On Fri, 2003-02-21 at 12:52, Branko Čibej wrote:
>  
>
>>You don't have to stop any servers, or anything. Each server only has to
>>ask the monitor if it may open the database, and to notify it when it
>>closes it. When the monitor detects a crashed process, it starts denying
>>access to the database until all other processes have backed out, runs
>>recovery, then allows access again.
>>    
>>
>
>  
>
>>At least, that's the general idea.
>>    
>>
>
>That doesn't seem very general.
>
>  Process A opens the database
>  Process A acquires many fine locks
>  Process B opens the database
>  Process A crashes
>
>Process B is just as likely to hit a stale lock and hang as if it had
>opened the database after the crash.
>  
>
So we wait for a bit, then kill it. We know which processes were active
(i.e., fiddling with the database) at the time of the crash.

Now it's possible that there's another way to solve this problem:
setting the lock timeout. A process will block forever on a stale
lock _unless_ a timeout has been set (say, in the DB_CONFIG file). Some
time ago when I was testing different ways to avoid the wedged
DB/blocked process problem, I tried this method and it worked within my
limited test cases. But I don't understand it well enough, nor have I
stressed it enough, to be able to say whether this is an acceptable
solution or not.
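
For illustration, a small Python helper that adds such a timeout to a repository's DB_CONFIG file. The helper name and the example value are assumptions; the `set_lock_timeout` directive, with its value in microseconds, is the one described in the Berkeley DB documentation for the environment timeout setting, and whether a given Berkeley DB version honours it should be checked against that version's docs.

```python
import os


def set_lock_timeout(db_dir, microseconds):
    """Append a lock-timeout directive to DB_CONFIG in `db_dir`.

    `db_dir` is the Berkeley DB environment directory (for example the
    repository's db/ directory); the value is in microseconds.
    """
    with open(os.path.join(db_dir, "DB_CONFIG"), "a") as f:
        f.write("set_lock_timeout %d\n" % microseconds)
```

Calling `set_lock_timeout("repos/db", 5000000)` would request a five-second lock timeout. The setting takes effect the next time the environment is opened.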

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: Checkpoint less frequently

Posted by Greg Hudson <gh...@MIT.EDU>.

On Fri, 2003-02-21 at 12:52, Branko Čibej wrote:
> You don't have to stop any servers, or anything. Each server only has to
> ask the monitor if it may open the database, and to notify it when it
> closes it. When the monitor detects a crashed process, it starts denying
> access to the database until all other processes have backed out, runs
> recovery, then allows access again.

> At least, that's the general idea.

That doesn't seem very general.

  Process A opens the database
  Process A acquires many fine locks
  Process B opens the database
  Process A crashes

Process B is just as likely to hit a stale lock and hang as if it had
opened the database after the crash.



Re: Checkpoint less frequently

Posted by Branko Čibej <br...@xbc.nu>.

Brandon Ehle wrote:

>>
>>
>>> No, it's not.  Requiring me to have yet another process running so
>>> that the database can be checkpointed is incredibly lame.
>>>
>>> I won't even get to the issue of what happens when the checkpoint code
>>> crashes.  We'll need a watcher.  Then, another watcher.  No.   
>>
>>
>> Justin, we'll need a watcher anyway -- it's the only means we have to
>> automatically unwedge a repository if a client crashes. D'you really
>> think we can release 1.0 without fixing this totally unacceptable bug?
>>  
>>
> I don't even think this is possible.  When the repository needs recovery while
> apache is running, I usually have to logon with root privileges and do
> a killall -KILL httpd, killall svnserve, then run ipcs and delete all
> the leftover locks, then I will be able to run db_recover or svnadmin
> recover.  Then restart httpd & svnserve -d. We'd need one hell of a
> monitor to be able to accomplish all that after you take security into
> consideration.

You don't have to stop any servers, or anything. Each server only has to
ask the monitor if it may open the database, and to notify it when it
closes it. When the monitor detects a crashed process, it starts denying
access to the database until all other processes have backed out, runs
recovery, then allows access again.

At least, that's the general idea.
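
A toy Python model of that protocol, to make the state transitions concrete. Everything here is hypothetical: the class and method names are invented for illustration, and how the monitor would actually detect a crash, or talk to the servers, is left open.

```python
class Monitor:
    """Toy model of a monitor that gates access to the database."""

    def __init__(self, recover):
        self.recover = recover    # callable that runs database recovery
        self.active = set()       # ids of processes that have the DB open
        self.recovering = False   # True while new opens are being denied

    def request_open(self, pid):
        """A server asks whether it may open the database."""
        if self.recovering:
            return False
        self.active.add(pid)
        return True

    def notify_close(self, pid):
        """A server reports that it has closed the database."""
        self.active.discard(pid)
        self._maybe_recover()

    def notify_crash(self, pid):
        """Crash detected: deny new opens until everyone has backed out."""
        self.active.discard(pid)
        self.recovering = True
        self._maybe_recover()

    def _maybe_recover(self):
        # Recovery may only run once no process has the database open.
        if self.recovering and not self.active:
            self.recover()
            self.recovering = False
```

The model also shows where the scheme is fragile: recovery cannot start until every process still in `active` calls `notify_close`, and a process that is itself blocked on a stale lock never will.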

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: Checkpoint less frequently

Posted by Greg Stein <gs...@lyra.org>.

On Fri, Feb 21, 2003 at 05:53:18PM +0200, Jani Monoses wrote:
> 
> > Show me a database product that doesn't need a babysitter. I'm not aware
> > of one.
> But should subversion be a database? OK, in a way yes.
> But CVS with all its drawbacks did not need anyone with a constant on the  for logfiles eating
> the whole disk and such. Most of the babysitting should be automated.

Euh... have you ever tried to maintain a *large* CVS repository with LOTS of
activity on it? Heh. Why do you think CollabNet has been sponsoring
Subversion development? :-)

I've seen CVS knock over a box. Took the whole damn thing down. I don't
think we had to send somebody physically to the box, but we did have to
reboot the darned thing. CVS consumed all available memory and the swap. It
came to a screaming halt.

And don't get me started on stale CVS locks...

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: Checkpoint less frequently

Posted by Jani Monoses <ja...@iv.ro>.

> Show me a database product that doesn't need a babysitter. I'm not aware
> of one.
But should subversion be a database? OK, in a way yes.
But CVS with all its drawbacks did not need anyone with a constant on the  for logfiles eating
the whole disk and such. Most of the babysitting should be automated.



Re: Checkpoint less frequently

Posted by Michael <mi...@ispwest.com>.

Jani Monoses writes:
 > > I don't even think this is possible.  When the repository needs recovery while
 > > apache is running, I usually have to logon with root privileges and do a 
 > > killall -KILL httpd, killall svnserve, then run ipcs and delete all the 
 > > leftover locks, then I will be able to run db_recover or svnadmin 
 > > recover.  Then restart httpd & svnserve -d. We'd need one hell of a 
 > > monitor to be able to accomplish all that after you take security into 
 > > consideration.
 > 
 > This might be the way to do it now but IMHO a svn needing a babysitter to
 > do all that should not be called 1.0 

Show me a database product that doesn't need a babysitter. I'm not aware
of one.

Michael



Re: Checkpoint less frequently

Posted by Jani Monoses <ja...@iv.ro>.

> I don't even think this is possible.  When the repository needs recovery while
> apache is running, I usually have to logon with root privileges and do a 
> killall -KILL httpd, killall svnserve, then run ipcs and delete all the 
> leftover locks, then I will be able to run db_recover or svnadmin 
> recover.  Then restart httpd & svnserve -d. We'd need one hell of a 
> monitor to be able to accomplish all that after you take security into 
> consideration.

This might be the way to do it now but IMHO a svn needing a babysitter to
do all that should not be called 1.0 




Re: Checkpoint less frequently

Posted by Brandon Ehle <az...@yahoo.com>.

> 
>
>>No, it's not.  Requiring me to have yet another process running so
>>that the database can be checkpointed is incredibly lame.
>>
>>I won't even get to the issue of what happens when the checkpoint code
>>crashes.  We'll need a watcher.  Then, another watcher.  No. 
>>    
>>
>
>Justin, we'll need a watcher anyway -- it's the only means we have to
>automatically unwedge a repository if a client crashes. D'you really
>think we can release 1.0 without fixing this totally unacceptable bug?
>  
>
I don't even think this is possible.  When the repository needs recovery while
apache is running, I usually have to logon with root privileges and do a 
killall -KILL httpd, killall svnserve, then run ipcs and delete all the 
leftover locks, then I will be able to run db_recover or svnadmin 
recover.  Then restart httpd & svnserve -d. We'd need one hell of a 
monitor to be able to accomplish all that after you take security into 
consideration.



Re: Checkpoint less frequently

Posted by Branko Čibej <br...@xbc.nu>.

Justin Erenkrantz wrote:

> --On Thursday, February 20, 2003 17:45:30 +0100 Branko Čibej
> <br...@xbc.nu> wrote:
>
>> Yup. But it would be even better to move the checkpointing into a
>> separate process so that it's asynchronous with regard to the real
>> business managing versions.
>
>
> No, it's not.  Requiring me to have yet another process running so
> that the database can be checkpointed is incredibly lame.
>
> I won't even get to the issue of what happens when the checkpoint code
> crashes.  We'll need a watcher.  Then, another watcher.  No. 

Justin, we'll need a watcher anyway -- it's the only means we have to
automatically unwedge a repository if a client crashes. D'you really
think we can release 1.0 without fixing this totally unacceptable bug?

> Please don't go this route.  I can't express my animosity towards this
> approach loud enough.  -- justin

Yes, I expect you can't. :-)

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: Checkpoint less frequently

Posted by Greg Stein <gs...@lyra.org>.

On Thu, Feb 20, 2003 at 06:12:18PM -0800, Justin Erenkrantz wrote:
> --On Thursday, February 20, 2003 17:45:30 +0100 Branko Čibej <br...@xbc.nu>
> wrote:
> 
> > Yup. But it would be even better to move the checkpointing into a
> > separate process so that it's asynchronous with regard to the real
> > business managing versions.
> 
> No, it's not.  Requiring me to have yet another process running so that the 
> database can be checkpointed is incredibly lame.
> 
> I won't even get to the issue of what happens when the checkpoint code 
> crashes.  We'll need a watcher.  Then, another watcher.  No.
> 
> Please don't go this route.  I can't express my animosity towards this 
> approach loud enough.  -- justin

Yah. It sucks. Quite hard. Golf balls and hoses hard.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: Checkpoint less frequently

Posted by Justin Erenkrantz <je...@apache.org>.

--On Thursday, February 20, 2003 17:45:30 +0100 Branko Čibej <br...@xbc.nu>
wrote:

> Yup. But it would be even better to move the checkpointing into a
> separate process so that it's asynchronous with regard to the real
> business managing versions.

No, it's not.  Requiring me to have yet another process running so that the 
database can be checkpointed is incredibly lame.

I won't even get to the issue of what happens when the checkpoint code 
crashes.  We'll need a watcher.  Then, another watcher.  No.

Please don't go this route.  I can't express my animosity towards this 
approach loud enough.  -- justin


Re: Checkpoint less frequently

Posted by Branko Čibej <br...@xbc.nu>.

William Uther wrote:

>
> On Thursday, February 20, 2003, at 04:37  PM, Branko Čibej wrote:
>
>> Thanks, Brandon, this is a very good analysis. And it confirms my
>> suspicions that we're using _way_ too many transactions, and issuing far
>> too many txn_checkpoint calls.
>>
>> I think *the* major task for 0.19 is:
>>
>>     * Stop creating transactions for read-only requests, and use
>>       ordinary locks instead.
>
>
> Would this stop the log files growing on read-only requests? 

That's sort of the point, yes -- but it would also reduce the number of
fsyncs on txn commits, which is one of the major slowdowns.

> % ls -l repos/db/log.*
> -rw-r--r--  1 willu  staff  81442 Feb 20 21:12 repos/db/log.0000000001
> % svn up wc
> At revision 1.
> % ls -l repos/db/log.*
> -rw-r--r--  1 willu  staff  86589 Feb 20 21:12 repos/db/log.0000000001
> % svn up wc
> At revision 1.
> % ls -l repos/db/log.*
> -rw-r--r--  1 willu  staff  89135 Feb 20 21:13 repos/db/log.0000000001
>
> Here it doesn't grow much, and so it isn't a major problem, but if it
> were to go away I wouldn't mind. :) 

Imagine serving a web site from the repository. Log files will grow on
every hit -- for no good reason at all.

>>     * Reduce the number of txn_checkpoint calls in our code, or even
>>       eliminate them completely.
>>
>> Before anyone starts wondering if I'm off my rocker, consider this: you
>> only really need a txn_checkpoint when you're doing a hot backup of the
>> database, or removing old log files. Therefore, checkpoints should be
>> issued by the backup/cleanup scripts, definitely not in the critical
>> path.
>
>
> Reading http://www.sleepycat.com/docs/ref/transapp/checkpoint.html
>
> it looks like the database is safe with less frequent checkpointing. 
> (checkpointing just syncs the database files.  The log files are
> already on disk.)  Note that sleepycat mention checkpointing every 60
> seconds, and "Because checkpoints can be quite expensive, choosing how
> often to perform a checkpoint is a common tuning parameter for
> Berkeley DB applications." 

Yup. But it would be even better to move the checkpointing into a
separate process so that it's asynchronous with regard to the real
business managing versions.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



RE: Checkpoint less frequently

Posted by Sander Striker <st...@apache.org>.

> From: Sander Striker [mailto:striker@apache.org]
> Sent: Thursday, February 20, 2003 11:26 AM

>> Reading http://www.sleepycat.com/docs/ref/transapp/checkpoint.html
> 
> Reading this I wonder why we don't checkpoint only at the time right
> before we run post-commit (before we tell the client the commit succeeded).
> And only then.  Any reason to checkpoint more often?

Hmm, maybe an exception for operations that only touch (and modify!) transactions.
I'm thinking about the lock strategy notes (the full impl.) here, where 'commits'
happen on a transaction until the lock is released.

Sander


RE: Checkpoint less frequently

Posted by Sander Striker <st...@apache.org>.

> From: William Uther [mailto:willu.mailingLists@cse.unsw.edu.au]
> Sent: Thursday, February 20, 2003 11:19 AM

>>     * Reduce the number of txn_checkpoint calls in our code, or even
>>       eliminate them completely.
>>
>> Before anyone starts wondering if I'm off my rocker, consider this: you
>> only really need a txn_checkpoint when you're doing a hot backup of the
>> database, or removing old log files. Therefore, checkpoints should be
>> issued by the backup/cleanup scripts, definitely not in the critical 
>> path.
> 
> Reading http://www.sleepycat.com/docs/ref/transapp/checkpoint.html

Reading this I wonder why we don't checkpoint only at the time right
before we run post-commit (before we tell the client the commit succeeded).
And only then.  Any reason to checkpoint more often?

Sander


Re: Checkpoint less frequently

Posted by William Uther <wi...@cse.unsw.edu.au>.

On Thursday, February 20, 2003, at 04:37  PM, Branko Čibej wrote:

> Thanks, Brandon, this is a very good analysis. And it confirms my
> suspicions that we're using _way_ too many transactions, and issuing far
> too many txn_checkpoint calls.
>
> I think *the* major task for 0.19 is:
>
>     * Stop creating transactions for read-only requests, and use
>       ordinary locks instead.

Would this stop the log files growing on read-only requests?

% ls -l repos/db/log.*
-rw-r--r--  1 willu  staff  81442 Feb 20 21:12 repos/db/log.0000000001
% svn up wc
At revision 1.
% ls -l repos/db/log.*
-rw-r--r--  1 willu  staff  86589 Feb 20 21:12 repos/db/log.0000000001
% svn up wc
At revision 1.
% ls -l repos/db/log.*
-rw-r--r--  1 willu  staff  89135 Feb 20 21:13 repos/db/log.0000000001

Here it doesn't grow much, and so it isn't a major problem, but if it 
were to go away I wouldn't mind. :)
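
As a quick check on the listing above, the growth per no-op update can be worked out from the three file sizes. The snippet is only arithmetic on those numbers; the variable names are arbitrary.

```python
# Sizes of repos/db/log.0000000001 from the three `ls -l` calls above.
sizes = [81442, 86589, 89135]

# Bytes added to the log by each no-op `svn up`.
growth = [after - before for before, after in zip(sizes, sizes[1:])]
print(growth)  # [5147, 2546]
```

So each read-only update added roughly 2.5 to 5 KB of log here, which is small for one working copy but adds up for a server handling many read requests.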

>     * Reduce the number of txn_checkpoint calls in our code, or even
>       eliminate them completely.
>
> Before anyone starts wondering if I'm off my rocker, consider this: you
> only really need a txn_checkpoint when you're doing a hot backup of the
> database, or removing old log files. Therefore, checkpoints should be
> issued by the backup/cleanup scripts, definitely not in the critical 
> path.

Reading http://www.sleepycat.com/docs/ref/transapp/checkpoint.html

it looks like the database is safe with less frequent checkpointing.  
(checkpointing just syncs the database files.  The log files are 
already on disk.)  Note that sleepycat mention checkpointing every 60 
seconds, and "Because checkpoints can be quite expensive, choosing how 
often to perform a checkpoint is a common tuning parameter for Berkeley 
DB applications."
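
One way to picture "checkpoint less frequently" is a simple time-based throttle around whatever function performs the checkpoint. This is a hedged sketch, not the txn limiting patch discussed in this thread: the class name, the `checkpoint` callable and the 60-second default (borrowed from the Sleepycat page's example interval) are all assumptions.

```python
import time


class CheckpointThrottle:
    """Run a checkpoint at most once per `min_interval` seconds."""

    def __init__(self, checkpoint, min_interval=60.0, clock=time.monotonic):
        self.checkpoint = checkpoint      # callable that does the real work
        self.min_interval = min_interval  # seconds between checkpoints
        self.clock = clock
        self.last = None                  # time of the last checkpoint

    def maybe_checkpoint(self):
        """Checkpoint if enough time has passed; return True if we did."""
        now = self.clock()
        if self.last is not None and now - self.last < self.min_interval:
            return False
        self.checkpoint()
        self.last = now
        return True
```

Code that currently checkpoints after every operation would call `maybe_checkpoint()` instead. Note that this throttles within one process only; sharing the "last checkpoint" time across server processes would need some shared state.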

later,

Will        :-}

--
Dr William Uther                            National ICT Australia
Phone: +61 2 9385 6926             School of Computer Science and 
Engineering
Email: willu@cse.unsw.edu.au             University of New South Wales
Jabber: willu@jabber.cse.unsw.edu.au          Sydney, Australia



Re: Checkpoint less frequently (was Re: Still hang on svn 4951 RedHat 7.3 SMP)

Posted by Branko Čibej <br...@xbc.nu>.

Thanks, Brandon, this is a very good analysis. And it confirms my
suspicions that we're using _way_ too many transactions, and issuing far
too many txn_checkpoint calls.

I think *the* major task for 0.19 is:

    * Create a DB monitor that can detect crashed sessions and
      automagically unwedge the DB.
    * Stop creating transactions for read-only requests, and use
      ordinary locks instead.
    * Reduce the number of txn_checkpoint calls in our code, or even
      eliminate them completely.

Before anyone starts wondering if I'm off my rocker, consider this: you
only really need a txn_checkpoint when you're doing a hot backup of the
database, or removing old log files. Therefore, checkpoints should be
issued by the backup/cleanup scripts, definitely not in the critical path.

I actually think moving the checkpointing out of the main code is the
simplest of the three.

Brandon Ehle wrote:

>>
>>
>>>>  
>>>
>>>
>>> I'm in favor of committing this change.  I even volunteer to test it.
>>>
>>> Without it, my ra_svn tests frequently hang.
>>>
>>
>> isn't that just masking whatever the real bug is?  i mean
>> checkpointing more often shouldn't be causing a problem, and if it
>> is, we need to figure out why, not ignore it and hope it goes away.
>
>
>
> I've been tracking down this issue for about 3 months now and here is
> my guess on what's happening.
>
> Pretty much every svn operation touches the database in some way or
> another.  Even an svn update when nothing has changed in either your
> working copy or the repository, so every operation will put the
> repository in a state where txn_checkpoint() has something to do. 
> Therefore, txn_checkpoint() will get run after every single operation
> (in ra_dav mode this includes every PUT).
>
> Normally this isn't too bad, but as your repository grows, the
> checkpoint times will get larger and larger and eventually you could
> get to the point my 15GB repository is at, where a txn_checkpoint()
> takes 5 minutes or more.
>
> Any operations on the database after this point will wait in
> __os_yield() for a short period of time until the checkpoint has
> released its lock on the shared memory for the last log file, which is
> needed for quite a few operations.  This is why the subversion call
> stack appears to get stuck in __os_yield().  If it takes
> more than 90 seconds for txn_checkpoint() to release its locks, that's
> when you see the neon timeouts over ra_dav.
>
> As a lot of small operations are running on the database in ra_dav
> mode, the repository can get into a state where it needs to run 2 or 3
> txn_checkpoint() calls in a row.  This will easily cause the 90 second
> neon timeout.  The txn limiting patch attempts to limit the number of
> checkpoints that will run in a row under these circumstances, although
> it is still very possible to get timeouts if it takes your machine
> more than 90 seconds for a single txn_checkpoint() to release its locks.
>
> For a multi-user ra_dav server, another fun part of this problem is
> that only one txn_checkpoint() can run at a time, so as each operation
> wants to run txn_checkpoint(), and if you have enough users,
> eventually every apache thread will be waiting for a turn to run
> txn_checkpoint() so apache will have to spawn some more processes (if
> it can).  If your apache server is stuck in this mode and you attempt
> to shut it down, it could take on the order of several hours until
> apache finishes shutting down.  The txn limiting patch helps, but
> does not completely address this issue (you should be able to run
> about 4x as many users on your server with the patch applied).



-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: Checkpoint less frequently (was Re: Still hang on svn 4951 RedHat 7.3 SMP)

Posted by Brandon Ehle <az...@yahoo.com>.

>> I'm in favor of committing this change.  I even volunteer to test it.
>>
>> Without it, my ra_svn tests frequently hang.
>
> isn't that just masking whatever the real bug is?  i mean 
> checkpointing more often shouldn't be causing a problem, and if it is, 
> we need to figure out why, not ignore it and hope it goes away.


I've been tracking down this issue for about 3 months now, and here is my 
best guess at what's happening.

Pretty much every svn operation touches the database in some way or 
another, even an svn update when nothing has changed in either your 
working copy or the repository.  So every operation will put the 
repository in a state where txn_checkpoint() has something to do.  
Therefore, txn_checkpoint() will get run after every single operation 
(in ra_dav mode this includes every PUT).

Normally this isn't too bad, but as your repository grows, the 
checkpoint times will get larger and larger, and eventually you could 
get to the point my 15GB repository is at, where a txn_checkpoint() 
takes 5 minutes or more.

Any operations on the database after this point will wait in 
__os_yield() for a short period of time until the checkpoint has 
released its lock on the shared memory for the last log file, which is 
needed for quite a few operations.  This is why the Subversion call 
stack appears to get stuck in __os_yield().  If it takes more than 90 
seconds for txn_checkpoint() to release its locks, that's when you see 
the neon timeouts over ra_dav.

Since a lot of small operations are running on the database in ra_dav 
mode, the repository can get into a state where it needs to run 2 or 3 
txn_checkpoint() calls in a row.  This will easily cause the 90-second 
neon timeout.  The txn limiting patch attempts to limit the number of 
checkpoints that will run in a row under these circumstances, although 
it is still very possible to get timeouts if it takes your machine more 
than 90 seconds for a single txn_checkpoint() to release its locks.

For a multi-user ra_dav server, another fun part of this problem is that 
only one txn_checkpoint() can run at a time.  Since each operation wants 
to run txn_checkpoint(), if you have enough users, eventually every 
apache thread will be waiting for a turn to run txn_checkpoint(), so 
apache will have to spawn some more processes (if it can).  If your 
apache server is stuck in this mode and you attempt to shut it down, it 
could take on the order of several hours until apache finishes shutting 
down.  The txn limiting patch helps, but does not completely address 
this issue (you should be able to run about 4x as many users on your 
server with the patch applied).




Re: Checkpoint less frequently (was Re: Still hang on svn 4951 RedHat 7.3 SMP)

Posted by Garrett Rooney <ro...@electricjellyfish.net>.

mark benedetto king wrote:

>On Wed, Feb 19, 2003 at 02:36:07PM -0500, Brandon Ehle wrote:
>  
>
>>Index: subversion/libsvn_fs/fs.c
>>===================================================================
>>--- subversion/libsvn_fs/fs.c   (revision 4721)
>>+++ subversion/libsvn_fs/fs.c   (working copy)
>>@@ -163,7 +163,7 @@
>>
>>  /* Checkpoint any changes.  */
>>  {
>>-    int db_err = env->txn_checkpoint (env, 0, 0, 0);
>>+    int db_err = env->txn_checkpoint (env, 8000, 60, 0);
>>
>>#if SVN_BDB_HAS_DB_INCOMPLETE
>>    while (db_err == DB_INCOMPLETE)
>>
>>
>>    
>>
>
>I'm in favor of committing this change.  I even volunteer to test it.
>
>Without it, my ra_svn tests frequently hang.
>

isn't that just masking whatever the real bug is?  i mean checkpointing 
more often shouldn't be causing a problem, and if it is, we need to 
figure out why, not ignore it and hope it goes away.

-garrett



Checkpoint less frequently (was Re: Still hang on svn 4951 RedHat 7.3 SMP)

Posted by mark benedetto king <mb...@boredom.org>.

On Wed, Feb 19, 2003 at 02:36:07PM -0500, Brandon Ehle wrote:
> 
> Index: subversion/libsvn_fs/fs.c
> ===================================================================
> --- subversion/libsvn_fs/fs.c   (revision 4721)
> +++ subversion/libsvn_fs/fs.c   (working copy)
> @@ -163,7 +163,7 @@
> 
>   /* Checkpoint any changes.  */
>   {
> -    int db_err = env->txn_checkpoint (env, 0, 0, 0);
> +    int db_err = env->txn_checkpoint (env, 8000, 60, 0);
> 
> #if SVN_BDB_HAS_DB_INCOMPLETE
>     while (db_err == DB_INCOMPLETE)
> 
> 

I'm in favor of committing this change.  I even volunteer to test it.

Without it, my ra_svn tests frequently hang.

--ben


Re: Still hang on svn 4951 RedHat 7.3 SMP

Posted by David Summers <da...@summersoft.fay.ar.us>.

I'm running 30 iterations of this and will get back to you as soon
as it is finished.

   Thanks!


On Wed, 19 Feb 2003, Brandon Ehle wrote:

> David Summers wrote:
> 
> >On 19 Feb 2003, Philip Martin wrote:
> >  
> >
> >>You need to run the test manually to avoid truncating the buffered output.
> >>
> >>$ cd subversion/tests/clients/cmdline
> >>$ ./externals_tests.py 6 BASE_URL=svn://localhost
> >>    
> >>
> >
> >OK, Finally figured out how to run it manually.  I've run it 20-30 times 
> >manually and frequently (but not always) it hangs.  Here is the last part 
> >of the manual run for this one.  In this particular run, I'm hanging with 
> >8 lt-svnserve processes running (Ack!).  Here are the traces of the lt-svn and
> >the 8 lt-svnserve processes and the results of manually running the 
> >externals_tests.py:
> >
> >RedHat 7.3 non-SMP
> >
> >
> >Whew!  Hope this helps!
> >
> >  
> >
> Could you try this patch and lemme know if you see any changes?  I've 
> sent it to a couple of other people to try, but I haven't heard anything 
> back yet.
> 
> Index: subversion/libsvn_fs/fs.c
> ===================================================================
> --- subversion/libsvn_fs/fs.c   (revision 4721)
> +++ subversion/libsvn_fs/fs.c   (working copy)
> @@ -163,7 +163,7 @@
> 
>   /* Checkpoint any changes.  */
>   {
> -    int db_err = env->txn_checkpoint (env, 0, 0, 0);
> +    int db_err = env->txn_checkpoint (env, 8000, 60, 0);
> 
> #if SVN_BDB_HAS_DB_INCOMPLETE
>     while (db_err == DB_INCOMPLETE)
> 
> 
> 

-- 
David Wayne Summers          "Linux: Because reboots are for hardware upgrades!"
david@summersoft.fay.ar.us   PGP Key: http://summersoft.fay.ar.us/~david/pgp.txt
PGP Key fingerprint =  C0 E0 4F 50 DD A9 B6 2B  60 A1 31 7E D2 28 6D A8 



Re: Still hang on svn 4951 RedHat 7.3 SMP

Posted by David Summers <da...@summersoft.fay.ar.us>.

I've currently run through 10 of the 30 iterations of the RA_SVN tests 
using this patch and so far it hasn't failed once!

  - David

On Wed, 19 Feb 2003, Brandon Ehle wrote:

> Date: Wed, 19 Feb 2003 14:36:07 -0500
> From: Brandon Ehle <az...@yahoo.com>
> To: David Summers <da...@summersoft.fay.ar.us>
> Cc: dev@subversion.tigris.org
> Subject: Re: Still hang on svn 4951 RedHat 7.3 SMP
> 
> David Summers wrote:
> 
> >On 19 Feb 2003, Philip Martin wrote:
> >  
> >
> >>You need to run the test manually to avoid truncating the buffered output.
> >>
> >>$ cd subversion/tests/clients/cmdline
> >>$ ./externals_tests.py 6 BASE_URL=svn://localhost
> >>    
> >>
> >
> >OK, Finally figured out how to run it manually.  I've run it 20-30 times 
> >manually and frequently (but not always) it hangs.  Here is the last part 
> >of the manual run for this one.  In this particular run, I'm hanging with 
> >8 lt-svnserve processes running (Ack!).  Here are the traces of the lt-svn and
> >the 8 lt-svnserve processes and the results of manually running the 
> >externals_tests.py:
> >
> >RedHat 7.3 non-SMP
> >
> >
> >Whew!  Hope this helps!
> >
> >  
> >
> Could you try this patch and lemme know if you see any changes?  I've 
> sent it to a couple of other people to try, but I haven't heard anything 
> back yet.
> 
> Index: subversion/libsvn_fs/fs.c
> ===================================================================
> --- subversion/libsvn_fs/fs.c   (revision 4721)
> +++ subversion/libsvn_fs/fs.c   (working copy)
> @@ -163,7 +163,7 @@
> 
>   /* Checkpoint any changes.  */
>   {
> -    int db_err = env->txn_checkpoint (env, 0, 0, 0);
> +    int db_err = env->txn_checkpoint (env, 8000, 60, 0);
> 
> #if SVN_BDB_HAS_DB_INCOMPLETE
>     while (db_err == DB_INCOMPLETE)
> 
> 
> 

-- 
David Wayne Summers          "Linux: Because reboots are for hardware upgrades!"
david@summersoft.fay.ar.us   PGP Key: http://summersoft.fay.ar.us/~david/pgp.txt
PGP Key fingerprint =  C0 E0 4F 50 DD A9 B6 2B  60 A1 31 7E D2 28 6D A8 



Re: Still hang on svn 4951 RedHat 7.3 SMP

Posted by Philip Martin <ph...@codematters.co.uk>.

David Summers <da...@summersoft.fay.ar.us> writes:

> > Index: subversion/libsvn_fs/fs.c
> > ===================================================================
> > --- subversion/libsvn_fs/fs.c   (revision 4721)
> > +++ subversion/libsvn_fs/fs.c   (working copy)
> > @@ -163,7 +163,7 @@
> > 
> >   /* Checkpoint any changes.  */
> >   {
> > -    int db_err = env->txn_checkpoint (env, 0, 0, 0);
> > +    int db_err = env->txn_checkpoint (env, 8000, 60, 0);
> > 
> > #if SVN_BDB_HAS_DB_INCOMPLETE
> >     while (db_err == DB_INCOMPLETE)
> > 
> All 30 iterations completed without a single hang!  Since it was mostly 
> hanging before and only occasionally not hanging, I would say that's 
> definitely either the problem or a symptom of the problem (sounds like 
> a symptom, according to other related email).

Each regression test is fairly small; 8MB and 60 seconds is probably
enough to complete each test without a single checkpoint.  It's not
clear why checkpointing should fail on some systems and not others.

Have you ever tried running the stress.pl script on this system?  You
will find it at tools/dev/stress.pl; there are instructions for running
it at the top of the file.  I would be interested to know if you can
provoke a hang by running it.

-- 
Philip Martin


Re: Still hang on svn 4951 RedHat 7.3 SMP

Posted by David Summers <da...@summersoft.fay.ar.us>.

All 30 iterations completed without a single hang!  Since it was mostly 
hanging before and only occasionally not hanging, I would say that's 
definitely either the problem or a symptom of the problem (sounds like 
a symptom, according to other related email).

Thanks!
    - David Summers

On Wed, 19 Feb 2003, Brandon Ehle wrote:

> David Summers wrote:
> 
> >On 19 Feb 2003, Philip Martin wrote:
> >  
> >
> >>You need to run the test manually to avoid truncating the buffered output.
> >>
> >>$ cd subversion/tests/clients/cmdline
> >>$ ./externals_tests.py 6 BASE_URL=svn://localhost
> >>    
> >>
> >
> >OK, Finally figured out how to run it manually.  I've run it 20-30 times 
> >manually and frequently (but not always) it hangs.  Here is the last part 
> >of the manual run for this one.  In this particular run, I'm hanging with 
> >8 lt-svnserve processes running (Ack!).  Here are the traces of the lt-svn and
> >the 8 lt-svnserve processes and the results of manually running the 
> >externals_tests.py:
> >
> >RedHat 7.3 non-SMP
> >
> >
> >Whew!  Hope this helps!
> >
> >  
> >
> Could you try this patch and lemme know if you see any changes?  I've 
> sent it to a couple of other people to try, but I haven't heard anything 
> back yet.
> 
> Index: subversion/libsvn_fs/fs.c
> ===================================================================
> --- subversion/libsvn_fs/fs.c   (revision 4721)
> +++ subversion/libsvn_fs/fs.c   (working copy)
> @@ -163,7 +163,7 @@
> 
>   /* Checkpoint any changes.  */
>   {
> -    int db_err = env->txn_checkpoint (env, 0, 0, 0);
> +    int db_err = env->txn_checkpoint (env, 8000, 60, 0);
> 
> #if SVN_BDB_HAS_DB_INCOMPLETE
>     while (db_err == DB_INCOMPLETE)
> 
> 
> 

-- 
David Wayne Summers          "Linux: Because reboots are for hardware upgrades!"
david@summersoft.fay.ar.us   PGP Key: http://summersoft.fay.ar.us/~david/pgp.txt
PGP Key fingerprint =  C0 E0 4F 50 DD A9 B6 2B  60 A1 31 7E D2 28 6D A8 



Re: Still hang on svn 4951 RedHat 7.3 SMP

Posted by Brandon Ehle <az...@yahoo.com>.

David Summers wrote:

>On 19 Feb 2003, Philip Martin wrote:
>  
>
>>You need to run the test manually to avoid truncating the buffered output.
>>
>>$ cd subversion/tests/clients/cmdline
>>$ ./externals_tests.py 6 BASE_URL=svn://localhost
>>    
>>
>
>OK, Finally figured out how to run it manually.  I've run it 20-30 times 
>manually and frequently (but not always) it hangs.  Here is the last part 
>of the manual run for this one.  In this particular run, I'm hanging with 
>8 lt-svnserve processes running (Ack!).  Here are the traces of the lt-svn and
>the 8 lt-svnserve processes and the results of manually running the 
>externals_tests.py:
>
>RedHat 7.3 non-SMP
>
>
>Whew!  Hope this helps!
>
>  
>
Could you try this patch and lemme know if you see any changes?  I've 
sent it to a couple of other people to try, but I haven't heard anything 
back yet.

Index: subversion/libsvn_fs/fs.c
===================================================================
--- subversion/libsvn_fs/fs.c   (revision 4721)
+++ subversion/libsvn_fs/fs.c   (working copy)
@@ -163,7 +163,7 @@

  /* Checkpoint any changes.  */
  {
-    int db_err = env->txn_checkpoint (env, 0, 0, 0);
+    int db_err = env->txn_checkpoint (env, 8000, 60, 0);

#if SVN_BDB_HAS_DB_INCOMPLETE
    while (db_err == DB_INCOMPLETE)




Re: Still hang on svn 4951 RedHat 7.3 SMP

Posted by David Summers <da...@summersoft.fay.ar.us>.

On 19 Feb 2003, Philip Martin wrote:
> You need to run the test manually to avoid truncating the buffered output.
> 
> $ cd subversion/tests/clients/cmdline
> $ ./externals_tests.py 6 BASE_URL=svn://localhost

OK, finally figured out how to run it manually.  I've run it 20-30 times 
manually and frequently (but not always) it hangs.  Here is the last part 
of the manual run for this one.  In this particular run, I'm hanging with 
8 lt-svnserve processes running (Ack!).  Here are the traces of the lt-svn and
the 8 lt-svnserve processes and the results of manually running the 
externals_tests.py:

RedHat 7.3 non-SMP

externals_tests.py output (run manually):
==========================================
CMD: svn "up" "working_copies/externals_tests-2.init" <TIME = 0.002711>
CMD: svnadmin "create" "repositories/externals_tests-2.other" <TIME = 0.002638>
CMD: svnadmin dump "repositories/externals_tests-2" | svnadmin load "repositories/externals_tests-2.other" <TIME = 0.009731>
CMD: svn "pset" "-F" "working_copies/externals_tests-2.init/tmp0mU5ZY" "svn:externals" "working_copies/externals_tests-2.init/A/B" <TIME = 0.002714>
CMD: svn "pset" "-F" "working_copies/externals_tests-2.init/tmp0mU5ZY" "svn:externals" "working_copies/externals_tests-2.init/A/D" <TIME = 0.002633>
CMD: svn "ci" "-m" "log msg" "working_copies/externals_tests-2.init" <TIME = 0.002701>
CMD: svn "status" "-v" "-u" "-q" "working_copies/externals_tests-2.init" <TIME = 0.002605>
CMD: svn "checkout" "--username" "jrandom" "--password" "rayjandom" "svn://localhost/repositories/externals_tests-2" "working_copies/externals_tests-2" <TIME = 0.002673>
CMD: svn "checkout" "--username" "jrandom" "--password" "rayjandom" "svn://localhost/repositories/externals_tests-2" "working_copies/externals_tests-2.other" <TIME = 0.006195>
CMD: svn "pset" "-F" "local_tmp/tmpXoZLqH" "svn:externals" "working_copies/externals_tests-2/A/D" <TIME = 0.002666>
CMD: svn "ci" "-m" "log msg" "--quiet" "working_copies/externals_tests-2/A/D" <TIME = 0.002444>
CMD: svn "up" "working_copies/externals_tests-2.other" <TIME = 0.002650>
 
lt-svn
======
#0  0x420e0187 in poll () from /lib/i686/libc.so.6
#1  0x4017ba6c in apr_poll (aprset=0xbfffe5c0, num=1, nsds=0xbfffe5bc, 
    timeout=-1) at poll.c:168
#2  0x4017c0c3 in apr_wait_for_io_or_timeout (f=0x0, s=0x80f26a0, 
for_read=1)
    at waitio.c:92
#3  0x40172f39 in apr_socket_recv (sock=0x80f26a0, 
    buf=0x80db540 "( success ( ) ) ) ) ( ) ) ", len=0xbfffe698)
    at sendrecv.c:125
#4  0x4017362c in apr_recv (sock=0x80f26a0, 
    buf=0x80db540 "( success ( ) ) ) ) ( ) ) ", len=0xbfffe698)
    at sendrecv.c:1058
#5  0x4012b132 in readbuf_input (conn=0x80db530, 
    data=0x80db540 "( success ( ) ) ) ) ( ) ) ", len=0xbfffe698)
    at subversion/libsvn_ra_svn/marshal.c:161
#6  0x4012b221 in readbuf_fill (conn=0x80db530)
    at subversion/libsvn_ra_svn/marshal.c:180
#7  0x4012b267 in readbuf_getchar (conn=0x80db530, 
    result=0xbfffe70b 
"¿\210Ò\022@°Þ\005\bhÙ\r\bHçÿ¿\211¾\022@0µ\r\b°Þ\005\b<çÿ¿DÅ\022@TË\022@VË\022@Xçÿ¿0Ú\r\b\210Ò\022@°Þ\005\b¸çÿ¿\026¿\022@0µ\r\b°Þ\005\bDÅ\022@ çÿ¿¤çÿ¿TË\022@\002") 
at subversion/libsvn_ra_svn/marshal.c:189
#8  0x4012b29e in readbuf_getchar_skip_whitespace (conn=0x80db530, 
    result=0xbfffe70b 
"¿\210Ò\022@°Þ\005\bhÙ\r\bHçÿ¿\211¾\022@0µ\r\b°Þ\005\b<çÿ¿DÅ\022@TË\022@VË\022@Xçÿ¿0Ú\r\b\210Ò\022@°Þ\005\b¸çÿ¿\026¿\022@0µ\r\b°Þ\005\bDÅ\
022@ çÿ¿¤çÿ¿TË\022@\002") at subversion/libsvn_ra_svn/marshal.c:199
#9  0x4012b9ee in svn_ra_svn_read_item (conn=0x80db530, pool=0x805deb0, 
    item=0xbfffe73c) at subversion/libsvn_ra_svn/marshal.c:446
#10 0x4012be89 in svn_ra_svn_read_tuple (conn=0x80db530, pool=0x805deb0, 
    fmt=0x4012c544 "wl") at subversion/libsvn_ra_svn/marshal.c:557
#11 0x4012bf16 in svn_ra_svn_read_cmd_response (conn=0x80db530, 
    pool=0x805deb0, fmt=0x4012c4c2 "")
    at subversion/libsvn_ra_svn/marshal.c:581
#12 0x40128271 in ra_svn_set_path (baton=0x80dd968, path=0x4003fcfb "", 
rev=5, 
    pool=0x805deb0) at subversion/libsvn_ra_svn/client.c:169
#13 0x4002ce66 in svn_wc_crawl_revisions (
    path=0x8095900 "working_copies/externals_tests-2.other/A/B/exdir_G", 
    adm_access=0x80f1638, reporter=0x4012d020, report_baton=0x80dd968, 
    restore_files=1, recurse=1, notify_func=0x804bc28 <notify>, 
    notify_baton=0x805ecf0, traversal_info=0x8095938, pool=0x805deb0)
    at subversion/libsvn_wc/adm_crawler.c:383
#14 0x40024d15 in svn_client__update_internal (
    path=0x8095900 "working_copies/externals_tests-2.other/A/B/exdir_G", 
    revision=0x8095818, recurse=1, timestamp_sleep=0xbfffeb48, 
ctx=0xbfffed70, 
    pool=0x805deb0) at subversion/libsvn_client/update.c:138
#15 0x40020c91 in handle_external_item_change (key=0x8095538, klen=7, 
    status=svn_hash_diff_key_both, baton=0xbfffe9f0)
    at subversion/libsvn_client/externals.c:456
#16 0x401375cf in svn_hash_diff (hash_a=0x80953e0, hash_b=0x8095668, 
    diff_func=0x40020a5c <handle_external_item_change>, 
    diff_func_baton=0xbfffe9f0, pool=0x805deb0)
    at subversion/libsvn_subr/hash.c:300
#17 0x40020dc8 in handle_externals_desc_change (key=0x80980c8, klen=42, 
    status=svn_hash_diff_key_both, baton=0xbfffeaa0)
    at subversion/libsvn_client/externals.c:545
#18 0x401375cf in svn_hash_diff (hash_a=0x805ed18, hash_b=0x805ed78, 
    diff_func=0x40020d04 <handle_externals_desc_change>, 
    diff_func_baton=0xbfffeaa0, pool=0x805deb0)
    at subversion/libsvn_subr/hash.c:300
#19 0x40020e3f in svn_client__handle_externals (traversal_info=0x805ed08, 
    update_unchanged=1, timestamp_sleep=0xbfffeb48, ctx=0xbfffed70, 
    pool=0x805deb0) at subversion/libsvn_client/externals.c:571
#20 0x40024d5f in svn_client__update_internal (
    path=0x805ec38 "working_copies/externals_tests-2.other", 
    revision=0xbfffed90, recurse=1, timestamp_sleep=0x0, ctx=0xbfffed70, 
    pool=0x805deb0) at subversion/libsvn_client/update.c:160
#21 0x40024dbe in svn_client_update (
    path=0x805ec38 "working_copies/externals_tests-2.other", 
    revision=0xbfffed90, recurse=1, ctx=0xbfffed70, pool=0x805deb0)
    at subversion/libsvn_client/update.c:181
#22 0x0804fc82 in svn_cl__update (os=0x805dfd0, baton=0xbfffec10, 
    pool=0x805deb0) at subversion/clients/cmdline/update-cmd.c:70
#23 0x0804dc3e in main (argc=3, argv=0xbfffeea4)
    at subversion/clients/cmdline/main.c:994
#24 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6

lt-svnserve #1
==============
#0  0x420e8132 in accept () from /lib/i686/libc.so.6
#1  0x401db593 in accept () from /lib/i686/libpthread.so.0
#2  0x4014c830 in apr_socket_accept (new=0xbfffe1bc, sock=0x8052930, 
    connection_context=0x80646b0) at sockets.c:201
#3  0x4014cd48 in apr_accept (new=0xbfffe1bc, sock=0x8052930, 
    connection_context=0x80646b0) at sockets.c:420
#4  0x0804a6a6 in main (argc=4, argv=0xbfffe374)
    at subversion/svnserve/main.c:161
#5  0x42017589 in __libc_start_main () from /lib/i686/libc.so.6

lt-svnserve #2
==============
#0  0x420e19ee in select () from /lib/i686/libc.so.6
#1  0x400dc77c in __DTOR_END__ () from /usr/lib/libdb-4.0.so
#2  0x400be5e5 in __os_yield () from /usr/lib/libdb-4.0.so
#3  0x4005987d in __db_tas_mutex_lock () from /usr/lib/libdb-4.0.so
#4  0x400b4839 in __log_put_int () from /usr/lib/libdb-4.0.so
#5  0x400b43b1 in __log_put () from /usr/lib/libdb-4.0.so
#6  0x400ccc69 in __txn_ckp_log () from /usr/lib/libdb-4.0.so
#7  0x400cc661 in __txn_checkpoint () from /usr/lib/libdb-4.0.so
#8  0x4002ff95 in cleanup_fs (fs=0x807f7c0) at 
subversion/libsvn_fs/fs.c:168
#9  0x40030051 in cleanup_fs_apr (data=0x807f7c0)
    at subversion/libsvn_fs/fs.c:294
#10 0x40153275 in run_cleanups (cref=0x807f798) at apr_pools.c:1976
#11 0x40152642 in apr_pool_destroy (pool=0x807f788) at apr_pools.c:755
#12 0x40152626 in apr_pool_destroy (pool=0x80646b0) at apr_pools.c:752
#13 0x40152626 in apr_pool_destroy (pool=0x80522d0) at apr_pools.c:752
#14 0x40152626 in apr_pool_destroy (pool=0x804e250) at apr_pools.c:752
#15 0x4015215c in apr_pool_terminate () at apr_pools.c:585
#16 0x4014f523 in apr_terminate () at start.c:117
#17 0x4202bc5b in exit () from /lib/i686/libc.so.6
#18 0x0804a7dd in main (argc=4, argv=0xbfffe374)
    at subversion/svnserve/main.c:204
#19 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6

lt-svnserve #3
==============
#0  0x420e19ee in select () from /lib/i686/libc.so.6
#1  0x400dc77c in __DTOR_END__ () from /usr/lib/libdb-4.0.so
#2  0x400be5e5 in __os_yield () from /usr/lib/libdb-4.0.so
#3  0x4005987d in __db_tas_mutex_lock () from /usr/lib/libdb-4.0.so
#4  0x400b4839 in __log_put_int () from /usr/lib/libdb-4.0.so
#5  0x400b43b1 in __log_put () from /usr/lib/libdb-4.0.so
#6  0x400ccc69 in __txn_ckp_log () from /usr/lib/libdb-4.0.so
#7  0x400cc661 in __txn_checkpoint () from /usr/lib/libdb-4.0.so
#8  0x4002ff95 in cleanup_fs (fs=0x807f7c0) at 
subversion/libsvn_fs/fs.c:168
#9  0x40030051 in cleanup_fs_apr (data=0x807f7c0)
    at subversion/libsvn_fs/fs.c:294
#10 0x40153275 in run_cleanups (cref=0x807f798) at apr_pools.c:1976
#11 0x40152642 in apr_pool_destroy (pool=0x807f788) at apr_pools.c:755
#12 0x40152626 in apr_pool_destroy (pool=0x80646b0) at apr_pools.c:752
#13 0x40152626 in apr_pool_destroy (pool=0x80522d0) at apr_pools.c:752
#14 0x40152626 in apr_pool_destroy (pool=0x804e250) at apr_pools.c:752
#15 0x4015215c in apr_pool_terminate () at apr_pools.c:585
#16 0x4014f523 in apr_terminate () at start.c:117
#17 0x4202bc5b in exit () from /lib/i686/libc.so.6
#18 0x0804a7dd in main (argc=4, argv=0xbfffe374)
    at subversion/svnserve/main.c:204
#19 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6

lt-svnserve #4
==============
#0  0x420e19ee in select () from /lib/i686/libc.so.6
#1  0x400dc77c in __DTOR_END__ () from /usr/lib/libdb-4.0.so
#2  0x400be5e5 in __os_yield () from /usr/lib/libdb-4.0.so
#3  0x4005987d in __db_tas_mutex_lock () from /usr/lib/libdb-4.0.so
#4  0x400b4839 in __log_put_int () from /usr/lib/libdb-4.0.so
#5  0x400b43b1 in __log_put () from /usr/lib/libdb-4.0.so
#6  0x400ccc69 in __txn_ckp_log () from /usr/lib/libdb-4.0.so
#7  0x400cc661 in __txn_checkpoint () from /usr/lib/libdb-4.0.so
#8  0x4002ff95 in cleanup_fs (fs=0x807f7c0) at 
subversion/libsvn_fs/fs.c:168
#9  0x40030051 in cleanup_fs_apr (data=0x807f7c0)
    at subversion/libsvn_fs/fs.c:294
#10 0x40153275 in run_cleanups (cref=0x807f798) at apr_pools.c:1976
#11 0x40152642 in apr_pool_destroy (pool=0x807f788) at apr_pools.c:755
#12 0x40152626 in apr_pool_destroy (pool=0x80646b0) at apr_pools.c:752
#13 0x40152626 in apr_pool_destroy (pool=0x80522d0) at apr_pools.c:752
#14 0x40152626 in apr_pool_destroy (pool=0x804e250) at apr_pools.c:752
#15 0x4015215c in apr_pool_terminate () at apr_pools.c:585
#16 0x4014f523 in apr_terminate () at start.c:117
#17 0x4202bc5b in exit () from /lib/i686/libc.so.6
#18 0x0804a7dd in main (argc=4, argv=0xbfffe374)
    at subversion/svnserve/main.c:204
#19 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6

lt-svnserve #5
==============
#0  0x420e19ee in select () from /lib/i686/libc.so.6
#1  0x400dc77c in __DTOR_END__ () from /usr/lib/libdb-4.0.so
#2  0x400be5e5 in __os_yield () from /usr/lib/libdb-4.0.so
#3  0x4005987d in __db_tas_mutex_lock () from /usr/lib/libdb-4.0.so
#4  0x400b4839 in __log_put_int () from /usr/lib/libdb-4.0.so
#5  0x400b43b1 in __log_put () from /usr/lib/libdb-4.0.so
#6  0x400ccc69 in __txn_ckp_log () from /usr/lib/libdb-4.0.so
#7  0x400cc661 in __txn_checkpoint () from /usr/lib/libdb-4.0.so
#8  0x4002ff95 in cleanup_fs (fs=0x807f7c0) at 
subversion/libsvn_fs/fs.c:168
#9  0x40030051 in cleanup_fs_apr (data=0x807f7c0)
    at subversion/libsvn_fs/fs.c:294
#10 0x40153275 in run_cleanups (cref=0x807f798) at apr_pools.c:1976
#11 0x40152642 in apr_pool_destroy (pool=0x807f788) at apr_pools.c:755
#12 0x40152626 in apr_pool_destroy (pool=0x80646b0) at apr_pools.c:752
#13 0x40152626 in apr_pool_destroy (pool=0x80522d0) at apr_pools.c:752
#14 0x40152626 in apr_pool_destroy (pool=0x804e250) at apr_pools.c:752
#15 0x4015215c in apr_pool_terminate () at apr_pools.c:585
#16 0x4014f523 in apr_terminate () at start.c:117
#17 0x4202bc5b in exit () from /lib/i686/libc.so.6
#18 0x0804a7dd in main (argc=4, argv=0xbfffe374)
    at subversion/svnserve/main.c:204
#19 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6


lt-svnserve #6
==============
#0  0x420e19ee in select () from /lib/i686/libc.so.6
#1  0x400dc77c in __DTOR_END__ () from /usr/lib/libdb-4.0.so
#2  0x400be5e5 in __os_yield () from /usr/lib/libdb-4.0.so
#3  0x4005987d in __db_tas_mutex_lock () from /usr/lib/libdb-4.0.so
#4  0x400b4839 in __log_put_int () from /usr/lib/libdb-4.0.so
#5  0x400b43b1 in __log_put () from /usr/lib/libdb-4.0.so
#6  0x400ccc69 in __txn_ckp_log () from /usr/lib/libdb-4.0.so
#7  0x400cc661 in __txn_checkpoint () from /usr/lib/libdb-4.0.so
#8  0x4002ff95 in cleanup_fs (fs=0x807b7b0) at 
subversion/libsvn_fs/fs.c:168
#9  0x40030051 in cleanup_fs_apr (data=0x807b7b0)
    at subversion/libsvn_fs/fs.c:294
#10 0x40153275 in run_cleanups (cref=0x807b788) at apr_pools.c:1976
#11 0x40152642 in apr_pool_destroy (pool=0x807b778) at apr_pools.c:755
#12 0x40152626 in apr_pool_destroy (pool=0x80646b0) at apr_pools.c:752
#13 0x40152626 in apr_pool_destroy (pool=0x80522d0) at apr_pools.c:752
#14 0x40152626 in apr_pool_destroy (pool=0x804e250) at apr_pools.c:752
#15 0x4015215c in apr_pool_terminate () at apr_pools.c:585
#16 0x4014f523 in apr_terminate () at start.c:117
#17 0x4202bc5b in exit () from /lib/i686/libc.so.6
#18 0x0804a7dd in main (argc=4, argv=0xbfffe374)
    at subversion/svnserve/main.c:204
#19 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6

lt-svnserve #7
==============
#0  0x420e19ee in select () from /lib/i686/libc.so.6
#1  0x400dc77c in __DTOR_END__ () from /usr/lib/libdb-4.0.so
#2  0x400be5e5 in __os_yield () from /usr/lib/libdb-4.0.so
#3  0x4005987d in __db_tas_mutex_lock () from /usr/lib/libdb-4.0.so
#4  0x400b4839 in __log_put_int () from /usr/lib/libdb-4.0.so
#5  0x400b43b1 in __log_put () from /usr/lib/libdb-4.0.so
#6  0x400ccc69 in __txn_ckp_log () from /usr/lib/libdb-4.0.so
#7  0x400cc661 in __txn_checkpoint () from /usr/lib/libdb-4.0.so
#8  0x4002ff95 in cleanup_fs (fs=0x807f7c0) at 
subversion/libsvn_fs/fs.c:168
#9  0x40030051 in cleanup_fs_apr (data=0x807f7c0)
    at subversion/libsvn_fs/fs.c:294
#10 0x40153275 in run_cleanups (cref=0x807f798) at apr_pools.c:1976
#11 0x40152642 in apr_pool_destroy (pool=0x807f788) at apr_pools.c:755
#12 0x40152626 in apr_pool_destroy (pool=0x80646b0) at apr_pools.c:752
#13 0x40152626 in apr_pool_destroy (pool=0x80522d0) at apr_pools.c:752
#14 0x40152626 in apr_pool_destroy (pool=0x804e250) at apr_pools.c:752
#15 0x4015215c in apr_pool_terminate () at apr_pools.c:585
#16 0x4014f523 in apr_terminate () at start.c:117
#17 0x4202bc5b in exit () from /lib/i686/libc.so.6
#18 0x0804a7dd in main (argc=4, argv=0xbfffe374)
    at subversion/svnserve/main.c:204
#19 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6

lt-svnserve #8
==============
#0  0x420e19ee in select () from /lib/i686/libc.so.6
#1  0x400dc77c in __DTOR_END__ () from /usr/lib/libdb-4.0.so
#2  0x400be5e5 in __os_yield () from /usr/lib/libdb-4.0.so
#3  0x4005987d in __db_tas_mutex_lock () from /usr/lib/libdb-4.0.so
#4  0x400b4839 in __log_put_int () from /usr/lib/libdb-4.0.so
#5  0x400b43b1 in __log_put () from /usr/lib/libdb-4.0.so
#6  0x400cc938 in __txn_regop_log () from /usr/lib/libdb-4.0.so
#7  0x400cb742 in __txn_commit () from /usr/lib/libdb-4.0.so
#8  0x400339f7 in commit_trail (trail=0x808d108, fs=0x807f7c0)
    at subversion/libsvn_fs/trail.c:100
#9  0x40033aad in svn_fs__retry_txn (fs=0x807f7c0, 
    txn_body=0x400379c4 <txn_body_begin_txn>, baton=0xbfffdf10, 
pool=0x808ced8)
    at subversion/libsvn_fs/trail.c:136
#10 0x40037a78 in svn_fs_begin_txn (txn_p=0x808d0a4, fs=0x807f7c0, rev=5, 
    pool=0x808ced8) at subversion/libsvn_fs/txn.c:134
#11 0x4001b7e8 in svn_repos_fs_begin_txn_for_update (txn_p=0x808d0a4, 
    repos=0x8069650, rev=5, author=0x808d0e0 "anonymous", pool=0x808ced8)
    at subversion/libsvn_repos/fs-wrap.c:127
#12 0x4001dd73 in svn_repos_set_path (report_baton=0x808d0a0, 
    path=0x808efc8 "", revision=5, pool=0x808eee0)
    at subversion/libsvn_repos/reporter.c:173
#13 0x0804a8aa in set_path (conn=0x80666b8, pool=0x808eee0, params=0x808efa0,
    baton=0xbfffe048) at subversion/svnserve/serve.c:96
#14 0x4010015e in svn_ra_svn_handle_commands (conn=0x80666b8, pool=0x808ced8,
    commands=0x804cdc4, baton=0xbfffe048, pass_through_errors=0)
    at subversion/libsvn_ra_svn/marshal.c:637
#15 0x0804ab23 in handle_report (conn=0x80666b8, pool=0x808ced8,
    repos_url=0x8064ca0 "svn://localhost/repositories/externals_tests-2.other",
    baton=0x808d0a0) at subversion/svnserve/serve.c:169
#16 0x0804bbb9 in update (conn=0x80666b8, pool=0x808ced8, params=0x808cf88,
    baton=0xbfffe140) at subversion/svnserve/serve.c:592
#17 0x4010015e in svn_ra_svn_handle_commands (conn=0x80666b8, pool=0x80646b0,
    commands=0x804ce0c, baton=0xbfffe140, pass_through_errors=0)
    at subversion/libsvn_ra_svn/marshal.c:637
#18 0x0804c89f in serve (conn=0x80666b8,
    root=0x8052830 "/home/dsummers/rpms/build/subversion-0.17.1/subversion/tests/clients/cmdline",
    tunnel=0, read_only=0, pool=0x80646b0)
    at subversion/svnserve/serve.c:986
#19 0x0804a7bc in main (argc=4, argv=0xbfffe374)
    at subversion/svnserve/main.c:201
#20 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6



Whew!  Hope this helps!

-- 
David Wayne Summers          "Linux: Because reboots are for hardware upgrades!"
david@summersoft.fay.ar.us   PGP Key: http://summersoft.fay.ar.us/~david/pgp.txt
PGP Key fingerprint =  C0 E0 4F 50 DD A9 B6 2B  60 A1 31 7E D2 28 6D A8 



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Still hang on svn 4951 RedHat 7.3 SMP

Posted by Philip Martin <ph...@codematters.co.uk>.

David Summers <da...@summersoft.fay.ar.us> writes:

> Here is what happens when I look in the log file:
> CMD: svnadmin "create" "repositories/externals_tests-6" <TIME = 0.001808>
> CMD: svnadmin dump "local_tmp/repos" | svnadmin load "repositories/externals_tests-6" <TIME = 0.003394>
> CMD: svn "co" "--username" "jrandom" "--password" "rayjandom" "svn://localhost/repositories/externals_tests-6" "working_copies/externals_tests-6" <TIME = 0.001826>
> CMD: svn "checkout" "--username" "jrandom" "--password" "rayjandom" "svn://localhost/repositories/externals_tests-6" "working_copies/externals_tests-6.init" <TIME = 0.001732>
> CMD: svn "ci" "-m" "log msg" "--quiet" "working_copies/externals_tests-6.init" <TIME = 0.001788>
> CMD: svn "ci" "-m" "log msg" "--quiet" "working_copies/ext

You need to run the test manually to avoid truncating the buffered output.

$ cd subversion/tests/clients/cmdline
$ ./externals_tests.py 6 BASE_URL=svn://localhost
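
For anyone wondering why the log stops mid-line: a hung process never flushes whatever is still sitting in its output buffer. Below is a minimal Python sketch of that effect. It assumes the log is written through an ordinary block-buffered stream; the harness's actual logging code is not shown in this thread, and demo() and visible_bytes() are hypothetical helper names, not part of the test suite.

```python
import os
import tempfile


def visible_bytes(path):
    """Return how many bytes have actually reached the file on disk."""
    with open(path, "rb") as f:
        return len(f.read())


def demo():
    """Write one short log line and report the on-disk size before and after flush."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Block-buffered binary stream: small writes stay in memory
        # until the buffer fills or flush()/close() is called.
        log = open(path, "wb", buffering=8192)
        log.write(b'CMD: svn "ci" "-m" "log msg" "--quiet"\n')
        before = visible_bytes(path)  # line is still in the buffer
        log.flush()
        after = visible_bytes(path)   # line has now been written out
        log.close()
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    before, after = demo()
    print("bytes on disk before flush:", before)
    print("bytes on disk after flush: ", after)
```

If the process hangs between the write and the flush, the file only holds what was flushed earlier, which would explain a log that ends in the middle of a CMD line.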

-- 
Philip Martin
