You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by "Zhao Yongming (JIRA)" <ji...@apache.org> on 2013/11/20 09:09:20 UTC
[jira] [Updated] (TS-2373) the cache is not initialized properly
[ https://issues.apache.org/jira/browse/TS-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhao Yongming updated TS-2373:
------------------------------
Labels: T (was: )
> the cache is not initialized properly
> -------------------------------------
>
> Key: TS-2373
> URL: https://issues.apache.org/jira/browse/TS-2373
> Project: Traffic Server
> Issue Type: Bug
> Components: Cache
> Reporter: xiongzongtao
> Assignee: xiongzongtao
> Labels: T
>
> in recent days, there are several ATS servers on line failed to initialize cache system when restart, so, all requests are simply forwarded to origins.
> Note: the storage is configured to use raw devices, like this:
> /dev/raw/raw1 146163105792
> /dev/raw/raw2 146163105792
> /dev/raw/raw3 146163105792
> /dev/raw/raw4 146163105792
> /dev/raw/raw5 146163105792
> volume.config is not configured. so, one vol per raw device
> 1. checking the traffic.out, found that: "cache enabled" line that
> should follow the "traffic server running" line
> 2. echo "show:cache-stats"|traffic_shell shows that all stats are ZEROs
> 3. dump the memory of traffic_server by "gdb gcore" for convenience of debuging
> 4. use gdb to debug:
> a) all vols are set in global gvol except /dev/raw/raw4.
> b) one vol (/dev/raw/raw4) hunged in function handle_recover_from_data
> c) check aio_req, found that, one io event are queued in
> aio_req[4]'s aio_temp_list
> d) check io threads, all are in pthread_cond_wait status.
> here, we can see that, handle_recover_from_data issued an io request
> and queued it in aio_req[4]'s aio_temp_list, but all io threads are
> in pthread_cond_wait status, no one to process this request, so that,
> handle_recover_from_data waiting endless. And cache will not enabled
> until all the vols are inited successfully.
> inspecting the code, found that,
> code snippet1 from aio_thread_main
> 1 if (!INK_ATOMICLIST_EMPTY(my_aio_req->aio_temp_list))
> 2 aio_move(my_aio_req);
> 3 if (!(op = my_aio_req->aio_todo.pop()) && !(op = my_aio_req->http_aio_todo.pop()))
> 4 break;
> code snippet2 from aio_queue_req
> 1. if (!ink_mutex_try_acquire(&req->aio_mutex)) {
> ink_atomiclist_push(&req->aio_temp_list, op);
> 2. } else {
> /* check if any pending requests on the atomic list */
> if (!INK_ATOMICLIST_EMPTY(req->aio_temp_list))
> aio_move(req);
> /* now put the new request */
> aio_insert(op, req);
> ink_cond_signal(&req->aio_cond);
> ink_mutex_release(&req->aio_mutex);
> }
> now suppose aio_temp_list is empty, thread A running code snippet1, thread B
> running code snippet2, and just when thread A is between 1 and 2 in snippet1,
> thread B is executing 1 in snippet2. here, thread B insert io request to
> aio_temp_list, but no notification, and thread A continue to 3, and then 4(break),
> finally fall in pthread_cond_wait. thus,no io thread will process this request
> something about my debuging
> (gdb) p theCache
> $1 = (Cache *) 0xd828cb0
> (gdb) p *theCache
> $2 = {cache_read_done = 1, total_good_nvol = 4, total_nvol = 5, ready = 0, cache_size = 89169900,
> hosttable = 0x0, total_initialized_vol = 4, scheme = 1}
> (gdb) set $i=0
> (gdb) p gvol[$i++]->path
> $3 = 0xd826ca0 "/dev/raw/raw2"
> (gdb) p gvol[$i++]->path
> $4 = 0xd8267a0 "/dev/raw/raw1"
> (gdb) p gvol[$i++]->path
> $5 = 0xd826a00 "/dev/raw/raw5"
> (gdb) p gvol[$i++]->path
> $6 = 0xd8269c0 "/dev/raw/raw3"
> (gdb) p gvol[$i++]->path
> Cannot access memory at address 0x0
> (gdb) p cp_list->head->vols[0]
> $7 = (Vol *) 0xd828d10
> (gdb) set $i=0
> (gdb) p *cp_list->head->vols[4]
> $8 = {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x713010},
> handler = 0x672070 <Vol::aggWrite(int, void*)>, mutex = {m_ptr = 0xd806ac0},
> link = {<SLink<Continuation>> = {next = 0x0}, prev = 0x0}}, path = 0xd826a00 "/dev/raw/raw5",
> hash_id = 0xd825db0 "/dev/raw/raw5 40960:17833980", hash_id_md5 = {b = {16306325817679019090,
> 17491422742868856741}}, fd = 38, raw_dir = 0x2aaaf0efb000 "\r025", dir = 0x2aaaf0efd000,
> header = 0x2aaaf0efb000, footer = 0x2aaafbcb9000, segments = 278, buckets = 16382,
> recover_pos = 68007976448, prev_recover_pos = 67994133504, scan_pos = 76968569856, skip = 40960,
> start = 364421120, len = 146095964160, data_blocks = 17789500, hit_evacuate_window = 0,
> io = {<AIOCallback> = {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x712d70},
> handler = 0x647640 <AIOCallbackInternal::io_complete(int, void*)>, mutex = {m_ptr = 0xd806ac0},
> link = {<SLink<Continuation>> = {next = 0x0}, prev = 0x0}}, aiocb = {aio_fildes = 0,
> aio_buf = 0x2aaabc47e000, aio_nbytes = 8388608, aio_offset = 67994133504, aio_reqprio = 0,
> aio_lio_opcode = 1, aio_state = 0, aio__pad = {0}}, action = {_vptr.Action = 0x0,
> continuation = 0xd8bc010, mutex = {m_ptr = 0xd806ac0}, cancelled = 0}, thread = 0x0, then = 0x0,
> aio_result = 8388608}, first = 0x0, aio_req = 0xc6532f0, sleep_time = 0},
> agg = {<DLL<CacheVC,Continuation::Link_link>> = {head = 0x0}, tail = 0x0},
> stat_cache_vcs = {<DLL<CacheVC,Continuation::Link_link>> = {head = 0x0}, tail = 0x0},
> sync = {<DLL<CacheVC,Continuation::Link_link>> = {head = 0x0}, tail = 0x0}, agg_buffer = 0x2aaaf07be000 "",
> agg_todo_size = 0, agg_buf_pos = 0, trigger = 0x0, open_dir = {<Continuation> = {<force_VFPT_to_top> = {
> _vptr.force_VFPT_to_top = 0x713050}, handler = 0x64c580 <OpenDir::signal_readers(int, Event*)>,
> mutex = {m_ptr = 0xd806ac0}, link = {<SLink<Continuation>> = {next = 0x0}, prev = 0x0}},
> delayed_readers = {<DLL<CacheVC,Link_CacheVC_opendir_link>> = {head = 0x0}, tail = 0x0}, bucket = {{
> head = 0x0} <repeats 256 times>}}, ram_cache = 0x0, evacuate_size = 8709, evacuate = 0xd8bd290,
> lookaside = {{head = 0x0} <repeats 256 times>}, doc_evacuator = 0x0, init_info = 0x0, disk = 0xc650f90,
> ---Type <return> to continue, or q <return> to quit---q
> cache Quit
> (gdb) p *cp_list->head->vols[3]
> warning: can't find linker symbol for virtual table for `Vol' value
> warning: found `CacheProcessor::start_internal(int)::__func__' instead
> $9 = {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x713010},
> handler = 0x6460e0 <Vol::handle_recover_from_data(int, void*)>, mutex = {m_ptr = 0xd806b10},
> link = {<SLink<Continuation>> = {next = 0x0}, prev = 0x0}}, path = 0xd8269e0 "/dev/raw/raw4",
> hash_id = 0xd8266a0 "/dev/raw/raw4 40960:17833980", hash_id_md5 = {b = {9625335342759085551,
> 15230879294086164331}}, fd = 37, raw_dir = 0x2aaae59fc000 "\r025", dir = 0x2aaae59fe000,
> header = 0x2aaae59fc000, footer = 0x2aaaf07ba000, segments = 278, buckets = 16382,
> recover_pos = 89786622464, prev_recover_pos = 89786622464, scan_pos = 0, skip = 40960, start = 364421120,
> len = 146095964160, data_blocks = 17789500, hit_evacuate_window = 0,
> io = {<AIOCallback> = {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x712d70},
> handler = 0x647640 <AIOCallbackInternal::io_complete(int, void*)>, mutex = {m_ptr = 0xd806b10},
> link = {<SLink<Continuation>> = {next = 0x0}, prev = 0x0}}, aiocb = {aio_fildes = 37,
> aio_buf = 0x2aab02ca9000, aio_nbytes = 8388608, aio_offset = 89786622464, aio_reqprio = 0,
> aio_lio_opcode = 1, aio_state = 0, aio__pad = {0}}, action = {_vptr.Action = 0x0,
> continuation = 0xd87a010, mutex = {m_ptr = 0xd806b10}, cancelled = 0}, thread = 0x0, then = 0x0,
> aio_result = 8388608}, first = 0x0, aio_req = 0xc6458b0, sleep_time = 0},
> agg = {<DLL<CacheVC,Continuation::Link_link>> = {head = 0x0}, tail = 0x0},
> stat_cache_vcs = {<DLL<CacheVC,Continuation::Link_link>> = {head = 0x0}, tail = 0x0},
> sync = {<DLL<CacheVC,Continuation::Link_link>> = {head = 0x0}, tail = 0x0}, agg_buffer = 0x2aaae5382000 "",
> agg_todo_size = 0, agg_buf_pos = 0, trigger = 0x0, open_dir = {<Continuation> = {<force_VFPT_to_top> = {
> _vptr.force_VFPT_to_top = 0x713050}, handler = 0x64c580 <OpenDir::signal_readers(int, Event*)>,
> mutex = {m_ptr = 0xd806b10}, link = {<SLink<Continuation>> = {next = 0x0}, prev = 0x0}},
> delayed_readers = {<DLL<CacheVC,Link_CacheVC_opendir_link>> = {head = 0x0}, tail = 0x0}, bucket = {{
> head = 0x0} <repeats 256 times>}}, ram_cache = 0x0, evacuate_size = 8709, evacuate = 0xd8a2aa0,
> lookaside = {{head = 0x0} <repeats 256 times>}, doc_evacuator = 0x0, init_info = 0xd83b860,
> ---Type <return> to continue, or q <return> to quit---q
> disk = 0xc643550, Quit
> (gdb) p aio_reqs
> $10 = {0x0, 0xc5d75f0, 0xc62f4b0, 0xc63a670, 0xc6458b0, 0xc6532f0, 0x0 <repeats 94 times>}
> (gdb) p aio_reqs[5]
> $11 = (AIO_Reqs *) 0xc6532f0
> (gdb) p *aio_reqs[5]
> $12 = {aio_todo = {<DLL<AIOCallback, Continuation::Link_link>> = {head = 0x0}, tail = 0x0},
> http_aio_todo = {<DLL<AIOCallback, Continuation::Link_link>> = {head = 0x0}, tail = 0x0}, aio_temp_list = {
> head = {data = 0}, name = 0x717564 "temp_list", offset = 32}, aio_mutex = {__data = {__lock = 0,
> __count = 0, __owner = 0, __nusers = 8, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
> __size = '\0' <repeats 12 times>, "\b", '\0' <repeats 26 times>, __align = 0}, aio_cond = {__data = {
> __lock = 0, __futex = 58, __total_seq = 33, __wakeup_seq = 25, __woken_seq = 25, __mutex = 0xc653328,
> __nwaiters = 16, __broadcast_seq = 0},
> __size = "\000\000\000\000:\000\000\000!\000\000\000\000\000\000\000\031\000\000\000\000\000\000\000\031\000\000\000\000\000\000\000(3e\f\000\000\000\000\020\000\000\000\000\000\000", __align = 249108103168},
> index = 5, pending = 0, queued = 0, filedes = 38, requests_queued = 0}
> (gdb) p *aio_reqs[4]
> $13 = {aio_todo = {<DLL<AIOCallback, Continuation::Link_link>> = {head = 0x0}, tail = 0x0},
> http_aio_todo = {<DLL<AIOCallback, Continuation::Link_link>> = {head = 0x0}, tail = 0x0}, aio_temp_list = {
> head = {data = 226992344}, name = 0x717564 "temp_list", offset = 32}, aio_mutex = {__data = {__lock = 0,
> __count = 0, __owner = 0, __nusers = 8, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
> __size = '\0' <repeats 12 times>, "\b", '\0' <repeats 26 times>, __align = 0}, aio_cond = {__data = {
> __lock = 0, __futex = 50, __total_seq = 29, __wakeup_seq = 21, __woken_seq = 21, __mutex = 0xc6458e8,
> __nwaiters = 16, __broadcast_seq = 0},
> __size = "\000\000\000\0002\000\000\000\035\000\000\000\000\000\000\000\025\000\000\000\000\000\000\000\025\000\000\000\000\000\000\000f\000\000\000\000\020\000\000\000\000\000\000", __align = 214748364800},
> index = 4, pending = 0, queued = 0, filedes = 37, requests_queued = 1}
> (gdb) p/x *aio_reqs[4]
> $14 = {aio_todo = {<DLL<AIOCallback, Continuation::Link_link>> = {head = 0x0}, tail = 0x0},
> http_aio_todo = {<DLL<AIOCallback, Continuation::Link_link>> = {head = 0x0}, tail = 0x0}, aio_temp_list = {
> head = {data = 0xd87a0d8}, name = 0x717564, offset = 0x20}, aio_mutex = {__data = {__lock = 0x0,
> __count = 0x0, __owner = 0x0, __nusers = 0x8, __kind = 0x0, __spins = 0x0, __list = {__prev = 0x0,
> __next = 0x0}}, __size = {0x0 <repeats 12 times>, 0x8, 0x0 <repeats 27 times>}, __align = 0x0},
> aio_cond = {__data = {__lock = 0x0, __futex = 0x32, __total_seq = 0x1d, __wakeup_seq = 0x15,
> __woken_seq = 0x15, __mutex = 0xc6458e8, __nwaiters = 0x10, __broadcast_seq = 0x0}, __size = {0x0, 0x0,
> 0x0, 0x0, 0x32, 0x0, 0x0, 0x0, 0x1d, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x15, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x15, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xe8, 0x58, 0x64, 0xc, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, __align = 0x3200000000}, index = 0x4, pending = 0x0, queued = 0x0,
> filedes = 0x25, requests_queued = 0x1}
> (gdb) p &cp_list->head->vols[3]->io
> $15 = (AIOCallbackInternal *) 0xd87a0d8
--
This message was sent by Atlassian JIRA
(v6.1#6144)