Posted to issues-all@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2020/08/08 19:08:00 UTC

[jira] [Updated] (IMPALA-10063) Intermittent crash seen during ComputeCpuRatios

     [ https://issues.apache.org/jira/browse/IMPALA-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha updated IMPALA-10063:
--------------------------------
    Description: 
On my desktop running Ubuntu 18.04, I occasionally (about once every few days) see impalad hit a DCHECK and crash in ComputeCpuRatios with the following stack:
{noformat}
system-state-info.cc:146] Check failed: total_tics > 0 (-9802454 vs. 0) 
*** Check failure stack trace: ***
    @          0x5140a1c  google::LogMessage::Fail()
    @          0x514230c  google::LogMessage::SendToLog()
    @          0x514037a  google::LogMessage::Flush()
    @          0x5143f78  google::LogMessageFatal::~LogMessageFatal()
    @          0x266a4f7  impala::SystemStateInfo::ComputeCpuRatios()
    @          0x2669c89  impala::SystemStateInfo::CaptureSystemStateSnapshot()
    @          0x2193dae  _ZZN6impala7ExecEnv19InitSystemStateInfoEvENKUlvE_clEv
    @          0x2194ab0  _ZNSt17_Function_handlerIFvvEZN6impala7ExecEnv19InitSystemStateInfoEvEUlvE_E9_M_invokeERKSt9_Any_data
    @          0x26244af  std::function<>::operator()()
    @          0x2622a97  impala::PeriodicCounterUpdater::UpdateLoop()
    @          0x262dc10  boost::_mfi::mf0<>::operator()()
    @          0x262db72  boost::_bi::list1<>::operator()<>()
    @          0x262db1a  boost::_bi::bind_t<>::operator()()
    @          0x262dadb  boost::detail::thread_data<>::run()
    @          0x3e47771  thread_proxy
    @     0x7f0a9bc6d6da  start_thread
    @     0x7f0a986a5a3e  clone
{noformat}

Since the total_tics calculation depends on the system clock, I dug further with the adjtimex utility and found what looked like odd values for the offset and frequency on the machine: both are negative rather than positive. [UPDATE: according to the adjtimex man page, negative values are acceptable, so it is not clear whether this is a contributing factor.]
{noformat}
$ adjtimex -p
         mode: 0
       offset: -1232794                       
    frequency: -109214                      
     maxerror: 236000
     esterror: 0
       status: 8193
time_constant: 5
    precision: 1
    tolerance: 32768000
         tick: 10000
     raw time:  1596910192s 365781168us = 1596910192.365781168
{noformat}

Regardless of the cause, we should prevent this from crashing Impala.


  was:
On my desktop running Ubuntu 18.04, I occasionally (about once every few days) see impalad hit a DCHECK and crash in ComputeCpuRatios with the following stack:
{noformat}
system-state-info.cc:146] Check failed: total_tics > 0 (-9802454 vs. 0) 
*** Check failure stack trace: ***
    @          0x5140a1c  google::LogMessage::Fail()
    @          0x514230c  google::LogMessage::SendToLog()
    @          0x514037a  google::LogMessage::Flush()
    @          0x5143f78  google::LogMessageFatal::~LogMessageFatal()
    @          0x266a4f7  impala::SystemStateInfo::ComputeCpuRatios()
    @          0x2669c89  impala::SystemStateInfo::CaptureSystemStateSnapshot()
    @          0x2193dae  _ZZN6impala7ExecEnv19InitSystemStateInfoEvENKUlvE_clEv
    @          0x2194ab0  _ZNSt17_Function_handlerIFvvEZN6impala7ExecEnv19InitSystemStateInfoEvEUlvE_E9_M_invokeERKSt9_Any_data
    @          0x26244af  std::function<>::operator()()
    @          0x2622a97  impala::PeriodicCounterUpdater::UpdateLoop()
    @          0x262dc10  boost::_mfi::mf0<>::operator()()
    @          0x262db72  boost::_bi::list1<>::operator()<>()
    @          0x262db1a  boost::_bi::bind_t<>::operator()()
    @          0x262dadb  boost::detail::thread_data<>::run()
    @          0x3e47771  thread_proxy
    @     0x7f0a9bc6d6da  start_thread
    @     0x7f0a986a5a3e  clone
{noformat}

Since the total_tics calculation depends on the system clock, I dug further with the adjtimex utility and found some odd values for the offset and frequency on the machine; they are negative instead of positive:
{noformat}
$ adjtimex -p
         mode: 0
       offset: -1232794                        <--- seems wrong
    frequency: -109214                      <--- seems wrong
     maxerror: 236000
     esterror: 0
       status: 8193
time_constant: 5
    precision: 1
    tolerance: 32768000
         tick: 10000
     raw time:  1596910192s 365781168us = 1596910192.365781168
{noformat}

Regardless of the cause, we should prevent this from crashing Impala.



> Intermittent crash seen during ComputeCpuRatios
> -----------------------------------------------
>
>                 Key: IMPALA-10063
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10063
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major


