Posted to issues-all@impala.apache.org by "Wenzhe Zhou (Jira)" <ji...@apache.org> on 2021/08/26 22:22:00 UTC

[jira] [Comment Edited] (IMPALA-10342) Flooding of UDF warnings crash the coordinator

    [ https://issues.apache.org/jira/browse/IMPALA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405489#comment-17405489 ] 

Wenzhe Zhou edited comment on IMPALA-10342 at 8/26/21, 10:21 PM:
-----------------------------------------------------------------

Saw the same issue in a customer case. There, warning messages from two lines of code in the Frontend flooded the impala.long.WARNING files and crashed the coordinator while it was generating the runtime profile. The call stacks are the same as those reported in this Jira.

Could we add a rate limit to suppress warning messages? We cannot turn off warning messages entirely, but a rate limit would prevent flooding when the warnings come from only a few places.
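
To make the rate-limit suggestion concrete, here is a minimal sketch of a per-call-site limiter. It is an illustration only: `WarningRateLimiter`, `ShouldLog()` and `Suppressed()` are hypothetical names, not existing Impala APIs, and the fixed-window policy is just one possible choice. The warnings in this case came from the Java Frontend, so a real fix there would need the same idea expressed in Java.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical helper, not part of Impala: allows at most `max_per_window`
// warnings per call site within each time window and counts the rest.
class WarningRateLimiter {
 public:
  using Clock = std::chrono::steady_clock;

  WarningRateLimiter(int max_per_window, std::chrono::milliseconds window)
      : max_per_window_(max_per_window), window_(window) {}

  // Returns true if a warning for `site` may be emitted at time `now`.
  bool ShouldLog(const std::string& site,
                 Clock::time_point now = Clock::now()) {
    std::lock_guard<std::mutex> l(lock_);
    State& s = sites_[site];
    // Start a fresh window on first use or once the old window has expired.
    if (!s.started || now - s.window_start >= window_) {
      s.started = true;
      s.window_start = now;
      s.emitted = 0;
    }
    if (s.emitted < max_per_window_) {
      ++s.emitted;
      return true;
    }
    ++s.suppressed;
    return false;
  }

  // Total number of warnings dropped so far for `site`.
  int64_t Suppressed(const std::string& site) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = sites_.find(site);
    return it == sites_.end() ? 0 : it->second.suppressed;
  }

 private:
  struct State {
    bool started = false;
    Clock::time_point window_start;
    int emitted = 0;
    int64_t suppressed = 0;
  };

  const int max_per_window_;
  const std::chrono::milliseconds window_;
  std::mutex lock_;
  std::unordered_map<std::string, State> sites_;
};
```

A call site would wrap its logging statement in `if (limiter.ShouldLog("some_site")) { ... }` and could periodically report `Suppressed()` so that dropped warnings remain visible as a count.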


was (Author: wzhou):
Saw the same issue in a customer case. There, two warning messages from the frontend flooded the impala.long.WARNING files and crashed the coordinator while it was generating the runtime profile. The call stacks are the same as those reported in this Jira.

Could we add a rate limit to suppress warning messages?

> Flooding of UDF warnings crash the coordinator
> ----------------------------------------------
>
>                 Key: IMPALA-10342
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10342
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Fifteen
>            Assignee: Fifteen
>            Priority: Minor
>         Attachments: image-2020-11-19-17-30-22-918.png, image-2020-11-23-09-57-49-840.png, image-2021-04-28-20-20-45-798.png, impalad-ram-profile.pdf
>
>
> Hi, when they encounter an error, both `get_json_object()` and `DecimalOperators::IntToDecimalVal` raise a warning.
> Due to their stateless nature, the resulting flood of warnings can easily overwhelm the cluster's processing capacity.
> To be specific, we have observed these bottlenecks:
> *Exchange Receiver*: the default value for `rpc_max_message_size` is 50MB. The flood of warning messages carried by ReportExecStatusPB may exceed that limit, resulting in status reports without profiles. Even if the report message stays under the limit, the bandwidth consumption is non-trivial.
> *Storage:* as in IMPALA-5256, flooding warnings produce huge log files because `stdout/stderr` are not redirected when glog rolls logs. As a result, we grew tired of clearing log files and restarting executors.
> *Coordinator*: runtime profiles are serialized to thrift and stored in the Coordinator's memory. The warning flood makes `Untracked Memory` rise rapidly. I took a heap profile (with pprof) and found that most of the memory was used by RuntimeProfile and strings.
>   !image-2020-11-23-09-57-49-840.png!
>  
> *Preliminary solution:*
> We suffered a lot from this problem and have come up with a preliminary solution.
>  # We mute AddWarning(), which is straightforward.
>  # We introduced a query option to re-enable the warnings when needed.
>  *Testing:*
> With warning messages muted, the burden on the C nodes is greatly reduced and heap profiles are no longer dominated by RuntimeProfile.
>  
> *Update*
> Encountered a similar crash with a `get_json_object()` query: each time the query is submitted, the Coordinator crashes.
> !image-2021-04-28-20-20-45-798.png!
> Log:
> {code:java}
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x0000000002c64dca, pid=3633220, tid=0x00007eff73308700
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_181-b13) (build 1.8.0_181-b13)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 )
> # Problematic frame:
> # C  [impalad+0x2864dca]  tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0x13a
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /run/cloudera-scm-agent/process/10376-impala-IMPALAD/hs_err_pid3633220.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> d. The connection had 2 associated session(s).
> I0427 13:43:03.907536 3853145 status.cc:126] Couldn't serialize thrift object:
> std::bad_alloc
>     @           0xbf4ef9
>     @          0x1352d5f
>     @          0x1352eaf
>     @          0x11986de
>     @          0x122516c
>     @          0x1225515
>     @          0x137ee36
>     @          0x13801a0
>     @          0x139682f
>     @          0x139915a
>     @          0x1399784
>     @     0x7f34791e0e24
>     @     0x7f3475dd835c
> {code}
>  StackTrace:
> {code:java}
> Stack: [0x00007eff72b08000,0x00007eff73309000],  sp=0x00007eff733006b0,  free space=8161k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> C  [impalad+0x2864dca]  tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0x13a
> C  [impalad+0x286519f]  tcmalloc::ThreadCache::Scavenge()+0x3f
> C  [impalad+0x29a211a]  operator delete(void*)+0x32a
> C  [impalad+0xae94d9]  impala::TRuntimeProfileNode::~TRuntimeProfileNode()+0x289
> C  [impalad+0xae4987]  impala::TRuntimeProfileTree::~TRuntimeProfileTree()+0x47
> C  [impalad+0xf5280a]  impala::RuntimeProfile::Compress(std::vector<unsigned char, std::allocator<unsigned char> >*) const+0x3aa
> C  [impalad+0xf52eb0]  impala::RuntimeProfile::SerializeToArchiveString(std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*) const+0x40
> C  [impalad+0xd986df]  impala::ImpalaServer::GetRuntimeProfileOutput(impala::TUniqueId const&, std::string const&, impala::TRuntimeProfileFormat::type, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*, impala::TRuntimeProfileTree*, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*)+0x5bf
> C  [impalad+0xe2516d]  impala::ImpalaHttpHandler::QueryProfileHelper(kudu::WebCallbackRegistry::WebRequest const&, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*, impala::TRuntimeProfileFormat::type)+0x4ed
> C  [impalad+0xe25516]  impala::ImpalaHttpHandler::QueryProfileEncodedHandler(kudu::WebCallbackRegistry::WebRequest const&, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>*)+0x16
> C  [impalad+0xf7ee37]  impala::Webserver::RenderUrlWithTemplate(sq_connection const*, kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler const&, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*, impala::ContentType*)+0x177
> C  [impalad+0xf801a1]  impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*)+0x951
> C  [impalad+0xf96830]  kudu::StringGauge::~StringGauge()+0x100
> {code}
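
The preliminary solution in the issue description (mute AddWarning() and re-enable it through a query option) might look roughly like the sketch below. This is not the reporter's patch and not Impala's actual `AddWarning()` code: `BoundedWarningLog` and its members are hypothetical, the `enabled` flag merely stands in for the query option, and the byte cap is an extra assumption added here so that even re-enabled warnings cannot grow the profile without bound.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-query warning log; it does not mirror
// Impala's real classes.
class BoundedWarningLog {
 public:
  // `enabled` models the query option that re-enables warnings.
  // `max_bytes` caps the total size of retained warning text (assumption).
  BoundedWarningLog(bool enabled, std::size_t max_bytes)
      : enabled_(enabled), max_bytes_(max_bytes) {}

  // Stores `msg` if warnings are enabled and the cap would not be exceeded;
  // otherwise only counts it. Returns true if the message was stored.
  bool AddWarning(const std::string& msg) {
    if (!enabled_ || bytes_ + msg.size() > max_bytes_) {
      ++dropped_;
      return false;
    }
    bytes_ += msg.size();
    warnings_.push_back(msg);
    return true;
  }

  const std::vector<std::string>& warnings() const { return warnings_; }
  std::size_t bytes() const { return bytes_; }
  int64_t dropped() const { return dropped_; }

 private:
  const bool enabled_;
  const std::size_t max_bytes_;
  std::size_t bytes_ = 0;
  int64_t dropped_ = 0;
  std::vector<std::string> warnings_;
};
```

Keeping a `dropped()` counter means a status report can still say how many warnings were discarded, which costs a few bytes instead of the full message text.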


