Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/07/29 12:46:00 UTC

[jira] [Commented] (ARROW-17252) [R] Intermittent valgrind failure

    [ https://issues.apache.org/jira/browse/ARROW-17252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572957#comment-17572957 ] 

Dewey Dunnington commented on ARROW-17252:
------------------------------------------

Another run that had some other failures, including the {{InputType}} one:

https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=30290&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=25107


{noformat}
==5248== 56 bytes in 1 blocks are possibly lost in loss record 171 of 3,993
==5248==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5248==    by 0x101AFFBA: allocate (new_allocator.h:121)
==5248==    by 0x101AFFBA: allocate (alloc_traits.h:460)
==5248==    by 0x101AFFBA: _M_allocate (stl_vector.h:346)
==5248==    by 0x101AFFBA: void std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >::_M_realloc_insert<arrow::compute::ExecNode*>(__gnu_cxx::__normal_iterator<arrow::compute::ExecNode**, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> > >, arrow::compute::ExecNode*&&) (vector.tcc:440)
==5248==    by 0x101AABBA: emplace_back<arrow::compute::ExecNode*> (vector.tcc:121)
==5248==    by 0x101AABBA: push_back (stl_vector.h:1204)
==5248==    by 0x101AABBA: arrow::compute::ExecNode::ExecNode(arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::shared_ptr<arrow::Schema>, int) (exec_plan.cc:414)
==5248==    by 0x101AAD22: arrow::compute::MapNode::MapNode(arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, std::shared_ptr<arrow::Schema>, bool) (exec_plan.cc:476)
==5248==    by 0x101EC290: ProjectNode (project_node.cc:46)
==5248==    by 0x101EC290: EmplaceNode<arrow::compute::(anonymous namespace)::ProjectNode, arrow::compute::ExecPlan*&, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, std::shared_ptr<arrow::Schema>, std::vector<arrow::compute::Expression, std::allocator<arrow::compute::Expression> >, bool const&> (exec_plan.h:60)
==5248==    by 0x101EC290: arrow::compute::(anonymous namespace)::ProjectNode::Make(arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&) (project_node.cc:73)
==5248==    by 0xFC20D83: std::_Function_handler<arrow::Result<arrow::compute::ExecNode*> (arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&), arrow::Result<arrow::compute::ExecNode*> (*)(arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&)>::_M_invoke(std::_Any_data const&, arrow::compute::ExecPlan*&&, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >&&, arrow::compute::ExecNodeOptions const&) (invoke.h:60)
==5248==    by 0xFA838DC: std::function<arrow::Result<arrow::compute::ExecNode*> (arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&)>::operator()(arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&) const (std_function.h:622)
==5248==    by 0xFA81047: arrow::compute::MakeExecNode(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&, arrow::compute::ExecFactoryRegistry*) (exec_plan.h:438)
==5248==    by 0xFA77BE8: MakeExecNodeOrStop(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&) (compute-exec.cpp:53)
==5248==    by 0xFA7ADF2: ExecNode_Project(std::shared_ptr<arrow::compute::ExecNode> const&, std::vector<std::shared_ptr<arrow::compute::Expression>, std::allocator<std::shared_ptr<arrow::compute::Expression> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) (compute-exec.cpp:307)
==5248==    by 0xF9FC997: _arrow_ExecNode_Project (arrowExports.cpp:986)
==5248==    by 0x4953BC4: R_doDotCall (dotcode.c:607)
{noformat}


> [R] Intermittent valgrind failure
> ---------------------------------
>
>                 Key: ARROW-17252
>                 URL: https://issues.apache.org/jira/browse/ARROW-17252
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dewey Dunnington
>            Priority: Major
>
> A number of recent nightly builds have intermittent failures with valgrind, which fails because of possibly leaked memory around an exec plan. This seems related to a change in XXX that separated {{ExecPlan_prepare()}} from {{ExecPlan_run()}} and added an {{ExecPlan_read_table()}} that uses {{RunWithCapturedR()}}. The reported leaks vary but include ExecPlans, ExecNodes, and fields of those objects.
> A failed run: https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=30310&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=24980
> Some example output:
> {noformat}
> ==5249== 14,112 (384 direct, 13,728 indirect) bytes in 1 blocks are definitely lost in loss record 1,988 of 3,883
> ==5249==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5249==    by 0x10B2902B: std::_Function_handler<arrow::Result<arrow::compute::ExecNode*> (arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&), arrow::compute::internal::RegisterAggregateNode(arrow::compute::ExecFactoryRegistry*)::{lambda(arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&)#1}>::_M_invoke(std::_Any_data const&, arrow::compute::ExecPlan*&&, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >&&, arrow::compute::ExecNodeOptions const&) (exec_plan.h:60)
> ==5249==    by 0xFA83A0C: std::function<arrow::Result<arrow::compute::ExecNode*> (arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&)>::operator()(arrow::compute::ExecPlan*, std::vector<arrow::compute::ExecNode*, std::allocator<arrow::compute::ExecNode*> >, arrow::compute::ExecNodeOptions const&) const (std_function.h:622)
> ==5249== 14,528 (160 direct, 14,368 indirect) bytes in 1 blocks are definitely lost in loss record 1,989 of 3,883
> ==5249==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5249==    by 0x10096CB7: arrow::FutureImpl::Make() (future.cc:187)
> ==5249==    by 0xFCB6F9A: arrow::Future<arrow::internal::Empty>::Make() (future.h:420)
> ==5249==    by 0x101AE927: ExecPlanImpl (exec_plan.cc:50)
> ==5249==    by 0x101AE927: arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*, std::shared_ptr<arrow::KeyValueMetadata const>) (exec_plan.cc:355)
> ==5249==    by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
> ==5249==    by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
> ==5249==    by 0x4953B60: R_doDotCall (dotcode.c:601)
> ==5249==    by 0x49C2C16: bcEval (eval.c:7682)
> ==5249==    by 0x499DB95: Rf_eval (eval.c:748)
> ==5249==    by 0x49A0904: R_execClosure (eval.c:1918)
> ==5249==    by 0x49A05B7: Rf_applyClosure (eval.c:1844)
> ==5249==    by 0x49B2122: bcEval (eval.c:7094)
> ==5249== 
> ==5249== 36,322 (416 direct, 35,906 indirect) bytes in 1 blocks are definitely lost in loss record 2,929 of 3,883
> ==5249==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5249==    by 0x10214F92: arrow::compute::TaskScheduler::Make() (task_util.cc:421)
> ==5249==    by 0x101AEA6C: ExecPlanImpl (exec_plan.cc:50)
> ==5249==    by 0x101AEA6C: arrow::compute::ExecPlan::Make(arrow::compute::ExecContext*, std::shared_ptr<arrow::KeyValueMetadata const>) (exec_plan.cc:355)
> ==5249==    by 0xFA77BA2: ExecPlan_create(bool) (compute-exec.cpp:45)
> ==5249==    by 0xF9FAE9F: _arrow_ExecPlan_create (arrowExports.cpp:868)
> ==5249==    by 0x4953B60: R_doDotCall (dotcode.c:601)
> ==5249==    by 0x49C2C16: bcEval (eval.c:7682)
> ==5249==    by 0x499DB95: Rf_eval (eval.c:748)
> ==5249==    by 0x49A0904: R_execClosure (eval.c:1918)
> ==5249==    by 0x49A05B7: Rf_applyClosure (eval.c:1844)
> ==5249==    by 0x49B2122: bcEval (eval.c:7094)
> ==5249==    by 0x499DB95: Rf_eval (eval.c:748)
> {noformat}
> We also occasionally get leaked Schemas, and in one case a leaked InputType that seemed completely unrelated to the other leaks (ARROW-17225).
> I'm wondering if these have to do with references captured in lambdas that get passed by reference, or perhaps a caching issue. In some previous failures, the backtrace to the {{new}} allocation differed between reported leaks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)