Posted to issues@geode.apache.org by "Blake Bender (Jira)" <ji...@apache.org> on 2020/08/21 15:18:00 UTC

[jira] [Comment Edited] (GEODE-8436) Several threads calling PdxInstanceFactory::create() causes seg fault

    [ https://issues.apache.org/jira/browse/GEODE-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181476#comment-17181476 ] 

Blake Bender edited comment on GEODE-8436 at 8/21/20, 3:17 PM:
---------------------------------------------------------------

[~alberto.bustamante.reyes] this change is causing a failure in the test
{code:java}
testThinClientPoolExecuteHAFunction {code}
on Red Hat (both RHEL7 and RHEL8 fail).  Per our policy, I've reverted the change while we investigate.  If you have access to a RHEL machine, you're welcome to try to track things down.  I will investigate here as time permits.  This is what I consistently see in our output logs:
{quote}[error 2020/08/20 17:09:00.313552 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Execute: An exception (org.apache.geode.cache.execute.FunctionException: org.apache.geode.internal.cache.execute.InternalFunctionInvocationTargetException: memberDeparted event for < heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1(GFECS24955:104622)<ec><v1>:41001 > crashed, false
 at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResultInternal(PRFunctionStreamingResultCollector.java:115)
 at org.apache.geode.internal.cache.execute.ResultCollectorHolder.getResult(ResultCollectorHolder.java:53)
 at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:88)
 at org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.executeFunctionWithResult(ExecuteRegionFunction66.java:406)
 at org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.cmdExecute(ExecuteRegionFunction66.java:201)
 at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:183)
 at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:848)
 at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:72)
 at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1212)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:676)
 at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
 at java.lang.Thread.run(Thread.java:748)
 Caused by: org.apache.geode.internal.cache.execute.InternalFunctionInvocationTargetException: memberDeparted event for < heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1(GFECS24955:104622)<ec><v1>:41001 > crashed, false
 at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.memberDeparted(PRFunctionStreamingResultCollector.java:375)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberDepartedEvent.handleEvent(ClusterDistributionManager.java:2502)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2432)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2421)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.handleMemberEvent(ClusterDistributionManager.java:1401)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.access$200(ClusterDistributionManager.java:108)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEventInvoker.run(ClusterDistributionManager.java:1433)
 ... 1 more
 ) happened at remote server.
 [info 2020/08/20 17:09:00.314091 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Close connection message failed with msg: TcrConnection::send: connection failure
 [info 2020/08/20 17:09:00.314303 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Removing bucketServerLocation [heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1.c.gemfire-dev.internal:24955]--1-0-0 due to GF_IOERR
 0% tests passed, 1 tests failed out of 1{quote}



> Several threads calling PdxInstanceFactory::create() causes seg fault
> ---------------------------------------------------------------------
>
>                 Key: GEODE-8436
>                 URL: https://issues.apache.org/jira/browse/GEODE-8436
>             Project: Geode
>          Issue Type: Bug
>          Components: native client
>            Reporter: Alberto Bustamante Reyes
>            Assignee: Alberto Bustamante Reyes
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>         Attachments: main.cpp
>
>
> I have seen a problem when "PdxInstanceFactory::create()" is called by several threads that are registering the same new PDX type.
> The core dump is produced here:
> {code}
> void PdxInstanceImpl::toDataMutable(PdxWriter& writer) {
>    auto pt = getPdxType();
>    std::vector<std::shared_ptr<PdxFieldType>>* pdxFieldList =
>        pt->getPdxFieldTypes();
> {code}
> The problem is that "getPdxType()" returns nullptr, so the next line causes a segmentation fault when calling "pt->getPdxFieldTypes()".
> The issue can be reproduced by running the attached client with 8 threads. This is the stack trace obtained in gdb:
> {code}
> #0  apache::geode::client::PdxType::getPdxFieldTypes (this=0x0) at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxType.hpp:178
> #1  0x00007f43dc4651b7 in apache::geode::client::PdxInstanceImpl::toDataMutable (this=0x7f43c0001600, writer=...) at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceImpl.cpp:1336
> #2  0x00007f43dc4650fd in apache::geode::client::PdxInstanceImpl::toData (this=0x7f43c0001600, writer=...) at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceImpl.cpp:1327
> #3  0x00007f43dc444971 in apache::geode::client::PdxHelper::serializePdx (output=..., pdxObject=warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<apache::geode::client::PdxInstanceImpl, std::allocator<apache::geode::client::PdxInstanceImpl>, (__gnu_cxx::_Lock_policy)2>'
> warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<apache::geode::client::PdxInstanceImpl, std::allocator<apache::geode::client::PdxInstanceImpl>, (__gnu_cxx::_Lock_policy)2>'
> std::shared_ptr<apache::geode::client::PdxSerializable> (use count 3, weak count 0) = {...})
>     at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxHelper.cpp:77
> #4  0x00007f43dc44b4bc in apache::geode::client::PdxInstanceFactory::create (this=0x7f43c7ffecc8) at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceFactory.cpp:53
> #5  0x000000000040de2f in doPut () at /home/alb3rtobr/CLionProjects/dummy-client/main.cpp:60
> #6  0x0000000000427767 in std::__invoke_impl<void, void (*)()> (__f=@0x2561aa8: 0x40d860 <doPut()>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/invoke.h:60
> #7  0x00000000004276fd in std::__invoke<void (*)()> (__fn=@0x2561aa8: 0x40d860 <doPut()>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/invoke.h:95
> #8  0x00000000004276d5 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x2561aa8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/thread:234
> #9  0x00000000004276a5 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x2561aa8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/thread:243
> #10 0x0000000000427589 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x2561aa0)
> {code}
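
The failure mode described in the issue (a type lookup that returns nullptr while several threads register the same new type, followed by an unchecked dereference) can be illustrated with a minimal, self-contained sketch. This is not the geode-native code or the fix that was merged: TypeInfo, TypeRegistry, fieldCount and registerConcurrently are hypothetical names invented for illustration. The sketch assumes that lookup and registration share a single mutex, so that every thread receives the same non-null instance, and that callers of a plain lookup check for nullptr before dereferencing.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a PDX type description (not the geode-native class).
struct TypeInfo {
  std::vector<std::string> fieldNames;
};

// Hypothetical registry: lookup and registration share one mutex, so no thread
// can observe a type that is "being registered" but not yet stored.
class TypeRegistry {
 public:
  // Returns the existing type for `name`, or registers and returns a new one.
  std::shared_ptr<TypeInfo> getOrRegister(
      const std::string& name, const std::vector<std::string>& fields) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = types_.find(name);
    if (it != types_.end()) {
      return it->second;
    }
    auto type = std::make_shared<TypeInfo>(TypeInfo{fields});
    types_.emplace(name, type);
    return type;
  }

  // Plain lookup: returns nullptr when the type is unknown.
  std::shared_ptr<TypeInfo> find(const std::string& name) const {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = types_.find(name);
    if (it == types_.end()) {
      return nullptr;
    }
    return it->second;
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, std::shared_ptr<TypeInfo>> types_;
};

// Checks the lookup result before dereferencing instead of crashing on nullptr.
std::size_t fieldCount(const TypeRegistry& registry, const std::string& name) {
  auto type = registry.find(name);
  if (!type) {
    return 0;
  }
  return type->fieldNames.size();
}

// Starts `threadCount` threads that all register the same type and returns
// the number of distinct instances they were handed.
std::size_t registerConcurrently(TypeRegistry& registry, int threadCount) {
  std::vector<std::shared_ptr<TypeInfo>> results(threadCount);
  std::vector<std::thread> threads;
  for (int i = 0; i < threadCount; ++i) {
    // Each thread writes only to its own slot of `results`.
    threads.emplace_back([&registry, &results, i] {
      results[i] = registry.getOrRegister("Person", {"id", "name"});
    });
  }
  for (auto& t : threads) {
    t.join();
  }

  std::set<const TypeInfo*> distinct;
  for (const auto& r : results) {
    assert(r != nullptr);
    distinct.insert(r.get());
  }
  return distinct.size();
}
```

Under these assumptions, a call such as registerConcurrently(registry, 8) hands the same instance to all eight threads, and fieldCount() returns 0 for an unknown type rather than dereferencing a null pointer. Whether the real defect lies in the lookup, in the registration, or elsewhere is not established by the issue text above.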


