Posted to issues@geode.apache.org by "Blake Bender (Jira)" <ji...@apache.org> on 2020/08/21 15:18:00 UTC
[jira] [Comment Edited] (GEODE-8436) Several threads calling
PdxInstanceFactory::create() causes seg fault
[ https://issues.apache.org/jira/browse/GEODE-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181476#comment-17181476 ]
Blake Bender edited comment on GEODE-8436 at 8/21/20, 3:17 PM:
---------------------------------------------------------------
[~alberto.bustamante.reyes] this is causing a failure in the test
{code:java}
testThinClientPoolExecuteHAFunction {code}
on RedHat (RHEL7 & RHEL8 both fail). Per our policy I've reverted the change while we investigate. If you have access to a RHEL machine, you're welcome to try and track things down. I will investigate here as time permits. What I see consistently in our output logs is this:
{quote}[error 2020/08/20 17:09:00.313552 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Execute: An exception (org.apache.geode.cache.execute.FunctionException: org.apache.geode.internal.cache.execute.InternalFunctionInvocationTargetException: memberDeparted event for < heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1(GFECS24955:104622)<ec><v1>:41001 > crashed, false
at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResultInternal(PRFunctionStreamingResultCollector.java:115)
at org.apache.geode.internal.cache.execute.ResultCollectorHolder.getResult(ResultCollectorHolder.java:53)
at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:88)
at org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.executeFunctionWithResult(ExecuteRegionFunction66.java:406)
at org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.cmdExecute(ExecuteRegionFunction66.java:201)
at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:183)
at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:848)
at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:72)
at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1212)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:676)
at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.geode.internal.cache.execute.InternalFunctionInvocationTargetException: memberDeparted event for < heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1(GFECS24955:104622)<ec><v1>:41001 > crashed, false
at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.memberDeparted(PRFunctionStreamingResultCollector.java:375)
at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberDepartedEvent.handleEvent(ClusterDistributionManager.java:2502)
at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2432)
at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2421)
at org.apache.geode.distributed.internal.ClusterDistributionManager.handleMemberEvent(ClusterDistributionManager.java:1401)
at org.apache.geode.distributed.internal.ClusterDistributionManager.access$200(ClusterDistributionManager.java:108)
at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEventInvoker.run(ClusterDistributionManager.java:1433)
... 1 more
) happened at remote server.
[info 2020/08/20 17:09:00.314091 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Close connection message failed with msg: TcrConnection::send: connection failure
[info 2020/08/20 17:09:00.314303 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Removing bucketServerLocation [heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1.c.gemfire-dev.internal:24955]--1-0-0 due to GF_IOERR
0% tests passed, 1 tests failed out of 1{quote}
> Several threads calling PdxInstanceFactory::create() causes seg fault
> ---------------------------------------------------------------------
>
> Key: GEODE-8436
> URL: https://issues.apache.org/jira/browse/GEODE-8436
> Project: Geode
> Issue Type: Bug
> Components: native client
> Reporter: Alberto Bustamante Reyes
> Assignee: Alberto Bustamante Reyes
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.14.0
>
> Attachments: main.cpp
>
>
> I have seen a problem when "PdxInstanceFactory::create()" is called by several threads that are registering the same new PDX type.
> The core dump is produced here:
> {code}
> void PdxInstanceImpl::toDataMutable(PdxWriter& writer) {
>   auto pt = getPdxType();
>   std::vector<std::shared_ptr<PdxFieldType>>* pdxFieldList =
>       pt->getPdxFieldTypes();
> {code}
> The problem is that "getPdxType()" returns nullptr, so the next line causes a segmentation fault when it calls "pt->getPdxFieldTypes()".
> The issue can be reproduced by running the attached client with 8 threads. This is the stack trace obtained in gdb:
> {code}
> #0 apache::geode::client::PdxType::getPdxFieldTypes (this=0x0) at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxType.hpp:178
> #1 0x00007f43dc4651b7 in apache::geode::client::PdxInstanceImpl::toDataMutable (this=0x7f43c0001600, writer=...) at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceImpl.cpp:1336
> #2 0x00007f43dc4650fd in apache::geode::client::PdxInstanceImpl::toData (this=0x7f43c0001600, writer=...) at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceImpl.cpp:1327
> #3 0x00007f43dc444971 in apache::geode::client::PdxHelper::serializePdx (output=..., pdxObject=warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<apache::geode::client::PdxInstanceImpl, std::allocator<apache::geode::client::PdxInstanceImpl>, (__gnu_cxx::_Lock_policy)2>'
> warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<apache::geode::client::PdxInstanceImpl, std::allocator<apache::geode::client::PdxInstanceImpl>, (__gnu_cxx::_Lock_policy)2>'
> std::shared_ptr<apache::geode::client::PdxSerializable> (use count 3, weak count 0) = {...})
> at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxHelper.cpp:77
> #4 0x00007f43dc44b4bc in apache::geode::client::PdxInstanceFactory::create (this=0x7f43c7ffecc8) at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceFactory.cpp:53
> #5 0x000000000040de2f in doPut () at /home/alb3rtobr/CLionProjects/dummy-client/main.cpp:60
> #6 0x0000000000427767 in std::__invoke_impl<void, void (*)()> (__f=@0x2561aa8: 0x40d860 <doPut()>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/invoke.h:60
> #7 0x00000000004276fd in std::__invoke<void (*)()> (__fn=@0x2561aa8: 0x40d860 <doPut()>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/invoke.h:95
> #8 0x00000000004276d5 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x2561aa8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/thread:234
> #9 0x00000000004276a5 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x2561aa8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/thread:243
> #10 0x0000000000427589 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x2561aa0)
> {code}
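For illustration only, here is a minimal sketch of the failure pattern described in the issue: several threads registering the same new type, and a caller dereferencing a lookup result that may be null. It is not geode-native code and not the fix that was reverted. SketchType, SketchRegistry, fieldCount and registerConcurrently are hypothetical names, and the idea that lookup and registration should share a single lock is an assumption about one plausible way to avoid such a race.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a PDX type; NOT the geode-native PdxType class.
struct SketchType {
  std::string name;
  std::vector<std::string> fields;
};

// Hypothetical registry. Lookup and registration happen under one lock, so a
// caller never receives nullptr for a type another thread is still adding.
class SketchRegistry {
 public:
  std::shared_ptr<SketchType> getOrRegister(
      const std::string& name, const std::vector<std::string>& fields) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = types_.find(name);
    if (it != types_.end()) {
      return it->second;  // already registered by another thread
    }
    auto created = std::make_shared<SketchType>(SketchType{name, fields});
    types_.emplace(name, created);
    return created;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return types_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, std::shared_ptr<SketchType>> types_;
};

// Guarded counterpart of the crashing pattern: check the pointer before use
// instead of calling through it unconditionally.
std::size_t fieldCount(const std::shared_ptr<SketchType>& type) {
  if (!type) {
    throw std::runtime_error("type is not registered");
  }
  return type->fields.size();
}

// Starts `threads` workers that all register the same new type and returns
// how many of them received the same non-null instance as the first worker.
std::size_t registerConcurrently(SketchRegistry& registry,
                                 std::size_t threads) {
  assert(threads > 0);
  const std::vector<std::string> fields = {"id", "name"};
  std::vector<std::shared_ptr<SketchType>> results(threads);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < threads; ++i) {
    workers.emplace_back([&registry, &results, &fields, i] {
      results[i] = registry.getOrRegister("Sample", fields);
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  std::size_t shared = 0;
  for (const auto& result : results) {
    if (result && result == results.front()) {
      ++shared;
    }
  }
  return shared;
}
```

Calling registerConcurrently(registry, 8) mirrors the 8-thread reproduction mentioned above: with the single lock, every worker should get the same non-null pointer and the registry should hold exactly one entry. Whether the actual defect in the native client has this shape is not established by the issue text.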
--
This message was sent by Atlassian Jira
(v8.3.4#803005)