Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/09/10 07:14:50 UTC
[GitHub] [pulsar] lhotari opened a new issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
lhotari opened a new issue #11635:
URL: https://github.com/apache/pulsar/issues/11635
**Describe the bug**
The logs contained these lines:
```
............../run-unit-tests.sh: line 85: 2489 Segmentation fault (core dumped) python pulsar_test.py
.
.
.
Error: Process completed with exit code 139.
```
This happened in https://github.com/apache/pulsar/pull/11387/checks?check_run_id=3297639124
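As a side note on the exit code in the log above: shells conventionally report a process killed by signal N as status 128 + N, so 139 corresponds to signal 11 (SIGSEGV on Linux). The following is a minimal sketch of that arithmetic; the helper name `describe_exit_code` is hypothetical and not part of the Pulsar build scripts.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit status (hypothetical helper, for illustration only).

    Assumes the shell convention that a status above 128 means the process
    was killed by signal (status - 128). A program can also exit with such a
    status on its own, so treat the result as a hint.
    """
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known on this platform
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(139))
```

On Linux this reports SIGSEGV for 139, which matches the "Segmentation fault (core dumped)" message printed by `run-unit-tests.sh`.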
**To Reproduce**
Unknown.
**Expected behavior**
Tests should not crash with segmentation fault.
**Additional context**
The build logs are available at https://gist.github.com/lhotari/5241f83ed6c348b3d2c91ce5da33c1d0 . "test-logs.zip" is at https://transfer.sh/1sa6OMj/test-logs.zip .
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-946797373
I reproduced it in my local env and analyzed the core using `gdb`.
Here is the stack:
```
#0 0x00007f655e5e8d0d in std::__atomic_base<long>::load (__m=std::memory_order_seq_cst, this=0xb8) at /usr/include/c++/5/bits/atomic_base.h:396
#1 std::__atomic_base<long>::operator long (this=0xb8) at /usr/include/c++/5/bits/atomic_base.h:259
#2 0x00007f655e5e6348 in boost::asio::detail::task_io_service::run (this=0x0, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:136
#3 0x00007f655e5e6cc4 in boost::asio::io_service::run (this=0x1f11990) at /usr/include/boost/asio/impl/io_service.ipp:59
#4 0x00007f655e5e32d0 in pulsar::ExecutorService::startWorker (this=0x1f11130, io_service=std::shared_ptr (count 2, weak 0) 0x1e8ba00) at /pulsar/pulsar-client-cpp/lib/ExecutorService.cc:37
#5 0x00007f655e5f1680 in std::_Mem_fn_base<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>), true>::operator()<std::shared_ptr<boost::asio::io_service>&, void> (this=0x1c45ff8, __object=0x1f11130)
at /usr/include/c++/5/functional:600
#6 0x00007f655e5f1574 in std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_service>)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) (this=0x1c45ff8, __args=<unknown type in /pulsar/pulsar-client-cpp/python/_pulsar.so, CU 0x60498d, DIE 0x64b00a>) at /usr/include/c++/5/functional:1074
#7 0x00007f655e5f1469 in std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_service>)>::operator()<, void>() (this=0x1c45ff8)
at /usr/include/c++/5/functional:1133
#8 0x00007f655e5f142e in std::_Bind_simple<std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_service>)> ()>::_M_invoke<>(std::_Index_tuple<>)
(this=0x1c45ff8) at /usr/include/c++/5/functional:1531
#9 0x00007f655e5f132e in std::_Bind_simple<std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_service>)> ()>::operator()() (this=0x1c45ff8)
at /usr/include/c++/5/functional:1520
#10 0x00007f655e5f1194 in std::thread::_Impl<std::_Bind_simple<std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_service>)> ()> >::_M_run() (
this=0x1c45fe0) at /usr/include/c++/5/thread:115
#11 0x00007f655e866bd0 in execute_native_thread_routine () from /pulsar/pulsar-client-cpp/python/_pulsar.so
#12 0x00007f655fbab6ba in start_thread (arg=0x7f65558f4700) at pthread_create.c:333
#13 0x00007f655f8e151d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
```
Note frame `#2` in particular:
```
#2 0x00007f655e5e6348 in boost::asio::detail::task_io_service::run (this=0x0, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:136
```
We can see from `this=0x0` that the `task_io_service` pointer is null, while frame `#3` shows that the `io_service` object itself is not null. It looks like the internal `impl_` field of `io_service` has somehow been released.
I'll work on refactoring the `ExecutorService` implementation and see whether the crash still happens.
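The `this=0xb8` in frames `#0` and `#1` fits the same reading: if the enclosing object sits at address 0, a member stored 0xb8 bytes into it would be read from address 0xb8. The sketch below only illustrates that arithmetic with `ctypes`. The struct `FakeService`, its fields, and the helper `member_address` are hypothetical; the real layout of `task_io_service` is not shown in this thread.

```python
import ctypes


class FakeService(ctypes.Structure):
    # Hypothetical layout: 0xb8 bytes of unrelated members followed by a
    # long counter. This is NOT the actual Boost.Asio layout.
    _fields_ = [
        ("other_members", ctypes.c_char * 0xB8),
        ("counter", ctypes.c_long),
    ]


def member_address(base: int, field) -> int:
    """Address a member would have if its enclosing object lived at `base`."""
    return base + field.offset


# With a null object pointer (base 0), the member address equals its offset.
print(hex(member_address(0x0, FakeService.counter)))  # 0xb8
```

Under that assumption, the faulting load at 0xb8 is simply the first member access through the null pointer seen in frame `#2`.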
[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-917544151
It looks like this bug can be reproduced easily in a Docker container:
```
2021-09-12 03:36:21.326 INFO [140646740616960] ConnectionPool:96 | Created connection for fakeServiceUrl
2021-09-12 03:36:21.326 ERROR [140646740616960] ClientConnection:502 | [<none> -> fakeServiceUrl] Invalid Url, unable to parse: system:0 Success
2021-09-12 03:36:21.326 INFO [140646740616960] ClientConnection:1499 | [<none> -> fakeServiceUrl] Connection closed
2021-09-12 03:36:21.326 INFO [140646740616960] ClientConnection:255 | [<none> -> fakeServiceUrl] Destroyed connection
2021-09-12 03:36:21.326 ERROR [140646740616960] ClientImpl:188 | Error Checking/Getting Partition Metadata while creating producer on persistent://public/default/connect-error-topic -- ConnectError
.2021-09-12 03:36:21.327 INFO [140646740616960] ClientConnection:181 | [<none> -> pulsar://192.0.2.1:1234] Create ClientConnection, timeout=1000
2021-09-12 03:36:21.327 INFO [140646740616960] ConnectionPool:96 | Created connection for pulsar://192.0.2.1:1234
Segmentation fault
```
Hope the cause can be figured out soon.
[GitHub] [pulsar] lhotari commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-911163162
This might be fixed by #11887 . Closing. We can reopen if this appears in the future.
[GitHub] [pulsar] lhotari commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-946962333
> I've rerun the Python tests in a Docker container 7 times after applying my patch:
>
> - Tue Oct 19 15:08:20 UTC 2021
> - Tue Oct 19 15:12:35 UTC 2021
> - Tue Oct 19 15:15:04 UTC 2021
> - Tue Oct 19 15:17:37 UTC 2021
> - Tue Oct 19 15:20:42 UTC 2021
> - Tue Oct 19 15:24:13 UTC 2021
> - Tue Oct 19 15:27:04 UTC 2021
>
> No segmentation fault happened. I'll open a PR soon.
Nice! Good work @BewareMyPower . I'm looking forward to your PR.
[GitHub] [pulsar] merlimat closed issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
merlimat closed issue #11635:
URL: https://github.com/apache/pulsar/issues/11635
[GitHub] [pulsar] zbentley commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
zbentley commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-909589405
If it's the same segfault I mentioned [here](https://github.com/apache/pulsar/issues/6463#issuecomment-909587000), that may provide a means to more predictably reproduce it.
Might be a totally unrelated issue though.
[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-916690180
Okay, I'll take a look soon.
[GitHub] [pulsar] lhotari commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-916685824
The problem remains. Reopening the issue.
It happens quite often. The most recent occurrence is https://github.com/apache/pulsar/pull/11994/checks?check_run_id=3564315253 .
Logs are at https://transfer.sh/z7Pa5K/logs_347931.zip and https://transfer.sh/lJtmwo/test-logs%20%283%29.zip (the latter also contains broker logs).
@BewareMyPower would you mind checking what the problem is?
[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-946840308
I've rerun the Python tests in a Docker container 7 times after applying my patch:
- Tue Oct 19 15:08:20 UTC 2021
- Tue Oct 19 15:12:35 UTC 2021
- Tue Oct 19 15:15:04 UTC 2021
- Tue Oct 19 15:17:37 UTC 2021
- Tue Oct 19 15:20:42 UTC 2021
- Tue Oct 19 15:24:13 UTC 2021
- Tue Oct 19 15:27:04 UTC 2021
No segmentation fault happened. I'll open a PR soon.
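A repeated-run check like the one listed above can be scripted. The following is a rough sketch only, not the procedure actually used here; the helper `count_segfaults` is hypothetical, and it assumes a POSIX system, where `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys
from datetime import datetime, timezone


def count_segfaults(cmd, runs=7):
    """Run `cmd` `runs` times and count the runs killed by SIGSEGV.

    Hypothetical helper. When the command is launched directly (no shell in
    between), a segfault shows up as returncode == -signal.SIGSEGV on POSIX.
    """
    segfaults = 0
    for _ in range(runs):
        started = datetime.now(timezone.utc).strftime("%a %b %d %H:%M:%S UTC %Y")
        result = subprocess.run(cmd)
        crashed = result.returncode == -signal.SIGSEGV
        if crashed:
            segfaults += 1
        print(f"{started}: returncode={result.returncode} segfault={crashed}")
    return segfaults
```

For example, `count_segfaults([sys.executable, "pulsar_test.py"])` would run the test file seven times from its own directory. A count of zero over a handful of runs is encouraging for a flaky crash but does not prove the crash is gone.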
[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-897527855
I also ran into this problem recently. It's flaky, and the cause is hard to determine at the moment. Maybe a temporary solution is to add retry logic for `python pulsar_test.py`? I'm not sure how easy that would be. In addition, I found that test-logs.zip only contains the gtest logs (C++ tests) and the broker's logs (unfortunately, `immediateFlush` is false by default). The Python test logs can only be found on the CI workflow page. It might be better to change `immediateFlush` from false to true when running the C++/Python tests.
From the last few lines of the Python test output
```
[persistent://public/default/partitioned_topic_name_test-partition-0, partitioned_topic_name_test_sub, 0] Closed consumer 0
2021-08-11 06:29:09.454 INFO [139946459391744] ConsumerImpl:930 | [persistent://public/default/partitioned_topic_name_test-partition-1, partitioned_topic_name_test_sub, 1] Closed consumer 1
2021-08-11 06:29:09.454 INFO [139946459391744] ConsumerImpl:930 | [persistent://public/default/partitioned_topic_name_test-partition-2, partitioned_topic_name_test_sub, 2] Closed consumer 2
2021-08-11 06:29:09.455 INFO [139946459391744] ClientConnection:1504 | [127.0.0.1:56176 -> 127.0.0.1:6650] Connection closed
```
and standalone logs
```
06:29:09.444 [Thread-380] INFO org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:56176] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/partitioned_topic_name_test-partition-1}, client=/127.0.0.1:56176, producerName=standalone-0-355, producerId=1}
06:29:09.444 [Thread-380] INFO org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:56176] persistent://public/default/partitioned_topic_name_test-partition-0 configured with schema false
06:29:09.445 [Thread-380] INFO org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:56176] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/partitioned_topic_name_test-partition-0}, client=/127.0.0.1:56176,
```
we can see the topic is `partitioned_topic_name_test`. See https://github.com/apache/pulsar/blob/36d5738412bb1ed9018178007bf63d9202b675db/pulsar-client-cpp/python/pulsar_test.py#L1113-L1132
But the test code itself looks fine, and a rerun usually passes.
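The retry idea mentioned above could look roughly like the sketch below. It is an illustration only, not what `run-unit-tests.sh` does; the helper `run_with_retries` and its `max_attempts` parameter are hypothetical.

```python
import subprocess
import sys


def run_with_retries(cmd, max_attempts=3):
    """Run `cmd` until it exits with 0 or the attempts run out.

    Hypothetical helper. Returns (last return code, attempts used).
    Retrying only hides a flaky crash; it does not fix the cause.
    """
    returncode = None
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return returncode, attempt
        print(
            f"attempt {attempt}/{max_attempts} failed with return code {returncode}",
            file=sys.stderr,
        )
    return returncode, max_attempts
```

A CI step could call `run_with_retries([sys.executable, "pulsar_test.py"])` and fail the job only if the returned code is nonzero. Logging every failed attempt keeps the flakiness visible even when a later attempt passes.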
[GitHub] [pulsar] lhotari closed issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
lhotari closed issue #11635:
URL: https://github.com/apache/pulsar/issues/11635
[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump
Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-916932943
I might not have enough time for this issue in the near future, so I've assigned it to myself for now to avoid forgetting it.