Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/09/10 07:14:50 UTC

[GitHub] [pulsar] lhotari opened a new issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

lhotari opened a new issue #11635:
URL: https://github.com/apache/pulsar/issues/11635


   **Describe the bug**
   
   The logs contained these lines:
   ```
   ............../run-unit-tests.sh: line 85:  2489 Segmentation fault      (core dumped) python pulsar_test.py
   .
   .
   .
   Error: Process completed with exit code 139.
   ```
   This happened in https://github.com/apache/pulsar/pull/11387/checks?check_run_id=3297639124
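
For readers unfamiliar with the number in the log: by shell convention, an exit status above 128 usually means the process was terminated by signal `status - 128`, so 139 corresponds to signal 11. A minimal sketch of that arithmetic follows; the helper name `describe_exit_status` is made up for illustration and is not part of the Pulsar build scripts.

```python
import signal


def describe_exit_status(status):
    """Explain a shell-style exit status such as the 139 in the log above."""
    # By shell convention, 128 + N means "terminated by signal N".
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


print(describe_exit_status(139))
```

On Linux this reports signal 11 (SIGSEGV), which matches the "Segmentation fault (core dumped)" line in the log.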
   
   **To Reproduce**
   
   Unknown.
   
   **Expected behavior**
   
   Tests should not crash with segmentation fault.
   
   **Additional context**
   
   The build logs are available at https://gist.github.com/lhotari/5241f83ed6c348b3d2c91ce5da33c1d0 . "test-logs.zip" is at https://transfer.sh/1sa6OMj/test-logs.zip .
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-946797373


   I reproduced it in my local env and analyzed the core using `gdb`.
   
   Here is the stack:
   
   ```
   #0  0x00007f655e5e8d0d in std::__atomic_base<long>::load (__m=std::memory_order_seq_cst, this=0xb8) at /usr/include/c++/5/bits/atomic_base.h:396
   #1  std::__atomic_base<long>::operator long (this=0xb8) at /usr/include/c++/5/bits/atomic_base.h:259
   #2  0x00007f655e5e6348 in boost::asio::detail::task_io_service::run (this=0x0, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:136
   #3  0x00007f655e5e6cc4 in boost::asio::io_service::run (this=0x1f11990) at /usr/include/boost/asio/impl/io_service.ipp:59
   #4  0x00007f655e5e32d0 in pulsar::ExecutorService::startWorker (this=0x1f11130, io_service=std::shared_ptr (count 2, weak 0) 0x1e8ba00) at /pulsar/pulsar-client-cpp/lib/ExecutorService.cc:37
   #5  0x00007f655e5f1680 in std::_Mem_fn_base<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>), true>::operator()<std::shared_ptr<boost::asio::io_service>&, void> (this=0x1c45ff8, __object=0x1f11130)
       at /usr/include/c++/5/functional:600
   #6  0x00007f655e5f1574 in std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_service>)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) (this=0x1c45ff8, __args=<unknown type in /pulsar/pulsar-client-cpp/python/_pulsar.so, CU 0x60498d, DIE 0x64b00a>) at /usr/include/c++/5/functional:1074
   #7  0x00007f655e5f1469 in std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_service>)>::operator()<, void>() (this=0x1c45ff8)
       at /usr/include/c++/5/functional:1133
   #8  0x00007f655e5f142e in std::_Bind_simple<std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_service>)> ()>::_M_invoke<>(std::_Index_tuple<>)
       (this=0x1c45ff8) at /usr/include/c++/5/functional:1531
   #9  0x00007f655e5f132e in std::_Bind_simple<std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_service>)> ()>::operator()() (this=0x1c45ff8)
       at /usr/include/c++/5/functional:1520
   #10 0x00007f655e5f1194 in std::thread::_Impl<std::_Bind_simple<std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_service>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_service>)> ()> >::_M_run() (
       this=0x1c45fe0) at /usr/include/c++/5/thread:115
   #11 0x00007f655e866bd0 in execute_native_thread_routine () from /pulsar/pulsar-client-cpp/python/_pulsar.so
   #12 0x00007f655fbab6ba in start_thread (arg=0x7f65558f4700) at pthread_create.c:333
   #13 0x00007f655f8e151d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
   ```
   
   Note frame `#2` in particular:
   
   ```
   #2  0x00007f655e5e6348 in boost::asio::detail::task_io_service::run (this=0x0, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:136
   ```
   
   Frame `#2` shows `this=0x0`, so the `task_io_service` pointer is null, while frame `#3` shows that the `io_service` object itself is not null. It looks like the internal `impl_` field of `io_service` has somehow been released.
   
   I'll work on refactoring the `ExecutorService` implementation and see whether the crash still happens.
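
One way to read frames `#0` to `#2` together: the atomic that is being loaded has `this=0xb8`, while the enclosing object has `this=0x0`. That is what you would expect when a member located 0xb8 bytes into an object is accessed through a null pointer. The sketch below only illustrates that address arithmetic. The `HypotheticalService` layout and the field names are assumptions made for the example, not the real layout of `boost::asio::detail::task_io_service`.

```python
import ctypes


class HypotheticalService(ctypes.Structure):
    # Hypothetical layout, NOT the real task_io_service:
    # 0xb8 bytes of earlier members followed by a long counter.
    _fields_ = [
        ("earlier_members", ctypes.c_char * 0xB8),
        ("counter", ctypes.c_long),
    ]


def member_address(object_address, struct_type, field_name):
    """Address of a member = address of the object + offset of the member."""
    return object_address + getattr(struct_type, field_name).offset


# With a null object pointer, the member "address" is just its offset.
null_this = 0x0
print(hex(member_address(null_this, HypotheticalService, "counter")))
```

Reading a small address like that is what triggers the segmentation fault, so the low `this` value in frame `#0` points back at the null object in frame `#2` rather than at a separate problem.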



[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-917544151


   It looks like this bug can be reproduced easily in a Docker container:
   
   ```
   2021-09-12 03:36:21.326 INFO  [140646740616960] ConnectionPool:96 | Created connection for fakeServiceUrl
   2021-09-12 03:36:21.326 ERROR [140646740616960] ClientConnection:502 | [<none> -> fakeServiceUrl] Invalid Url, unable to parse: system:0 Success
   2021-09-12 03:36:21.326 INFO  [140646740616960] ClientConnection:1499 | [<none> -> fakeServiceUrl] Connection closed
   2021-09-12 03:36:21.326 INFO  [140646740616960] ClientConnection:255 | [<none> -> fakeServiceUrl] Destroyed connection
   2021-09-12 03:36:21.326 ERROR [140646740616960] ClientImpl:188 | Error Checking/Getting Partition Metadata while creating producer on persistent://public/default/connect-error-topic -- ConnectError
   .2021-09-12 03:36:21.327 INFO  [140646740616960] ClientConnection:181 | [<none> -> pulsar://192.0.2.1:1234] Create ClientConnection, timeout=1000
   2021-09-12 03:36:21.327 INFO  [140646740616960] ConnectionPool:96 | Created connection for pulsar://192.0.2.1:1234
   Segmentation fault
   ```
   
   Hope the cause can be figured out soon.



[GitHub] [pulsar] lhotari commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-911163162


   This might be fixed by #11887. Closing. We can reopen if this appears again in the future.



[GitHub] [pulsar] lhotari commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-946962333


   > I've rerun the Python tests in a Docker container 7 times after applying my patch:
   > 
   > - Tue Oct 19 15:08:20 UTC 2021
   > - Tue Oct 19 15:12:35 UTC 2021
   > - Tue Oct 19 15:15:04 UTC 2021
   > - Tue Oct 19 15:17:37 UTC 2021
   > - Tue Oct 19 15:20:42 UTC 2021
   > - Tue Oct 19 15:24:13 UTC 2021
   > - Tue Oct 19 15:27:04 UTC 2021
   > 
   > No segmentation fault happened. I'll open a PR soon.
   
   Nice! Good work, @BewareMyPower. I'm looking forward to your PR.



[GitHub] [pulsar] merlimat closed issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
merlimat closed issue #11635:
URL: https://github.com/apache/pulsar/issues/11635


   



[GitHub] [pulsar] zbentley commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
zbentley commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-909589405


   If it's the same segfault I mentioned [here](https://github.com/apache/pulsar/issues/6463#issuecomment-909587000), that may provide a means to more predictably reproduce it.
   
   Might be a totally unrelated issue though.



[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-916690180


   Okay, I'll take a look soon.



[GitHub] [pulsar] lhotari commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-916685824


   The problem remains. Reopening the issue.
   
   It happens quite often. The most recent occurrence is https://github.com/apache/pulsar/pull/11994/checks?check_run_id=3564315253 .
   Logs are at https://transfer.sh/z7Pa5K/logs_347931.zip and https://transfer.sh/lJtmwo/test-logs%20%283%29.zip (the latter also contains broker logs).
   
   @BewareMyPower would you mind checking what the problem is?



[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-946840308


   I've rerun the Python tests in a Docker container 7 times after applying my patch:
   
   - Tue Oct 19 15:08:20 UTC 2021
   - Tue Oct 19 15:12:35 UTC 2021
   - Tue Oct 19 15:15:04 UTC 2021
   - Tue Oct 19 15:17:37 UTC 2021
   - Tue Oct 19 15:20:42 UTC 2021
   - Tue Oct 19 15:24:13 UTC 2021
   - Tue Oct 19 15:27:04 UTC 2021
   
   No segmentation fault happened. I'll open a PR soon.



[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-897527855


   I also ran into this problem recently. It's flaky, and the cause is hard to determine at the moment. A temporary solution might be to add retry logic for `python pulsar_test.py`, though I'm not sure how easy that is. In addition, I found that test-logs.zip only contains the gtest logs (C++ tests) and the broker's logs (unfortunately `immediateFlush` is false by default). The Python test logs can only be found on the CI workflow page. It may be better to change `immediateFlush` from false to true when running the C++/Python tests.
   
   From the last few lines of the Python test output 
   
   ```
   [persistent://public/default/partitioned_topic_name_test-partition-0, partitioned_topic_name_test_sub, 0] Closed consumer 0
   2021-08-11 06:29:09.454 INFO  [139946459391744] ConsumerImpl:930 | [persistent://public/default/partitioned_topic_name_test-partition-1, partitioned_topic_name_test_sub, 1] Closed consumer 1
   2021-08-11 06:29:09.454 INFO  [139946459391744] ConsumerImpl:930 | [persistent://public/default/partitioned_topic_name_test-partition-2, partitioned_topic_name_test_sub, 2] Closed consumer 2
   2021-08-11 06:29:09.455 INFO  [139946459391744] ClientConnection:1504 | [127.0.0.1:56176 -> 127.0.0.1:6650] Connection closed
   ```
   
   and standalone logs
   
   ```
   06:29:09.444 [Thread-380] INFO  org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:56176] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/partitioned_topic_name_test-partition-1}, client=/127.0.0.1:56176, producerName=standalone-0-355, producerId=1}
   06:29:09.444 [Thread-380] INFO  org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:56176] persistent://public/default/partitioned_topic_name_test-partition-0 configured with schema false
   06:29:09.445 [Thread-380] INFO  org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:56176] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/partitioned_topic_name_test-partition-0}, client=/127.0.0.1:56176,
   ```
   
   we can see the topic is `partitioned_topic_name_test`. See https://github.com/apache/pulsar/blob/36d5738412bb1ed9018178007bf63d9202b675db/pulsar-client-cpp/python/pulsar_test.py#L1113-L1132
   
   But there doesn't appear to be any problem with that code, and a rerun usually works.
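
The retry idea mentioned in this comment could look roughly like the sketch below. It is an illustration only: `run_with_retries` and its `max_attempts` parameter are hypothetical names, not something in `run-unit-tests.sh`. It retries only when the process appears to have died from SIGSEGV, so ordinary test failures are still reported on the first attempt.

```python
import signal
import subprocess
import sys


def run_with_retries(cmd, max_attempts=3):
    """Run cmd, retrying only when it dies from SIGSEGV. Returns the last return code."""
    returncode = 0
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.run(cmd).returncode
        # subprocess reports death by signal N as -N; a shell wrapper reports 128 + N.
        segfault = returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV)
        if not segfault:
            return returncode
        print(f"attempt {attempt}/{max_attempts} crashed with SIGSEGV", file=sys.stderr)
    return returncode


# Possible usage from a wrapper script (not executed here):
#   sys.exit(run_with_retries([sys.executable, "pulsar_test.py"]))
```

A wrapper like this hides the crash rather than fixing it, which is why it would only be a temporary measure until the root cause is found.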



[GitHub] [pulsar] lhotari closed issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
lhotari closed issue #11635:
URL: https://github.com/apache/pulsar/issues/11635


   



[GitHub] [pulsar] BewareMyPower commented on issue #11635: [Tests] "python pulsar_test.py" failed with segmentation fault and core dump

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #11635:
URL: https://github.com/apache/pulsar/issues/11635#issuecomment-916932943


   I might not have enough time for this issue in the near term, so I've assigned it to myself for now to avoid forgetting it.


