1 cannot access memory

1.1 Reproducing the problem

Run the following commands:

$ bazel build common/pressurer:pressurer_main
$ ./bazel-bin/common/pressurer/pressurer_main  --pressure_config  /autocar/common/pressurer/config/test-car.pb.txt --headless

The program crashes with:

*** SIGSEGV received at time=1623125966 ***
PC: @     0x7f12747d4224  (unknown)  (unknown)
    @          0x1457196         80  absl::lts_2019_08_08::AbslFailureSignalHandler()
    @     0x7f12e2294390  1292615824  (unknown)
    @          0x14c2754         96  _ZNSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_PN5walle17CommandController14CommandHandlerEESt10_Select1stISC_ESt4lessIS5_ESaISC_EE17_M_emplace_uniqueIIRS7_RSB_EEES6_ISt17_Rb_tree_iteratorISC_EbEDpOT_
    @          0x14c28dd         32  walle::CommandController::AddHandler()
    @           0x4aa436        192  walle::pressure::MockDirectOnboardApp::Initialize()
    @           0x450792       1296  main
    @     0x7f12746a6840  (unknown)  __libc_start_main
    @  0x22e258d4c544155  (unknown)  (unknown)
Segmentation fault (core dumped)

1.2 Inspecting the stack

$ mkdir -p /autocar/data/core
$ echo  "/autocar/data/core/core_%e.%p" | sudo tee /proc/sys/kernel/core_pattern
$ ./bazel-bin/common/pressurer/pressurer_main  --pressure_config  /autocar/common/pressurer/config/test-car.pb.txt
$ gdb bazel-bin/common/pressurer/pressurer_main /autocar/data/core/core_pressurer_main.11922

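Backtrace output: frames #5 to #8 report that memory cannot be accessed, and frames #19 to #21 show where in the project code the crash was triggered.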

(gdb) bt
#0  0x00007f6579694269 in raise (sig=11) at ../sysdeps/unix/sysv/linux/pt-raise.c:35
#1  0x0000000001457200 in absl::lts_2019_08_08::RaiseToDefaultHandler (signo=11) at external/com_google_absl/absl/debugging/failure_signal_handler.cc:58
#2  absl::lts_2019_08_08::AbslFailureSignalHandler (signo=11, ucontext=0x7f659333f9c0) at external/com_google_absl/absl/debugging/failure_signal_handler.cc:322
#3  <signal handler called>
#4  __memcpy_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:244
#5  0x00000000014c2579 in std::char_traits<char>::copy (__n=56820592, __s2=0x31 <error: Cannot access memory at address 0x31>, __s1=<optimized out>)
    at /usr/include/c++/5/bits/char_traits.h:290
#6  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy (__n=56820592, __s=0x31 <error: Cannot access memory at address 0x31>,
    __d=<optimized out>) at /usr/include/c++/5/bits/basic_string.h:299
#7  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy_chars (__k2=<optimized out>,
    __k1=0x31 <error: Cannot access memory at address 0x31>, __p=<optimized out>) at /usr/include/c++/5/bits/basic_string.h:341
#8  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*> (this=this@entry=0x34ea010,
    __beg=0x31 <error: Cannot access memory at address 0x31>, __end=<optimized out>) at /usr/include/c++/5/bits/basic_string.tcc:229
#9  0x00000000014c2754 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct_aux<char*> (__end=<optimized out>,
    __beg=<optimized out>, this=0x34ea010) at /usr/include/c++/5/bits/basic_string.h:195
#10 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*> (__end=<optimized out>, __beg=<optimized out>, this=0x34ea010)
    at /usr/include/c++/5/bits/basic_string.h:214
#11 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (__str=..., this=0x34ea010)
    at /usr/include/c++/5/bits/basic_string.h:400
#12 std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*>::pair<walle::CommandController::CommandHandler*&, void> (__y=<optimized out>, __x=..., this=<optimized out>) at /usr/include/c++/5/bits/stl_pair.h:139
#13 __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*> > >::construct<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, walle::CommandController::CommandHandler*&> (__p=<optimized out>,
    this=<optimized out>) at /usr/include/c++/5/ext/new_allocator.h:120
#14 std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*> > > >::construct<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, walle::CommandController::CommandHandler*&> (
    __p=<optimized out>, __a=...) at /usr/include/c++/5/bits/alloc_traits.h:530
#15 std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*> > >::_M_construct_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, walle::CommandController::CommandHandler*&> (__node=0x34e9ff0, this=0x34baaf8)
    at /usr/include/c++/5/bits/stl_tree.h:529
#16 std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*> > >::_M_create_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, walle::CommandController::CommandHandler*&> (this=0x34baaf8)
    at /usr/include/c++/5/bits/stl_tree.h:546
#17 std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*> > >::_M_emplace_unique<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, walle::CommandController::CommandHandler*&> (this=0x34baaf8)
    at /usr/include/c++/5/bits/stl_tree.h:2123
#18 0x00000000014c28dd in std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, walle::CommandController::CommandHandler*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, walle::CommandController::CommandHandler*> > >::emplace<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, walle::CommandController::CommandHandler*&> (this=<optimized out>) at /usr/include/c++/5/bits/stl_map.h:559
#19 walle::CommandController::AddHandler (this=<optimized out>, handler=0x3630350) at common/framework/command_controller.cc:62
#20 0x00000000004aa436 in walle::pressure::MockDirectOnboardApp::Initialize (this=this@entry=0x21742e0 <main::onboard_app>, onboard_options=...)
    at common/pressurer/mock_direct_onboard_app.cc:61
#21 0x0000000000450792 in main (argc=4, argv=<optimized out>) at common/pressurer/pressurer_main.cc:52

1.3 Inspecting the code

Where the crash occurs: command_controller.cc: AddHandler()

void CommandController::AddHandler(CommandHandler* handler) {
  ...
  handler_.emplace(handler->name(), handler);
}

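If handler_.emplace(handler->name(), handler) is replaced with handler_.emplace("handler->name()", handler), i.e. a string literal is used as the key, the crash goes away. So the string returned by handler->name() is what cannot be read, which matches the invalid source address 0x31 in frames #5 to #8.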

Where AddHandler() is called: mock_direct_onboard_app.cc: Initialize()

bool MockDirectOnboardApp::Initialize(const LocalOnboardContext::Options& onboard_options) {
  ...
  onboard_context_->command_controller()->AddHandler(command_observer_);
  ...
}

Where command_observer_ is assigned: mock_direct_onboard_app.cc: MockDirectOnboardApp(). Here command_observer_ ends up pointing to memory that has already been freed:

MockDirectOnboardApp::MockDirectOnboardApp(PressurerContext* pressure_context)
    : pressure_context_(pressure_context) {
  auto command_observer = std::make_unique<AppCommandHandler>(this);  // destroyed when the constructor returns
  command_observer_ = command_observer.get();  // dangles once the constructor returns
}
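The lifetime problem can be shown in isolation. The following is a minimal sketch, not the project's code: DemoHandler and ConstructLikeBuggyCode are hypothetical names, and the destructor sets a flag so that the destruction can be observed without ever dereferencing a dangling pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for AppCommandHandler. It records its own
// destruction in a flag owned by the caller.
class DemoHandler {
 public:
  explicit DemoHandler(bool* destroyed) : destroyed_(destroyed) {}
  ~DemoHandler() { *destroyed_ = true; }
  const std::string& name() const { return name_; }

 private:
  bool* destroyed_;
  std::string name_ = "demo";
};

// Mirrors the shape of the buggy constructor: the unique_ptr is a local
// variable, so the handler dies when this function returns.
void ConstructLikeBuggyCode(bool* destroyed) {
  auto owner = std::make_unique<DemoHandler>(destroyed);
  DemoHandler* raw = owner.get();
  assert(raw->name() == "demo");  // Valid: owner is still alive here.
  (void)raw;
}  // owner is destroyed here; any saved copy of raw would now dangle.

// Returns true if the handler was already destroyed by the time the
// function that created it returned.
bool HandlerDestroyedOnReturn() {
  bool destroyed = false;
  ConstructLikeBuggyCode(&destroyed);
  return destroyed;
}
```

HandlerDestroyedOnReturn() returns true: the object is gone as soon as the creating function returns, so a raw pointer saved in a member (as command_observer_ was) refers to freed memory from then on.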

Change in mock_direct_onboard_app.cc:

MockDirectOnboardApp::MockDirectOnboardApp(PressurerContext* pressure_context)
    : pressure_context_(pressure_context) {
  command_observer_ = std::make_unique<AppCommandHandler>(this);
}

Change in mock_direct_onboard_app.h:

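AppCommandHandler* command_observer_ = nullptr;  -->  std::unique_ptr<AppCommandHandler> command_observer_ = nullptr;

Since AddHandler() takes a raw CommandHandler*, the call in Initialize() presumably has to pass command_observer_.get() after this change.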

Fix: https://dev.sankuai.com/code/repo-detail/walle/autocar/pr/12184/overview
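The ownership pattern behind the fix can be sketched as follows. This is an illustration under assumptions, not the code from the PR: DemoHandler, DemoController and DemoApp are hypothetical stand-ins for CommandHandler, CommandController and MockDirectOnboardApp. The owning object holds the handler in a std::unique_ptr member and hands out only a non-owning raw pointer via get().

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for CommandHandler.
class DemoHandler {
 public:
  explicit DemoHandler(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};

// Hypothetical stand-in for CommandController. It stores non-owning
// pointers, so callers must keep each handler alive while it is registered.
class DemoController {
 public:
  void AddHandler(DemoHandler* handler) {
    assert(handler != nullptr);
    handlers_.emplace(handler->name(), handler);  // copies the key string
  }
  bool Has(const std::string& name) const { return handlers_.count(name) > 0; }

 private:
  std::map<std::string, DemoHandler*> handlers_;
};

// Hypothetical stand-in for MockDirectOnboardApp after the fix.
class DemoApp {
 public:
  DemoApp() : observer_(std::make_unique<DemoHandler>("observer")) {}

  void Initialize(DemoController* controller) {
    // The member keeps ownership; the controller only borrows the pointer.
    controller->AddHandler(observer_.get());
  }

 private:
  std::unique_ptr<DemoHandler> observer_;  // lives as long as the DemoApp
};
```

With this layout the handler lives exactly as long as the DemoApp that owns it, so the pointer stored in the controller stays valid for as long as the app exists. The controller must still not use the pointer after the app is destroyed.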

2 bad_alloc

The conclusions in this section are tentative.

Occasionally, the following error appears instead:

terminate called after throwing an instance of 'std::bad_alloc'
what():  std::bad_alloc
*** SIGABRT received at time=1623055268 ***
PC: @     0x7f22d0e45438  (unknown)  raise
  @          0x1451cc6         80  absl::lts_2019_08_08::AbslFailureSignalHandler()
  @     0x7f233ea1e390  (unknown)  (unknown)
  @ ... and at least 2 more frames
Aborted (core dumped)

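Cause: in command_controller.cc: AddHandler(), handler_ is a std::map, and frame #18 of the earlier backtrace shows that map::emplace allocates memory (it creates a node and copies the key string). The likely explanation is that handler points to invalid memory, so the key copy inside emplace works from garbage data. For instance, __n=56820592 in frame #5 suggests that the string length read from the freed object was already garbage; a larger garbage length could make the allocation itself fail with std::bad_alloc.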

void CommandController::AddHandler(CommandHandler* handler) {
  ...
  handler_.emplace(handler->name(), handler);
}
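One way such a failure could arise is a string copy that requests an absurd amount of memory. The sketch below does not reproduce the original crash and involves no dangling pointer; it only simulates the effect by passing an explicitly huge length. CopyResult and TryBuildString are hypothetical names, and which exception is thrown (std::bad_alloc or std::length_error) depends on the standard library and platform.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <new>
#include <stdexcept>
#include <string>

enum class CopyResult { kOk, kBadAlloc, kLengthError };

// Tries to build a string of `length` characters, which is roughly what a
// string copy does when the source's length field holds `length`.
CopyResult TryBuildString(std::size_t length) {
  try {
    std::string copy(length, 'x');
    assert(copy.size() == length);
    return CopyResult::kOk;
  } catch (const std::bad_alloc&) {
    return CopyResult::kBadAlloc;  // the allocator could not satisfy the request
  } catch (const std::length_error&) {
    return CopyResult::kLengthError;  // length exceeds std::string::max_size()
  }
}

// A length far beyond any realistic allocation, standing in for a garbage
// length read from freed memory.
std::size_t AbsurdLength() {
  return std::numeric_limits<std::size_t>::max() / 4;
}
```

TryBuildString(16) succeeds, while TryBuildString(AbsurdLength()) is expected to fail with one of the two exceptions. In the real program no handler catches the exception, which would be consistent with the "terminate called after throwing an instance of 'std::bad_alloc'" message followed by SIGABRT.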