Locating and Analyzing Deadlocks

How a Deadlock Arises

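When several threads of a program acquire several locks, a deadlock can occur. For example, thread A holds lock 1 and thread B holds lock 2; thread A then needs lock 2 and thread B needs lock 1. Each thread waits for the lock held by the other, neither can make progress, and the process is deadlocked.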

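In other words, the threads acquire the locks in inconsistent orders. The following program (thread.cpp) reproduces the problem: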

#include <iostream>
#include <thread>
#include <mutex>
#include <chrono>
std::mutex mtx1;
std::mutex mtx2;

void TaskA()
{
    // Thread A acquires lock 1 first
    std::lock_guard<std::mutex> lockA(mtx1);
    std::cout << "Thread A acquired lock 1" << std::endl;

    // Thread A sleeps for 2 s before taking lock 2, so that thread B grabs lock 2 first; this simulates the deadlock
    std::this_thread::sleep_for(std::chrono::seconds(2));

    std::lock_guard<std::mutex> lockB(mtx2);
    std::cout << "Thread A acquired lock 2" << std::endl;

    std::cout << "Thread A releases all locks and exits!" << std::endl;
}

void TaskB()
{
    // Thread B sleeps for 1 s first, so that thread A is guaranteed to acquire lock 1 first
    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::lock_guard<std::mutex> lockB(mtx2);
    std::cout << "Thread B acquired lock 2" << std::endl;

    // Thread B tries to acquire lock 1
    std::lock_guard<std::mutex> lockA(mtx1);
    std::cout << "Thread B acquired lock 1" << std::endl;

    std::cout << "Thread B releases all locks and exits!" << std::endl;
}

int main()
{
    std::thread t1(TaskA);
    std::thread t2(TaskB);

    t1.join();
    t2.join();

    return 0;
}

Build and run in a terminal:

g++ thread.cpp -g -lpthread
./a.out

Output (the program then hangs):

Thread A acquired lock 1
Thread B acquired lock 2

Locating the Deadlock

Use the ps command to check the process state and its PID:

ubuntu@ubuntu-ThinkCentre-M800t-1N000:~$ ps -aux | grep a.out
ubuntu 390341 0.0 0.0 87964 1580 pts/7 Sl+ 14:35 0:00 ./a.out
ubuntu 390356 0.0 0.0 12116 724 pts/1 S+ 14:37 0:00 grep --color=auto a.out

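The PID of the a.out process is 390341. Its state code Sl+ indicates a multi-threaded program (l) that is sleeping, i.e. blocked (S), and running in the foreground (+).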

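Attach gdb to the a.out process. The session below uses gdb attach <pid>: gdb treats attach as a program file name (hence the "attach: No such file or directory" message) and then attaches to the PID. gdb -p <pid> attaches without that message.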

ubuntu@ubuntu-ThinkCentre-M800t-1N000:~$ sudo gdb attach 390341
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
Attaching to process 390341
[New LWP 390342]
[New LWP 390343]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__pthread_clockjoin_ex (threadid=140066514970368, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>)
at pthread_join_common.c:145
145 pthread_join_common.c: No such file or directory.

Use info threads to list the threads:

(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f63c6deb740 (LWP 390341) "a.out" __pthread_clockjoin_ex (threadid=140066514970368, thread_return=0x0, clockid=<optimized out>,
abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
2 Thread 0x7f63c6dea700 (LWP 390342) "a.out" __lll_lock_wait (futex=futex@entry=0x55c69ef241a0 <mtx2>, private=0) at lowlevellock.c:52
3 Thread 0x7f63c65e9700 (LWP 390343) "a.out" __lll_lock_wait (futex=futex@entry=0x55c69ef24160 <mtx1>, private=0) at lowlevellock.c:52

Use thread 2 to switch to thread 2:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f63c6dea700 (LWP 390342))]
#0 __lll_lock_wait (futex=futex@entry=0x55c69ef241a0 <mtx2>, private=0) at lowlevellock.c:52
52 lowlevellock.c: No such file or directory.

Use where to view its call stack:

#0  __lll_lock_wait (futex=futex@entry=0x55c69ef241a0 <mtx2>, private=0) at lowlevellock.c:52
#1 0x00007f63c73390a3 in __GI___pthread_mutex_lock (mutex=0x55c69ef241a0 <mtx2>) at ../nptl/pthread_mutex_lock.c:80
#2 0x000055c69ef1f74f in __gthread_mutex_lock (__mutex=0x55c69ef241a0 <mtx2>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3 0x000055c69ef1f8a4 in std::mutex::lock (this=0x55c69ef241a0 <mtx2>) at /usr/include/c++/9/bits/std_mutex.h:100
#4 0x000055c69ef1f940 in std::lock_guard<std::mutex>::lock_guard (this=0x7f63c6de9df0, __m=...) at /usr/include/c++/9/bits/std_mutex.h:159
#5 0x000055c69ef1f3f8 in TaskA () at thread.cpp:17
#6 0x000055c69ef20526 in std::__invoke_impl<void, void (*)()> (__f=@0x55c6a0a95eb8: 0x55c69ef1f368 <TaskA()>) at /usr/include/c++/9/bits/invoke.h:60
#7 0x000055c69ef204be in std::__invoke<void (*)()> (__fn=@0x55c6a0a95eb8: 0x55c69ef1f368 <TaskA()>) at /usr/include/c++/9/bits/invoke.h:95
#8 0x000055c69ef20450 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x55c6a0a95eb8) at /usr/include/c++/9/thread:244
#9 0x000055c69ef2040d in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x55c6a0a95eb8) at /usr/include/c++/9/thread:251
#10 0x000055c69ef203de in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x55c6a0a95eb0) at /usr/include/c++/9/thread:195
#11 0x00007f63c7222df4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007f63c7336609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007f63c705e353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Frame #5 (TaskA () at thread.cpp:17) shows that thread 2 is blocked in TaskA at line 17 of thread.cpp, the line that tries to acquire mtx2.

Use thread 3 to switch to thread 3, then use where to view its call stack:

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f63c65e9700 (LWP 390343))]
#0 __lll_lock_wait (futex=futex@entry=0x55c69ef24160 <mtx1>, private=0) at lowlevellock.c:52
52 in lowlevellock.c
(gdb) where
#0 __lll_lock_wait (futex=futex@entry=0x55c69ef24160 <mtx1>, private=0) at lowlevellock.c:52
#1 0x00007f63c73390a3 in __GI___pthread_mutex_lock (mutex=0x55c69ef24160 <mtx1>) at ../nptl/pthread_mutex_lock.c:80
#2 0x000055c69ef1f74f in __gthread_mutex_lock (__mutex=0x55c69ef24160 <mtx1>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3 0x000055c69ef1f8a4 in std::mutex::lock (this=0x55c69ef24160 <mtx1>) at /usr/include/c++/9/bits/std_mutex.h:100
#4 0x000055c69ef1f940 in std::lock_guard<std::mutex>::lock_guard (this=0x7f63c65e8df0, __m=...) at /usr/include/c++/9/bits/std_mutex.h:159
#5 0x000055c69ef1f541 in TaskB () at thread.cpp:31
#6 0x000055c69ef20526 in std::__invoke_impl<void, void (*)()> (__f=@0x55c6a0a96008: 0x55c69ef1f4b1 <TaskB()>) at /usr/include/c++/9/bits/invoke.h:60
#7 0x000055c69ef204be in std::__invoke<void (*)()> (__fn=@0x55c6a0a96008: 0x55c69ef1f4b1 <TaskB()>) at /usr/include/c++/9/bits/invoke.h:95
#8 0x000055c69ef20450 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x55c6a0a96008) at /usr/include/c++/9/thread:244
#9 0x000055c69ef2040d in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x55c6a0a96008) at /usr/include/c++/9/thread:251
#10 0x000055c69ef203de in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x55c6a0a96000) at /usr/include/c++/9/thread:195
#11 0x00007f63c7222df4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007f63c7336609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007f63c705e353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

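Frame #5 (TaskB () at thread.cpp:31) shows that thread 3 is blocked in TaskB at line 31 of thread.cpp, the line that tries to acquire mtx1. Thread 2 waits for mtx2 while holding mtx1, and thread 3 waits for mtx1 while holding mtx2, so neither can ever continue.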

The bt (backtrace) command prints the same call stack as where:

(gdb) bt
#0 __lll_lock_wait (futex=futex@entry=0x55c69ef24160 <mtx1>, private=0) at lowlevellock.c:52
#1 0x00007f63c73390a3 in __GI___pthread_mutex_lock (mutex=0x55c69ef24160 <mtx1>) at ../nptl/pthread_mutex_lock.c:80
#2 0x000055c69ef1f74f in __gthread_mutex_lock (__mutex=0x55c69ef24160 <mtx1>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3 0x000055c69ef1f8a4 in std::mutex::lock (this=0x55c69ef24160 <mtx1>) at /usr/include/c++/9/bits/std_mutex.h:100
#4 0x000055c69ef1f940 in std::lock_guard<std::mutex>::lock_guard (this=0x7f63c65e8df0, __m=...) at /usr/include/c++/9/bits/std_mutex.h:159
#5 0x000055c69ef1f541 in TaskB () at thread.cpp:31
#6 0x000055c69ef20526 in std::__invoke_impl<void, void (*)()> (__f=@0x55c6a0a96008: 0x55c69ef1f4b1 <TaskB()>) at /usr/include/c++/9/bits/invoke.h:60
#7 0x000055c69ef204be in std::__invoke<void (*)()> (__fn=@0x55c6a0a96008: 0x55c69ef1f4b1 <TaskB()>) at /usr/include/c++/9/bits/invoke.h:95
#8 0x000055c69ef20450 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x55c6a0a96008) at /usr/include/c++/9/thread:244
#9 0x000055c69ef2040d in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x55c6a0a96008) at /usr/include/c++/9/thread:251
#10 0x000055c69ef203de in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x55c6a0a96000) at /usr/include/c++/9/thread:195
#11 0x00007f63c7222df4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007f63c7336609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007f63c705e353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)