Linux Systems 2023-06-26

Linux in Depth: Understanding PID and TID

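Linux is an open-source operating system that most readers already know well. It is widely used on servers, where its networking stack, file systems, performance, and security are well regarded. For a Linux developer, a solid grasp of PID and TID is essential. This article looks at what the two identifiers actually are.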

1. What are PID and TID?

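PID stands for process ID. It is a number that uniquely identifies a running process on a Linux system. The kernel assigns every process a PID dynamically so that it can tell processes apart and reclaim a process's resources when it exits. A new process receives a new PID when it is created; after the process has terminated and been reaped, the PID is released and can later be reused.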

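TID stands for thread ID, a number that identifies a thread on a Linux system. Linux treats a thread as a special kind of process (a task that shares resources with other tasks), and every thread has its own unique TID. Because Linux supports multithreading, it needs a way to tell threads apart: all threads of a process report that process's PID, and each has its own TID, which lets the kernel schedule the threads individually.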

2. The relationship between PID and TID

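Every process has a unique PID, but a process can contain multiple threads. These threads share the process's address space and most of its other resources, such as open files. The Linux kernel does not draw a hard line between processes and threads: both are tasks, and they differ mainly in how much they share. That is why it helps to understand how PID and TID relate.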

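In Linux, every thread is created within an existing process. A new thread keeps the PID of the process that created it and is assigned a new TID. The first (main) thread of a process has a TID equal to the PID. Because each thread has its own TID, the kernel can schedule threads independently, and tools can address either a whole process or a single thread.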
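The relationship can be observed from user space. The following is a minimal sketch in Python (the article itself names no language); `collect_ids` is a hypothetical helper, and it assumes Python 3.8 or later for `threading.get_native_id()`. Every thread reports the same PID and a different TID.

```python
import os
import threading


def collect_ids(n_threads=3):
    """Return (pid, tid) pairs reported by the calling thread and n worker threads."""
    results = [(os.getpid(), threading.get_native_id())]  # calling thread first
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker():
        # Wait until every worker exists, so all TIDs are in use at the same time.
        barrier.wait()
        with lock:
            results.append((os.getpid(), threading.get_native_id()))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for pid, tid in collect_ids():
        print(f"pid={pid} tid={tid}")
```

When this runs on Linux from the main thread of a program, the first pair should show the same number for pid and tid, and the worker pairs should share that pid but carry their own tids.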

3. Using PID and TID

PID and TID come up constantly in day-to-day Linux development and operations. The kernel uses them, and so do most process and thread management tools.

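First, PID and TID let us inspect the state of processes and threads. Given a PID or TID, we can look up the CPU usage, memory usage, and other information about that process or thread.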
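As one illustration of looking up per-thread information, the sketch below lists the threads of a process through the /proc file system. `thread_names` is a hypothetical helper name; it assumes a Linux system with /proc mounted, where each thread appears as a directory under /proc/<pid>/task.

```python
import os


def thread_names(pid="self"):
    """Map each TID of a process to its thread name, read from /proc/<pid>/task."""
    names = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/comm") as f:
                names[int(tid)] = f.read().strip()
        except OSError:
            continue  # the thread exited between listdir() and open()
    return names


if __name__ == "__main__":
    for tid, name in sorted(thread_names().items()):
        print(tid, name)
```

Passing a numeric PID instead of the default "self" inspects another process, subject to the usual permission checks.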

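Second, PID and TID let us control processes and threads by sending them signals. For example, the kill command sends SIGTERM to a process to ask it to terminate. Inside a program, the pthread_kill function (a library function, not a command) sends a signal to a specific thread of the same process, for example to notify it that a task is complete. Note that pthread_kill takes a pthread_t handle rather than a numeric TID; the tgkill system call is the interface that takes a TID.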
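A minimal sketch of signalling by PID follows. `terminate_child` is a hypothetical helper that starts a sleeping child process only so that there is something safe to signal; `os.kill` plays the role of the kill command.

```python
import os
import signal
import subprocess
import sys


def terminate_child():
    """Start a sleeping child, send it SIGTERM by PID, and return its exit status."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    os.kill(child.pid, signal.SIGTERM)  # same request as `kill <pid>` in a shell
    return child.wait(timeout=10)       # negative signal number if killed by a signal


if __name__ == "__main__":
    print("child exit status:", terminate_child())
```

Python also exposes pthread_kill as `signal.pthread_kill`; like the C function, it expects the thread identifier returned by `threading.get_ident()`, not a TID.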

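Third, PID and TID help with debugging and troubleshooting. Given a PID or TID, we can point a debugging tool at that process or thread to diagnose and fix a problem quickly. For example, gdb can attach to a running process by its PID; it then lists each thread together with its LWP number (the TID), so that we can select a thread, set breakpoints, and inspect variables.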

4. Summary

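PID and TID are central concepts in Linux. A PID identifies a process, and a TID identifies a thread. Every process has a unique PID, but a process can have multiple threads, all created within that process and sharing its context.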

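In everyday development and operations we use them to inspect the state of processes and threads, to control their behavior, and to debug and troubleshoot. Understanding PID and TID is therefore essential when working with Linux.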

Related questions and further reading:

How do you troubleshoot high CPU usage on Linux?
How does ulibc implement backtrace?
An overview of Linux capabilities

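How do you troubleshoot high CPU usage on Linux?

Use the top command to see which programs are using the CPU and check whether stray processes are consuming it. If so, use kill to stop the processes that are not needed.

The approach is to use top to find the process with high CPU usage; it is usually an application or a database. For an application, check whether its logs report errors, and use netstat to check whether the sessions connected to the application's port look abnormal. For a database process, use the database's own diagnostic commands to look for long-running SQL transactions, frequent lock waits, and errors in the log. Once the cause is found, optimize specifically for it and work through the problem step by step.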

Method 1

Step 1: Run the top command, press Shift+P to sort by CPU usage, and note the PID of the process that is using too much CPU.

Step 2: Run

top -H -p <pid>

to find the ID of the thread in that process that consumes the most resources.

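Step 3: Run

echo 'obase=16;<tid>' | bc    or    printf "%x\n" <tid>

to convert the thread ID to hexadecimal. The letters must be lowercase to match the jstack output; printf "%x" already prints lowercase, while bc prints uppercase digits.

bc is the Linux calculator command.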

Step 4: Run

jstack <pid> | grep -A 10 '<hex-tid>'

to view the thread's state information.

Method 2

Step 1: Run the top command, press Shift+P to sort by CPU usage, and find the process that is using too much CPU.

Step 2: Run

ps -mp pid -o THREAD,tid,time | sort -rn

to list the threads and find the one with high CPU usage.
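The same per-thread CPU figures can be read directly from /proc. The sketch below is one possible approach, not what ps does internally: `parse_stat` and `busiest_threads` are hypothetical helpers that rank threads by cumulative CPU ticks (user plus system time, fields 14 and 15 of the stat file as described in proc(5)).

```python
import os


def parse_stat(line):
    """Return (tid, name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The name is wrapped in parentheses and may itself contain spaces or ')',
    # so split at the first '(' and the last ')'.
    head, rest = line.split("(", 1)
    name, tail = rest.rsplit(")", 1)
    fields = tail.split()
    utime, stime = int(fields[11]), int(fields[12])  # stat fields 14 and 15
    return int(head), name, utime + stime


def busiest_threads(pid="self"):
    """List (tid, name, cpu_ticks) for every thread, highest CPU consumer first."""
    rows = []
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                rows.append(parse_stat(f.read()))
        except OSError:
            continue  # thread exited while we were reading
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

These are cumulative totals since each thread started, so they correspond to the time column of ps rather than to the instantaneous percentage that top displays.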

Step 3: Run

echo 'obase=16;<tid>' | bc    or    printf "%x\n" <tid>

to convert the ID of the thread you are interested in to hexadecimal.

Step 4: Run

jstack pid | grep tid -A 30

where tid is the hexadecimal thread ID from step 3.
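Steps 3 and 4 can be sketched in a few lines. The thread dump text below is a made-up sample that only imitates the header lines jstack prints (`nid=0x...` is the native thread ID in hexadecimal); `tid_to_nid` and `find_thread` are hypothetical helper names.

```python
import re

# Hypothetical excerpt of a thread dump, for illustration only.
SAMPLE_DUMP = '''"main" #1 prio=5 os_prio=0 tid=0x00007f0a nid=0x1a2b runnable
   java.lang.Thread.State: RUNNABLE
"worker" #12 prio=5 os_prio=0 tid=0x00007f0b nid=0x1a2c waiting on condition
   java.lang.Thread.State: TIMED_WAITING
'''


def tid_to_nid(tid):
    """Format a decimal TID as lowercase hex, the way it appears in the nid= field."""
    return f"0x{tid:x}"


def find_thread(dump_text, tid, context=3):
    """Return the header line whose nid matches tid, plus up to `context` following lines."""
    pattern = re.compile(rf"\bnid={tid_to_nid(tid)}\b")
    lines = dump_text.splitlines()
    for i, line in enumerate(lines):
        if pattern.search(line):
            return lines[i:i + 1 + context]
    return []


if __name__ == "__main__":
    print("\n".join(find_thread(SAMPLE_DUMP, 6700, context=1)))
```

The `context` parameter plays the role of grep's -A option.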

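How does ulibc implement backtrace?

First, let's look at the format of the Android log. The log below is printed in the so-called long format. As described in the earlier introduction to Logcat, the long format shows the time, tag, and other metadata on a separate line.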

Start proc net.coollet.infzmreader:umengService_v1 for service net.coollet.infzmreader/com.umeng.message.UmengService: pid=21745 uid=10039 gids={50039, 3003, 1015, 1028}

Turning on JNI app bug workarounds for target SDK version 8…

onCreate()

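Take the first line as an example: 12-09 is the date, 21:39:35.510 is the time, 396 is the process ID, and 416 is the thread ID; I is the log priority, and ActivityManager is the log tag.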
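To make the field layout concrete, here is a small parsing sketch. The header line itself is not preserved in the sample above, so the string used here is a reconstruction from the field values in the preceding paragraph, assuming the bracketed header layout that Logcat's long format commonly uses; `parse_long_header` is a hypothetical helper.

```python
import re

# Assumed layout: [ MM-DD HH:MM:SS.mmm  PID: TID P/Tag ]
LONG_HEADER = re.compile(
    r"^\[\s*(\d\d-\d\d)\s+(\d\d:\d\d:\d\d\.\d{3})\s+(\d+):\s*(\d+)\s+([VDIWEF])/(.*?)\s*\]$"
)


def parse_long_header(line):
    """Split a long-format header into its fields, or return None if it does not match."""
    m = LONG_HEADER.match(line.strip())
    if not m:
        return None
    date, time_, pid, tid, priority, tag = m.groups()
    return {"date": date, "time": time_, "pid": int(pid), "tid": int(tid),
            "priority": priority, "tag": tag}


if __name__ == "__main__":
    # Reconstructed from the values quoted in the text, not copied from a real log.
    print(parse_long_header("[ 12-09 21:39:35.510   396:  416 I/ActivityManager ]"))
```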

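In application development this information may not matter much, but in system development it is an important aid. Many of the logs that development engineers analyze are captured by test engineers, so some of them may not be the logs from the moment the error occurred. When that happens, no amount of analysis is likely to lead to the right conclusion. To reduce this risk as far as possible, the author requires test engineers to record the time at which a bug occurred when they report it. Combined with the timestamps in the log, this makes it possible to judge roughly whether the log really covers the failure. The reported time also lets the developer jump to the right place in a long log and narrow the scope of the analysis.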

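At the same time, note that time information can be misused in log analysis. For example, when analyzing multithreading problems we sometimes need to judge the cause of an error from the order in which log statements in two different threads were executed. We cannot use the order in which the two entries appear in the log file, because within a short interval the order in which two threads emit their logs is effectively random; the order of printing is not exactly the order of execution. Can we rely on the log timestamps instead? Again no, because the timestamp is the time at which the system wrote the log entry, not the time at which the log function was called. The reliable approach in this situation is to call a system time function just before logging and include that time in the log message. This takes a little more work, but only a time obtained this way is dependable enough to base a conclusion on.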

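Another misuse of log timestamps is performance analysis. An engineer with many years of experience once showed the author a performance report whose results were far from what was actually observed. Asked how he had reached his conclusions, he gave a surprising answer: the figures were computed from the timestamps at the start of the log entries. As explained above, that timestamp is not the time at which the log function was called, because the system buffers log messages, and the gap between the two times is not constant. Using these timestamps to measure the code between two log statements therefore gives large errors. The correct method is the one already mentioned: read the system time in the program, print it, and compute the elapsed time from the values the program printed.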
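The idea of measuring inside the program can be sketched as follows. This is a Python illustration, not the Android logging API; `timed` is a hypothetical helper, and `time.perf_counter()` stands in for whatever clock function the platform offers.

```python
import time


def timed(label, func, *args):
    """Run func(*args) and return (result, elapsed_seconds, log_line)."""
    start = time.perf_counter()            # read by the program itself, before the work
    result = func(*args)
    elapsed = time.perf_counter() - start  # read again immediately after the work
    line = f"{label}: took {elapsed * 1000:.3f} ms (measured in-process)"
    return result, elapsed, line


if __name__ == "__main__":
    _, _, line = timed("sum", sum, range(1_000_000))
    print(line)
```

Because the elapsed time travels inside the message, it stays correct no matter how long the logging system buffers the entry.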

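With time covered, let's turn to the process ID and thread ID, which are also important when analyzing logs. In a log file, entries from different processes are interleaved, which makes analysis much harder. Even two adjacent log statements inside one function can be separated by output from another process, so the first entry of process A may be followed by the second entry of process B. Without careful reading, such a combination easily leads to the wrong conclusion. Always check the process ID at the start of each entry and read the entries with the same ID together.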

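Logs from different threads have the same interleaving problem. Running Logcat with -v thread prints the thread ID. One point needs care, however: the thread IDs that appear on Android are not all the same kind of number. In native (C++) code, gettid() returns the kernel thread ID; for the main thread of a process this is the same number as the process ID, so an entry logged from the main thread shows identical process and thread IDs. pthread_self() returns a different, opaque value that also identifies the current native thread. Android additionally provides androidGetThreadId(), which the author describes as simply calling pthread_self(). In the Java layer the thread ID is yet another value: it is obtained by calling Thread.getId(), whose return value comes from a counter maintained in the Java layer of each process, so it differs from the values obtained in native code. Keep this in mind when analyzing logs: a Java-layer thread ID is usually small, a few hundred or so, whereas a native-layer value is usually much larger. In the log sample above it is easy to see that the first entry was written by the Java layer and the second by the native layer. Two entries with different thread IDs therefore do not necessarily come from two different threads; check whether one comes from the Java layer and the other from the native layer to avoid being misled.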

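When printed, Android log priorities are converted to single-character markers such as V, D, I, W, and E. When analyzing system logs we rarely read a file from beginning to end; instead we use search tools to look for error markers, for example searching for "E/" to see whether any entries indicate an error. If every engineer working on the system follows the priority meanings that Android defines, the heavy work of log analysis becomes considerably easier.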
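Both habits described above, reading entries of one process together and searching by priority, can be automated. The sketch below assumes the brief format seen in the crash logs of this article (`P/Tag( PID): message`); `parse_brief` and `group_by_pid` are hypothetical helpers, and the sample lines are taken from the Java crash example in this article.

```python
import re
from collections import defaultdict

BRIEF = re.compile(r"^([VDIWEF])/(.*?)\(\s*(\d+)\):\s?(.*)$")
PRIORITY_ORDER = "VDIWEF"  # verbose, debug, info, warning, error, fatal

SAMPLE = [
    "I/ActivityManager( 396): Start proc com.test.crash for activity com.test.crash/.MainActivity",
    "D/AndroidRuntime( 1760): Shutting down VM",
    "E/AndroidRuntime( 1760): FATAL EXCEPTION: main",
    "I/Process ( 1760): Sending signal. PID: 1760 SIG: 9",
]


def parse_brief(line):
    """Split one brief-format line into priority, tag, pid and message."""
    m = BRIEF.match(line)
    if not m:
        return None
    priority, tag, pid, message = m.groups()
    return {"priority": priority, "tag": tag.strip(), "pid": int(pid), "message": message}


def group_by_pid(lines, min_priority="V"):
    """Group parsed entries by PID, keeping only entries at or above min_priority."""
    threshold = PRIORITY_ORDER.index(min_priority)
    groups = defaultdict(list)
    for line in lines:
        entry = parse_brief(line)
        if entry and PRIORITY_ORDER.index(entry["priority"]) >= threshold:
            groups[entry["pid"]].append(entry)
    return dict(groups)
```

Calling `group_by_pid(SAMPLE, "E")` is the programmatic counterpart of searching the file for "E/".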

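Serious problems on Android commonly fall into two categories: the program crashes, or an ANR occurs. Both can happen in the Java layer or in the native layer. When the problem is in the Java layer, the cause is usually easy to locate. When it is in the native layer, solving it is often much harder. Let's first look at an example of a Java-layer crash: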

I/ActivityManager( 396): Start proc com.test.crash for activity com.test.crash/.MainActivity: pid=1760 uid=10065 gids={50065, 1028}

D/AndroidRuntime( 1760): Shutting down VM

W/dalvikvm( 1760): threadid=1: thread exiting with uncaught exception (group=0x40c38930)

E/AndroidRuntime( 1760): FATAL EXCEPTION: main

E/AndroidRuntime( 1760): java.lang.RuntimeException: Unable to start activity ComponentInfo{com.test.crash/com.test.crash.MainActivity}: java.lang.NullPointerException

E/AndroidRuntime( 1760): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2180)

E/AndroidRuntime( 1760): at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2230)

E/AndroidRuntime( 1760): at android.app.ActivityThread.access$600(ActivityThread.java:141)

E/AndroidRuntime( 1760): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1234)

E/AndroidRuntime( 1760): at android.os.Handler.dispatchMessage(Handler.java:99)

E/AndroidRuntime( 1760): at android.os.Looper.loop(Looper.java:137)

E/AndroidRuntime( 1760): at android.app.ActivityThread.main(ActivityThread.java:5050)

E/AndroidRuntime( 1760): at java.lang.reflect.Method.invokeNative(Native Method)

E/AndroidRuntime( 1760): at java.lang.reflect.Method.invoke(Method.java:511)

E/AndroidRuntime( 1760): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:793)

E/AndroidRuntime( 1760): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:560)

E/AndroidRuntime( 1760): at dalvik.system.NativeStart.main(Native Method)

E/AndroidRuntime( 1760): Caused by: java.lang.NullPointerException

E/AndroidRuntime( 1760): at com.test.crash.MainActivity.setViewText(MainActivity.java:29)

E/AndroidRuntime( 1760): at com.test.crash.MainActivity.onCreate(MainActivity.java:17)

E/AndroidRuntime( 1760): at android.app.Activity.performCreate(Activity.java:5104)

E/AndroidRuntime( 1760): at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1080)

E/AndroidRuntime( 1760): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2144)

E/AndroidRuntime( 1760): ... 11 more

I/Process ( 1760): Sending signal. PID: 1760 SIG: 9

W/ActivityManager( 396): Force finishing activity com.test.crash/.MainActivity

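When Java-layer code crashes, the system usually prints very detailed error information. The example above gives not only the cause of the error but also the file and line number where it occurred, which makes the problem easy to locate. A native-layer crash also produces stack output in the log, but it is much harder to read. Here is an example of a native-layer crash: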
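Locating the file and line in a Java trace usually means finding the last "Caused by:" section and its first stack frame. A minimal sketch, with `root_cause` as a hypothetical helper that works on the message part of each log line (everything after the `P/Tag( PID):` prefix):

```python
def root_cause(messages):
    """Return (exception, first_frame) for the last 'Caused by:' section of a Java trace."""
    exception, frame = None, None
    for message in messages:
        text = message.strip()
        if text.startswith("Caused by:"):
            exception = text[len("Caused by:"):].strip()
            frame = None  # take the first frame that follows this cause
        elif exception is not None and frame is None and text.startswith("at "):
            frame = text[3:]
    return exception, frame


if __name__ == "__main__":
    trace = [
        "java.lang.RuntimeException: Unable to start activity",
        "at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2180)",
        "Caused by: java.lang.NullPointerException",
        "at com.test.crash.MainActivity.setViewText(MainActivity.java:29)",
        "at com.test.crash.MainActivity.onCreate(MainActivity.java:17)",
    ]
    print(root_cause(trace))
```

For the crash shown above, this points at MainActivity.java line 29, inside setViewText.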

F/libc ( 2102): Fatal signal 11 (SIGSEGV) at 0x(code=1), thread 2102 (testapp)

D/dalvikvm(26630): GC_FOR_ALLOC freed 604K, 11% free 11980K/13368K, paused 36ms, total 36ms

I/dalvikvm-heap(26630): Grow heap (frag case) to 11.831MB for byte allocation

D/dalvikvm(26630): GC_FOR_ALLOC freed 1K, 11% free 12023K/13472K, paused 34ms, total 34ms

I/DEBUG ( 127): ***************************

I/DEBUG ( 127): Build fingerprint: 'Android/full_maguro/maguro:4.2.2/JDQ39/eng.liuchao..202355:userdebug/test-keys'

I/DEBUG ( 127): Revision: '9'

I/DEBUG ( 127): pid: 2102, tid: 2102, name: testapp >>> ./testapp <<<

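From this line we learn the pid and tid of the crashing process. They are the same here because the crash happened on the process's main thread, whose thread ID equals the process ID. Knowing the process ID, we can scroll back through the log and find the last entry that process printed, which narrows the search a little.

Next come the process name and its startup arguments. The line after that is more important, because it tells us the cause of the error from the system's point of view:

signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr

Signal 11 is one of the signals defined by Linux; it means "invalid memory reference". Together with the "fault addr" value that follows, we can be fairly sure that this crash was caused by a null pointer. Of course, this is a crash the author produced deliberately for the explanation, so it is easy to diagnose; most real cases are not this simple.

The log then prints all the CPU registers and the stack contents. The most important part is the backtrace obtained from the stack: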

I/DEBUG ( 127):backtrace:

I/DEBUG ( 127): #00 pce /system/bin/testapp

I/DEBUG ( 127): #01 pcb /system/bin/testapp

I/DEBUG ( 127): #02 pcf /system/lib/libc.so (__libc_init+38)

I/DEBUG ( 127): #03 pc/system/bin/testapp

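Because the running system carries no symbol information, the printed log shows no file names or line numbers. We need the symbol tables left over from the build to translate the addresses. Android provides a tool for this: arm-eabi-addr2line, located in the prebuilts/gcc/linux-x86/arm/arm-eabi-4.6/bin directory. Usage is simple:

# ./arm-eabi-addr2line -f -e out/target/product/hammerhead/symbols/system/bin/testapp 0xe

The -f option prints the function name, the -e option gives the path of the module that contains the symbols, and the last argument is the address to translate. In the author's build environment this command produces: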

memcpy /home/rd/compile/android-4.4_r1.2/bionic/libc/include/string.h:108

The other three addresses translate as follows:

main /home/rd/compile/android-4.4_r1.2/packages/apps/testapp/app_main.cpp:38

out_vformat /home/rd/compile/android-4.4_r1.2/bionic/libc/bionic/libc_logging.cpp:361

_start libgcc2.c:0

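With this information the problem can be located quickly. Translating the addresses one at a time by hand is tedious, though; the author uses a script found online that translates all the lines at once, and interested readers can search for one.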
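The author's script is not shown, so the following is only a sketch of how such a batch translation might look. It assumes backtrace lines of the form `#NN pc <hex address> <module>`; the addresses in the usage example are invented placeholders, because the real ones are not preserved in the log above. `parse_backtrace`, `addr2line_cmd`, and `symbolize` are hypothetical helpers. addr2line accepts several addresses in one invocation, which is what makes batching possible.

```python
import re
import subprocess

FRAME = re.compile(r"#(\d+)\s+pc\s+([0-9a-fA-F]+)\s+(\S+)")


def parse_backtrace(lines):
    """Extract (frame_number, address, module) tuples from backtrace log lines."""
    frames = []
    for line in lines:
        m = FRAME.search(line)
        if m:
            frames.append((int(m.group(1)), m.group(2), m.group(3)))
    return frames


def addr2line_cmd(tool, symbols_dir, module, addresses):
    """Build one addr2line invocation that resolves several addresses of one module."""
    symbol_file = symbols_dir.rstrip("/") + module  # module paths start with '/'
    return [tool, "-f", "-e", symbol_file] + ["0x" + a for a in addresses]


def symbolize(tool, symbols_dir, lines):
    """Run addr2line once per module and return its raw output keyed by module."""
    by_module = {}
    for _, address, module in parse_backtrace(lines):
        by_module.setdefault(module, []).append(address)
    output = {}
    for module, addresses in by_module.items():
        cmd = addr2line_cmd(tool, symbols_dir, module, addresses)
        output[module] = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return output
```

A call would look like `symbolize("./arm-eabi-addr2line", "out/target/product/hammerhead/symbols", log_lines)`, with the tool and symbols directory taken from the command shown earlier.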

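An overview of Linux capabilities

For the purpose of performing permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials (usually: effective UID, effective GID, and supplementary group list).

Starting with kernel 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.

The following list shows the capabilities implemented on Linux, and the operations or behaviors that each capability permits:

A full implementation of capabilities requires that:

Before kernel 2.6.24, only the first two of these requirements are met; since kernel 2.6.24, all three requirements are met.

Each thread has three capability sets containing zero or more of the above capabilities: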

A child created via fork(2) inherits copies of its parent’s capability sets. See below for a discussion of the treatment of capabilities during execve(2).

Using capset(2), a thread may manipulate its own capability sets (see below).

Since Linux 3.2, the file /proc/sys/kernel/cap_last_cap exposes the numerical value of the highest capability supported by the running kernel; this can be used to determine the highest bit that may be set in a capability set.
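Reading that file is straightforward. In the sketch below, `last_capability` and `full_mask` are hypothetical helper names; the function returns None on kernels or systems where the file is not available.

```python
def last_capability(path="/proc/sys/kernel/cap_last_cap"):
    """Return the highest capability number the running kernel supports, or None."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None  # file missing (pre-3.2 kernel, non-Linux) or unreadable


def full_mask(last_cap):
    """Bit mask with every capability from 0 through last_cap set."""
    return (1 << (last_cap + 1)) - 1


if __name__ == "__main__":
    last = last_capability()
    if last is not None:
        print(f"highest capability: {last}, full set: {full_mask(last):#x}")
```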

Since kernel 2.6.24, the kernel supports associating capability sets with an executable file using setcap(8). The file capability sets are stored in an extended attribute (see setxattr(2)) named security.capability. Writing to this extended attribute requires the CAP_SETFCAP capability. The file capability sets, in conjunction with the capability sets of the thread, determine the capabilities of a thread after an execve(2).

The three file capability sets are:

During an execve(2), the kernel calculates the new capabilities of the process using the following algorithm:

where:

A privileged file is one that has capabilities or has the set-user-ID or set-group-ID bit set.
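The algorithm itself is not reproduced in this excerpt. The sketch below expresses the transformation as it is described in capabilities(7) for kernels with ambient capabilities, recalled from that manual page rather than taken from this article, so it should be checked against the man page before being relied on. Capability sets are modelled as integer bit masks; `ThreadCaps` and `caps_after_execve` are hypothetical names, and the special handling of root described in the following paragraphs is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ThreadCaps:
    permitted: int
    inheritable: int
    effective: int
    ambient: int = 0


def caps_after_execve(p, f_permitted, f_inheritable, f_effective, bounding, privileged_file):
    """Compute a thread's capability sets after execve(2), ignoring root special cases.

    p               -- ThreadCaps before the execve
    f_permitted     -- file permitted set (bit mask)
    f_inheritable   -- file inheritable set (bit mask)
    f_effective     -- file effective bit (True or False)
    bounding        -- capability bounding set (bit mask)
    privileged_file -- True if the file has capabilities or a set-ID bit
    """
    ambient = 0 if privileged_file else p.ambient
    permitted = (p.inheritable & f_inheritable) | (f_permitted & bounding) | ambient
    effective = permitted if f_effective else ambient
    return ThreadCaps(permitted, p.inheritable, effective, ambient)
```

For example, a file that grants two capabilities in its permitted set, executed by a thread whose bounding set contains only one of them, yields only that one capability in the new permitted set.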

In order to provide an all-powerful root using capability sets, during an execve(2):

The upshot of the above rules, combined with the capabilities transformations described above, is that when a process execve(2)s a set-user-ID-root program, or when a process with an effective UID of 0 execve(2)s a program, it gains all capabilities in its permitted and effective capability sets, except those masked out by the capability bounding set. This provides semantics that are the same as those provided by traditional UNIX systems.

The capability bounding set is a security mechanism that can be used to limit the capabilities that can be gained during an execve(2). The bounding set is used in the following ways:

Note that the bounding set masks the file permitted capabilities, but not the inheritable capabilities. If a thread maintains a capability in its inheritable set that is not in its bounding set, then it can still gain that capability in its permitted set by executing a file that has the capability in its inheritable set.

Depending on the kernel version, the capability bounding set is either a system-wide attribute, or a per-process attribute.

In kernels before 2.6.25, the capability bounding set is a system-wide attribute that affects all threads on the system. The bounding set is accessible via the file /proc/sys/kernel/cap-bound. (Confusingly, this bit mask parameter is expressed as a signed decimal number in /proc/sys/kernel/cap-bound.)

Only the init process may set capabilities in the capability bounding set; other than that, the superuser (more precisely: programs with the CAP_SYS_MODULE capability) may only clear capabilities from this set.

On a standard system the capability bounding set always masks out the CAP_SETPCAP capability. To remove this restriction (dangerous!), modify the definition of CAP_INIT_EFF_SET in include/linux/capability.h and rebuild the kernel.

The system-wide capability bounding set feature was added to Linux starting with kernel version 2.2.11.

From Linux 2.6.25, the capability bounding set is a per-thread attribute. (There is no longer a systemwide capability bounding set.)

The bounding set is inherited at fork(2) from the thread’s parent, and is preserved across an execve(2).

A thread may remove capabilities from its capability bounding set using the prctl(2) PR_CAPBSET_DROP operation, provided it has the CAP_SETPCAP capability. Once a capability has been dropped from the bounding set, it cannot be restored to that set. A thread can determine if a capability is in its bounding set using the prctl(2) PR_CAPBSET_READ operation.
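As a sketch of the read-only half of this interface, the following calls prctl(2) through ctypes. It assumes a Linux system whose C library exports prctl; `in_bounding_set` is a hypothetical helper, and the constant 23 is the value of PR_CAPBSET_READ in <linux/prctl.h>. The drop operation is not demonstrated because it is irreversible for the calling thread.

```python
import ctypes

PR_CAPBSET_READ = 23  # value from <linux/prctl.h>


def in_bounding_set(cap):
    """Return True or False for capability number `cap`, or None if it cannot be determined."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        rc = libc.prctl(PR_CAPBSET_READ, ctypes.c_ulong(cap),
                        ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    except (OSError, AttributeError):
        return None  # not Linux, or the C library has no prctl
    return None if rc < 0 else bool(rc)  # a negative result means the kernel rejected `cap`


if __name__ == "__main__":
    print("CAP_CHOWN (0) in bounding set:", in_bounding_set(0))
```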

Removing capabilities from the bounding set is supported only if file capabilities are compiled into the kernel. In kernels before Linux 2.6.33, file capabilities were an optional feature configurable via the CONFIG_SECURITY_FILE_CAPABILITIES option. Since Linux 2.6.33, the configuration option has been removed and file capabilities are always part of the kernel. When file capabilities are compiled into the kernel, the init process (the ancestor of all processes) begins with a full bounding set. If file capabilities are not compiled into the kernel, then init begins with a full bounding set minus CAP_SETPCAP, because this capability has a different meaning when there are no file capabilities.

Removing a capability from the bounding set does not remove it from the thread's inheritable set. However it does prevent the capability from being added back into the thread's inheritable set in the future.

To preserve the traditional semantics for transitions between 0 and nonzero user IDs, the kernel makes the following changes to a thread’s capability sets on changes to the thread’s real, effective, saved set, and filesystem user IDs (using setuid(2), setresuid(2), or similar):

If a thread that has a 0 value for one or more of its user IDs wants to prevent its permitted capability set being cleared when it resets all of its user IDs to nonzero values, it can do so using the prctl(2) PR_SET_KEEPCAPS operation or the SECBIT_KEEP_CAPS securebits flag described below.

A thread can retrieve and change its capability sets using the capget(2) and capset(2) system calls. However, the use of cap_get_proc(3) and cap_set_proc(3), both provided in the libcap package, is preferred for this purpose. The following rules govern changes to the thread capability sets:

Starting with kernel 2.6.26, and with a kernel in which file capabilities are enabled, Linux implements a set of per-thread securebits flags that can be used to disable special handling of capabilities for UID 0 (root). These flags are as follows:

Each of the above "base" flags has a companion "locked" flag. Setting any of the "locked" flags is irreversible, and has the effect of preventing further changes to the corresponding "base" flag. The locked flags are: SECBIT_KEEP_CAPS_LOCKED, SECBIT_NO_SETUID_FIXUP_LOCKED, SECBIT_NOROOT_LOCKED, and SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED.

The securebits flags can be modified and retrieved using the prctl(2) PR_SET_SECUREBITS and PR_GET_SECUREBITS operations. The CAP_SETPCAP capability is required to modify the flags.

The securebits flags are inherited by child processes. During an execve(2), all of the flags are preserved, except SECBIT_KEEP_CAPS which is always cleared.

An application can use the following call to lock itself, and all of its descendants, into an environment where the only way of gaining capabilities is by executing a program with associated file capabilities:

For a discussion of the interaction of capabilities and user namespaces, see user_namespaces(7).

No standards govern capabilities, but the Linux capability implementation is based on the withdrawn POSIX.1e draft standard.

From kernel 2.5.27 to kernel 2.6.26, capabilities were an optional kernel component, and could be enabled or disabled via the CONFIG_SECURITY_CAPABILITIES kernel configuration option.

The /proc/PID/task/TID/status file can be used to view the capability sets of a thread. The /proc/PID/status file shows the capability sets of a process’s main thread. Before Linux 3.8, nonexistent capabilities were shown as being enabled (1) in these sets. Since Linux 3.8, all nonexistent capabilities (above CAP_LAST_CAP) are shown as disabled (0).
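The capability sets appear in the status file as hexadecimal bit masks on lines such as CapInh, CapPrm, CapEff, and CapBnd. A sketch of reading and decoding them follows; the helper names are hypothetical, and `CAP_NAMES` lists only the first fourteen capability numbers from <linux/capability.h>, with higher bits shown by number.

```python
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW",
]  # capability numbers 0..13; the list is intentionally partial


def parse_cap_sets(status_text):
    """Extract the Cap* bit masks from the text of a status file."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb"):
            sets[key] = int(value.strip(), 16)
    return sets


def cap_names(mask):
    """Translate a bit mask into capability names (or cap_<n> for unlisted bits)."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
        bit += 1
    return names


def thread_cap_sets(pid="self", tid=None):
    """Read the capability sets of a process's main thread, or of one specific thread."""
    path = f"/proc/{pid}/status" if tid is None else f"/proc/{pid}/task/{tid}/status"
    with open(path) as f:
        return parse_cap_sets(f.read())
```

For instance, `cap_names(thread_cap_sets()["CapEff"])` lists the effective capabilities of the current process's main thread.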

The libcap package provides a suite of routines for setting and getting capabilities that is more comfortable and less likely to change than the interface provided by capset(2) and capget(2). This package also provides the setcap(8) and getcap(8) programs.

Before kernel 2.6.24, and from kernel 2.6.24 to kernel 2.6.32 if file capabilities are not enabled, a thread with the CAP_SETPCAP capability can manipulate the capabilities of threads other than itself. However, this is only theoretically possible, since no thread ever has CAP_SETPCAP in either of these cases:

capsh(1), setpriv(1), prctl(2), setfsuid(2), cap_clear(3), cap_copy_ext(3), cap_from_text(3), cap_get_file(3), cap_get_proc(3), cap_init(3), capgetp(3), capsetp(3), libcap(3), credentials(7), user_namespaces(7), pthreads(7), getcap(8), setcap(8)

include/linux/capability.h in the Linux kernel source tree

This page is part of release 4.04 of the Linux man-pages project.
