全志-量产系统稳定性问题_排查指南.pdf
系统死机类
系统死机主要分为系统 crash 类,系统 block 类和未知异常三种。除了软件以外,这个问题也可能由硬件引起。
常见问题有 dram 物料更换、电源方案修改、sys_config 修改、dts 修改、menuconfig 修改。
系统 CRASH 类
主要分为随机性 crash 和固定位置 crash。前者一般由硬件引起,后者则一般为软件问题
内核打印报错 oops/panic, 包含如下关键字
Internal error: Oops: 817 [#1] PREEMPT SMP ARM
end Kernel panic - not syncing: Fatal exception
问题分析:
当 linux kernel 遇到无法处理的异常的时候, 就会报错 oops、panic。内核 oops、panic 可能 是下面几种原因:
硬件原因:
- ddr 异常
- cpu 异常
- 供电异常
软件问题
- 内核代码逻辑 bug
- 内存异常篡改
android 系统卡死,内核控制台无法响应,内核打印无任何异常信息
系统Block类
内核死锁
关键词:softlockup、hardlockup、hang、blocked
有如下现象之一就可以怀疑是系统死锁问题:
- 使用串口,按键,屏幕等外设都没有反应
- 某应用进程阻塞
- watchdog 未喂狗
- 内核 lockup 类报错
kernel: BUG: softlockup - CPU#11 stuck for 4278190091s! [qmgr:5492]
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
INFO: task Appxxx:13189 blocked for more than 120 seconds
INFO: rcu_preempt detected stalls on CPUs/tasks:
1-...: (25112 GPs behind) idle=9da/0/0 softirq=84384/84385 fqs=0
(detected by 0, t=25211 jiffies, g=75984, c=75983, q=188)
系统异常卡死
系统运行过程中界面卡死无法操作且串口、adb 等调试工具无法使用,从 log 中没有有用的错误信息,此类问题归结为系统异常卡死
此类问题比较复杂,可能是:
- 供电异常
- 系统 crash
- 系统 block
休眠唤醒问题
全志客户服务平台【FAQ911】 Android Q R818+AXP707平台使用s_uart后无法休眠唤醒
需要执行以下命令保证log持续输出
# 保证休眠前打印输出完整
echo N > /sys/module/printk/parameters/console_suspend
# 开启内核调试信息输出
echo 8 > /proc/sys/kernel/printk
# 打印休眠前函数调用
echo 1 > /sys/power/pm_print_times
# 关闭并行休眠,如果出错,可以看到卡在某个地方
echo 0 > /sys/power/pm_async
# 该节点标记是否开启内核早期日志,在内核启动早期先初始化控制台,输出内核启动早期日志信息
echo Y > /sys/module/kernel/parameters/initcall_debug
例子
问题现象:设备休眠时内核崩溃重启 问题原因:uboot 和 kernel 显示驱动初始化参数不一致导致
[ 129.638286] Unable to handle kernel paging request at virtual address ffffff800961a000
[ 129.638289] pgd = ffffffc00dd2b000
[ 129.638298] [ffffff800961a000] *pgd=0000000051967003, *pud=0000000051967003, *pmd=0000000000000000
[ 129.638302] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[ 129.638318] Modules linked in: gt9xxnew_ts 8723ds pvrsrvkm(O) uvcvideo videobuf2_vmalloc vin_v4l2 gc2053_mipi vin_io videobuf2_v4l2 videobuf2_dma_contig videobuf2_memops videobuf2_core
[ 129.638325] CPU: 1 PID: 16 Comm: kworker/1:0 Tainted: G O 4.9.170 #6
[ 129.638326] Hardware name: sun50iw10 (DT)
[ 129.638342] Workqueue: events resume_work_0
[ 129.638344] task: ffffffc03e1f9d00 task.stack: ffffffc03e204000
[ 129.638350] PC is at dsi_dcs_wr+0x164/0x1d8
[ 129.638353] LR is at dsi_dcs_wr+0x140/0x1d8
[ 129.638356] pc : [<ffffff80085223c4>] lr : [<ffffff80085223a0>] pstate: 20400045
[ 129.638357] sp : ffffffc03e207c20
[ 129.638360] x29: ffffffc03e207c20 x28: 0000000062326d6a
[ 129.638363] x27: ffffff800961a001 x26: 000000000000006b
[ 129.638366] x25: ffffff80095c3000 x24: ffffff80093bfb98
[ 129.638369] x23: ffffff8009618000 x22: 0000000062326d69
[ 129.638372] x21: 0000000000000000 x20: ffffff8009618000
[ 129.638375] x19: 0000000000000069 x18: 0000000000000001
[ 129.638378] x17: 0000000000000001 x16: 0000000000000001
[ 129.638380] x15: 0000000000000000 x14: 0000000000000000
[ 129.638383] x13: 0000000000000000 x12: 0000000000000000
[ 129.638386] x11: 0000000000000000 x10: 0000000000000001
[ 129.638389] x9 : 0000000000000001 x8 : 0000000000000000
[ 129.638391] x7 : 0000000000000001 x6 : 0000000000000001
[ 129.638394] x5 : 0000000000000001 x4 : 0000000000000000
[ 129.638397] x3 : 0000000000000001 x2 : 0000000000001cfc
[ 129.638400] x1 : 0000000000000000 x0 : 0000000000000000
[ 129.638402]
[ 129.638402] SP: 0xffffffc03e207ba0:
[ 129.638412] 7ba0 62326d69 00000000 09618000 ffffff80 093bfb98 ffffff80 095c3000 ffffff80
[ 129.638422] 7bc0 0000006b 00000000 0961a001 ffffff80 62326d6a 00000000 3e207c20 ffffffc0
[ 129.638431] 7be0 085223a0 ffffff80 3e207c20 ffffffc0 085223c4 ffffff80 20400045 00000000
[ 129.638441] 7c00 3e207c20 ffffffc0 085223a0 ffffff80 ffffffff ffffffff 095c3a88 ffffff80
[ 129.638450] 7c20 3e207c80 ffffffc0 08508924 ffffff80 093bfbe0 ffffff80 00000000 00000000
[ 129.638459] 7c40 0934d000 ffffff80 3d63c000 ffffffc0 08b7ff40 ffffff80 08dea428 ffffff80
[ 129.638468] 7c60 00000000 00000000 3e0c5330 ffffffc0 3f70b390 ffffffc0 00000000 00000000
[ 129.638478] 7c80 3e207ce0 ffffffc0 084ec8a4 ffffff80 3cc74018 ffffffc0 00000001 00000000
[ 129.638487]
[ 129.638487] X29: 0xffffffc03e207ba0:
[ 129.638496] 7ba0 62326d69 00000000 09618000 ffffff80 093bfb98 ffffff80 095c3000 ffffff80
[ 129.638505] 7bc0 0000006b 00000000 0961a001 ffffff80 62326d6a 00000000 3e207c20 ffffffc0
[ 129.638515] 7be0 085223a0 ffffff80 3e207c20 ffffffc0 085223c4 ffffff80 20400045 00000000
[ 129.638524] 7c00 3e207c20 ffffffc0 085223a0 ffffff80 ffffffff ffffffff 095c3a88 ffffff80
[ 129.638533] 7c20 3e207c80 ffffffc0 08508924 ffffff80 093bfbe0 ffffff80 00000000 00000000
[ 129.638543] 7c40 0934d000 ffffff80 3d63c000 ffffffc0 08b7ff40 ffffff80 08dea428 ffffff80
[ 129.638552] 7c60 00000000 00000000 3e0c5330 ffffffc0 3f70b390 ffffffc0 00000000 00000000
[ 129.638561] 7c80 3e207ce0 ffffffc0 084ec8a4 ffffff80 3cc74018 ffffffc0 00000001 00000000
[ 129.638562]
[ 129.638564] Process kworker/1:0 (pid: 16, stack limit = 0xffffffc03e204000)
[ 129.638567] Stack: (0xffffffc03e207c20 to 0xffffffc03e208000)
[ 129.638571] 7c20: ffffffc03e207c80 ffffff8008508924 ffffff80093bfbe0 0000000000000000
[ 129.638575] 7c40: ffffff800934d000 ffffffc03d63c000 ffffff8008b7ff40 ffffff8008dea428
[ 129.638578] 7c60: 0000000000000000 ffffffc03e0c5330 ffffffc03f70b390 0000000000000000
[ 129.638582] 7c80: ffffffc03e207ce0 ffffff80084ec8a4 ffffffc03cc74018 0000000000000001
[ 129.638586] 7ca0: ffffffc03cc74000 00000000000409aa ffffffc03e207ce0 ffffff80084ec934
[ 129.638589] 7cc0: ffffffc03cc74008 0000000000000000 ffffffc03e207ce0 00000000000409aa
[ 129.638593] 7ce0: ffffffc03e207d20 ffffff80084ef134 ffffffc03cc74000 ffffffc03d63c000
[ 129.638597] 7d00: ffffffc03cc75000 0000000000000000 ffffffc03cc759e8 ffffffc03cc75470
[ 129.638601] 7d20: ffffffc03e207d60 ffffff80084d7bb0 ffffff80095b9520 ffffffc03e0c5300
[ 129.638604] 7d40: ffffffc03f70f300 ffffffc03f70b300 0000000000000000 ffffff80094fabe0
[ 129.638608] 7d60: ffffffc03e207d70 ffffff80080c9828 ffffffc03e207dc0 ffffff80080c9cc4
[ 129.638612] 7d80: ffffffc03e0c5300 ffffffc03f70b300 ffffffc03f70b300 ffffffc03f70b320
[ 129.638616] 7da0: ffffff8009346000 ffffff80094fabe0 ffffff8009368408 00000000000409aa
[ 129.638619] 7dc0: ffffffc03e207e20 ffffff80080d06cc ffffffc03e0c4700 ffffff8009562910
[ 129.638623] 7de0: ffffff8008da6c20 ffffffc03e1f9d00 ffffffc03e0c5300 ffffff80080c9b88
[ 129.638626] 7e00: 0000000000000000 0000000000000000 0000000000000000 ffffffc03e1f9d00
[ 129.638630] 7e20: 0000000000000000 ffffff80080834f0 ffffff80080d05e0 ffffffc03e0c4700
[ 129.638633] 7e40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 129.638636] 7e60: 0000000000000000 ffffffc03e0c4700 ffffffc03e0c5300 0000000000000000
[ 129.638640] 7e80: ffffffc000000000 ffffffc03e207e88 ffffffc03e207e88 0000000000000000
[ 129.638644] 7ea0: 0000000000000000 ffffffc03e207ea8 ffffffc03e207ea8 00000000000409aa
[ 129.638647] 7ec0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 129.638650] 7ee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 129.638653] 7f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 129.638656] 7f20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 129.638659] 7f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 129.638662] 7f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 129.638666] 7f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 129.638669] 7fa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 129.638672] 7fc0: 0000000000000000 0000000000000005 0000000000000000 0000000000000000
[ 129.638675] 7fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 129.638677] Call trace:
[ 129.638680] Exception stack(0xffffffc03e207a30 to 0xffffffc03e207b60)
[ 129.638682] 7a20: 0000000000000069 0000007fffffffff
[ 129.638686] 7a40: 0000000041560000 ffffff80085223c4 0000000020400045 ffffffc03e207ac0
[ 129.638689] 7a60: ffffffc03e207ab0 ffffff800812b75c ffffffc0ffffffff 00000000000409aa
[ 129.638693] 7a80: ffffffc03e207aa0 ffffff8008b21de8 ffffffc03f704e00 0000000000000040
[ 129.638697] 7aa0: ffffffc03e207ac0 ffffff800812b75c 00000000000002b7 0000000000000960
[ 129.638700] 7ac0: 00000000c142d121 0000000000000001 0000000000000001 0000000000000001
[ 129.638703] 7ae0: 0000000000000001 00000000000409aa 0000000000000000 0000000000000000
[ 129.638706] 7b00: 0000000000001cfc 0000000000000001 0000000000000000 0000000000000001
[ 129.638710] 7b20: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
[ 129.638713] 7b40: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
[ 129.638717] [<ffffff80085223c4>] dsi_dcs_wr+0x164/0x1d8
[ 129.638721] [<ffffff8008508924>] LCD_panel_init+0xa4/0xf8
[ 129.638726] [<ffffff80084ec8a4>] disp_lcd_panel_open+0x8c/0x1b0
[ 129.638729] [<ffffff80084ef134>] disp_lcd_fake_enable+0x13c/0x280
[ 129.638732] [<ffffff80084d7bb0>] resume_work_0+0x40/0x48
[ 129.638737] [<ffffff80080c9828>] process_one_work+0x150/0x4b0
[ 129.638741] [<ffffff80080c9cc4>] worker_thread+0x13c/0x4c8
[ 129.638745] [<ffffff80080d06cc>] kthread+0xec/0x100
[ 129.638750] [<ffffff80080834f0>] ret_from_fork+0x10/0x20
[ 129.638755] Code: 9100077b 53003c42 38606b00 6b0202df (381ff360)
[ 129.638758] ---[ end trace 01b5a9f2bd9c98f5 ]---
[ 129.638761] Kernel panic - not syncing: Fatal exception
[ 129.638763] SMP: stopping secondary CPUs
[ 129.641937] Kernel Offset: disabled
[ 129.641943] Memory Limit: none
[ 131.633854] ---[ end Kernel panic - not syncing: Fatal exception
[ 131.640554] sunxi dump enabled
[ 131.643955] dump regs done
[ 131.647012] flush cache done
[ 131.650235] crashdump enter
NOTICE: sunxi_usb_dev_register
NOTICE: sunxi_usb_main_loop
weak:otg_phy_config
NOTICE: usb init ok
[38]HELLO! BOOT0 is starting!
pstore 缓存内核崩溃日志
【FAQ979】 Tina使用ramoops存放崩溃日志
全志轻量级日志永久转存方案依赖于内核原生的 pstore 文件系统,当内核奔溃时,自动把日志转存到flash的pstore分区中,并在开机后以文件形式呈现到用户空间。 对于某些客户,由于产品分区的调整,把存储在mtd的pstore分区关闭了。后期无法ota分区表,但是又需要抓取kernel panic的日志,那可以通过pstore把日志转存到ram中, 但存在一个问题:这种做法对于某些重启会掉电的产品不适用,只对重启不掉电产品形态适用。
执行以下命令,设置kernel panic一秒后重启,以及触发kernel panic:
echo 1 > /proc/sys/kernel/panic
echo c > /proc/sysrq-trigger
查看日志 cat sys/fs/pstore/console-ramoops-0