Android 稳定性问题

死机问题

Posted by LXG on December 16, 2022

全志-量产系统稳定性问题_排查指南.pdf

系统死机类

系统死机主要分为系统 crash 类,系统 block 类和未知异常三种。除了软件以外,这个问题也可能由硬件引起。

常见问题有 dram 物料更换、电源方案修改、sys_config 修改、dts 修改、menuconfig 修改。

系统 CRASH 类

主要分为随机性 crash 和固定位置 crash。前者一般由硬件引起,后者则一般为软件问题

内核打印报错 oops/panic, 包含如下关键字


Internal error: Oops: 817 [#1] PREEMPT SMP ARM
end Kernel panic - not syncing: Fatal exception

问题分析:

当 linux kernel 遇到无法处理的异常的时候, 就会报错 oops、panic。内核 oops、panic 可能 是下面几种原因:

硬件原因:

  • ddr 异常
  • cpu 异常
  • 供电异常

软件问题

  • 内核代码逻辑 bug
  • 内存异常篡改

android 系统卡死,内核控制台无法响应,内核打印无任何异常信息

系统Block类

内核死锁

关键词:softlockup、hardlockup、hang、blocked

有如下现象之一就可以怀疑是系统死锁问题:

  • 使用串口,按键,屏幕等外设都没有反应
  • 某应用进程阻塞
  • watchdog 未喂狗
  • 内核 lockup 类报错

kernel: BUG: softlockup - CPU#11 stuck for 4278190091s! [qmgr:5492]

Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1

INFO: task Appxxx:13189 blocked for more than 120 seconds

INFO: rcu_preempt detected stalls on CPUs/tasks:
1-...: (25112 GPs behind) idle=9da/0/0 softirq=84384/84385 fqs=0
(detected by 0, t=25211 jiffies, g=75984, c=75983, q=188)

系统异常卡死

系统运行过程中界面卡死无法操作且串口、adb 等调试工具无法使用,从 log 中没有有用的错误信息,此类问题归结为系统异常卡死

此类问题比较复杂,可能是:

  • 供电异常
  • 系统 crash
  • 系统 block

休眠唤醒问题

全志客户服务平台【FAQ911】 Android Q R818+AXP707平台使用s_uart后无法休眠唤醒

需要执行以下命令保证log持续输出


# 保证休眠前打印输出完整
echo N > /sys/module/printk/parameters/console_suspend

# 开启内核调试信息输出
echo 8 > /proc/sys/kernel/printk

# 打印休眠前函数调用
echo 1 >  /sys/power/pm_print_times

# 关闭并行休眠,如果出错,可以看到卡在某个地方
echo 0 > /sys/power/pm_async

# 该节点标记是否开启内核早期日志,在内核启动早期先初始化控制台,输出内核启动早期日志信息
echo Y > /sys/module/kernel/parameters/initcall_debug

例子

问题现象:设备休眠时内核崩溃重启 问题原因:uboot 和 kernel 显示驱动初始化参数不一致导致


[  129.638286] Unable to handle kernel paging request at virtual address ffffff800961a000
[  129.638289] pgd = ffffffc00dd2b000
[  129.638298] [ffffff800961a000] *pgd=0000000051967003, *pud=0000000051967003, *pmd=0000000000000000
[  129.638302] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[  129.638318] Modules linked in: gt9xxnew_ts 8723ds pvrsrvkm(O) uvcvideo videobuf2_vmalloc vin_v4l2 gc2053_mipi vin_io videobuf2_v4l2 videobuf2_dma_contig videobuf2_memops videobuf2_core
[  129.638325] CPU: 1 PID: 16 Comm: kworker/1:0 Tainted: G           O    4.9.170 #6
[  129.638326] Hardware name: sun50iw10 (DT)
[  129.638342] Workqueue: events resume_work_0
[  129.638344] task: ffffffc03e1f9d00 task.stack: ffffffc03e204000
[  129.638350] PC is at dsi_dcs_wr+0x164/0x1d8
[  129.638353] LR is at dsi_dcs_wr+0x140/0x1d8
[  129.638356] pc : [<ffffff80085223c4>] lr : [<ffffff80085223a0>] pstate: 20400045
[  129.638357] sp : ffffffc03e207c20
[  129.638360] x29: ffffffc03e207c20 x28: 0000000062326d6a
[  129.638363] x27: ffffff800961a001 x26: 000000000000006b
[  129.638366] x25: ffffff80095c3000 x24: ffffff80093bfb98
[  129.638369] x23: ffffff8009618000 x22: 0000000062326d69
[  129.638372] x21: 0000000000000000 x20: ffffff8009618000
[  129.638375] x19: 0000000000000069 x18: 0000000000000001
[  129.638378] x17: 0000000000000001 x16: 0000000000000001
[  129.638380] x15: 0000000000000000 x14: 0000000000000000
[  129.638383] x13: 0000000000000000 x12: 0000000000000000
[  129.638386] x11: 0000000000000000 x10: 0000000000000001
[  129.638389] x9 : 0000000000000001 x8 : 0000000000000000
[  129.638391] x7 : 0000000000000001 x6 : 0000000000000001
[  129.638394] x5 : 0000000000000001 x4 : 0000000000000000
[  129.638397] x3 : 0000000000000001 x2 : 0000000000001cfc
[  129.638400] x1 : 0000000000000000 x0 : 0000000000000000
[  129.638402]
[  129.638402] SP: 0xffffffc03e207ba0:
[  129.638412] 7ba0  62326d69 00000000 09618000 ffffff80 093bfb98 ffffff80 095c3000 ffffff80
[  129.638422] 7bc0  0000006b 00000000 0961a001 ffffff80 62326d6a 00000000 3e207c20 ffffffc0
[  129.638431] 7be0  085223a0 ffffff80 3e207c20 ffffffc0 085223c4 ffffff80 20400045 00000000
[  129.638441] 7c00  3e207c20 ffffffc0 085223a0 ffffff80 ffffffff ffffffff 095c3a88 ffffff80
[  129.638450] 7c20  3e207c80 ffffffc0 08508924 ffffff80 093bfbe0 ffffff80 00000000 00000000
[  129.638459] 7c40  0934d000 ffffff80 3d63c000 ffffffc0 08b7ff40 ffffff80 08dea428 ffffff80
[  129.638468] 7c60  00000000 00000000 3e0c5330 ffffffc0 3f70b390 ffffffc0 00000000 00000000
[  129.638478] 7c80  3e207ce0 ffffffc0 084ec8a4 ffffff80 3cc74018 ffffffc0 00000001 00000000
[  129.638487]
[  129.638487] X29: 0xffffffc03e207ba0:
[  129.638496] 7ba0  62326d69 00000000 09618000 ffffff80 093bfb98 ffffff80 095c3000 ffffff80
[  129.638505] 7bc0  0000006b 00000000 0961a001 ffffff80 62326d6a 00000000 3e207c20 ffffffc0
[  129.638515] 7be0  085223a0 ffffff80 3e207c20 ffffffc0 085223c4 ffffff80 20400045 00000000
[  129.638524] 7c00  3e207c20 ffffffc0 085223a0 ffffff80 ffffffff ffffffff 095c3a88 ffffff80
[  129.638533] 7c20  3e207c80 ffffffc0 08508924 ffffff80 093bfbe0 ffffff80 00000000 00000000
[  129.638543] 7c40  0934d000 ffffff80 3d63c000 ffffffc0 08b7ff40 ffffff80 08dea428 ffffff80
[  129.638552] 7c60  00000000 00000000 3e0c5330 ffffffc0 3f70b390 ffffffc0 00000000 00000000
[  129.638561] 7c80  3e207ce0 ffffffc0 084ec8a4 ffffff80 3cc74018 ffffffc0 00000001 00000000
[  129.638562]
[  129.638564] Process kworker/1:0 (pid: 16, stack limit = 0xffffffc03e204000)
[  129.638567] Stack: (0xffffffc03e207c20 to 0xffffffc03e208000)
[  129.638571] 7c20: ffffffc03e207c80 ffffff8008508924 ffffff80093bfbe0 0000000000000000
[  129.638575] 7c40: ffffff800934d000 ffffffc03d63c000 ffffff8008b7ff40 ffffff8008dea428
[  129.638578] 7c60: 0000000000000000 ffffffc03e0c5330 ffffffc03f70b390 0000000000000000
[  129.638582] 7c80: ffffffc03e207ce0 ffffff80084ec8a4 ffffffc03cc74018 0000000000000001
[  129.638586] 7ca0: ffffffc03cc74000 00000000000409aa ffffffc03e207ce0 ffffff80084ec934
[  129.638589] 7cc0: ffffffc03cc74008 0000000000000000 ffffffc03e207ce0 00000000000409aa
[  129.638593] 7ce0: ffffffc03e207d20 ffffff80084ef134 ffffffc03cc74000 ffffffc03d63c000
[  129.638597] 7d00: ffffffc03cc75000 0000000000000000 ffffffc03cc759e8 ffffffc03cc75470
[  129.638601] 7d20: ffffffc03e207d60 ffffff80084d7bb0 ffffff80095b9520 ffffffc03e0c5300
[  129.638604] 7d40: ffffffc03f70f300 ffffffc03f70b300 0000000000000000 ffffff80094fabe0
[  129.638608] 7d60: ffffffc03e207d70 ffffff80080c9828 ffffffc03e207dc0 ffffff80080c9cc4
[  129.638612] 7d80: ffffffc03e0c5300 ffffffc03f70b300 ffffffc03f70b300 ffffffc03f70b320
[  129.638616] 7da0: ffffff8009346000 ffffff80094fabe0 ffffff8009368408 00000000000409aa
[  129.638619] 7dc0: ffffffc03e207e20 ffffff80080d06cc ffffffc03e0c4700 ffffff8009562910
[  129.638623] 7de0: ffffff8008da6c20 ffffffc03e1f9d00 ffffffc03e0c5300 ffffff80080c9b88
[  129.638626] 7e00: 0000000000000000 0000000000000000 0000000000000000 ffffffc03e1f9d00
[  129.638630] 7e20: 0000000000000000 ffffff80080834f0 ffffff80080d05e0 ffffffc03e0c4700
[  129.638633] 7e40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  129.638636] 7e60: 0000000000000000 ffffffc03e0c4700 ffffffc03e0c5300 0000000000000000
[  129.638640] 7e80: ffffffc000000000 ffffffc03e207e88 ffffffc03e207e88 0000000000000000
[  129.638644] 7ea0: 0000000000000000 ffffffc03e207ea8 ffffffc03e207ea8 00000000000409aa
[  129.638647] 7ec0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  129.638650] 7ee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  129.638653] 7f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  129.638656] 7f20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  129.638659] 7f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  129.638662] 7f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  129.638666] 7f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  129.638669] 7fa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  129.638672] 7fc0: 0000000000000000 0000000000000005 0000000000000000 0000000000000000
[  129.638675] 7fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  129.638677] Call trace:
[  129.638680] Exception stack(0xffffffc03e207a30 to 0xffffffc03e207b60)
[  129.638682] 7a20:                                   0000000000000069 0000007fffffffff
[  129.638686] 7a40: 0000000041560000 ffffff80085223c4 0000000020400045 ffffffc03e207ac0
[  129.638689] 7a60: ffffffc03e207ab0 ffffff800812b75c ffffffc0ffffffff 00000000000409aa
[  129.638693] 7a80: ffffffc03e207aa0 ffffff8008b21de8 ffffffc03f704e00 0000000000000040
[  129.638697] 7aa0: ffffffc03e207ac0 ffffff800812b75c 00000000000002b7 0000000000000960
[  129.638700] 7ac0: 00000000c142d121 0000000000000001 0000000000000001 0000000000000001
[  129.638703] 7ae0: 0000000000000001 00000000000409aa 0000000000000000 0000000000000000
[  129.638706] 7b00: 0000000000001cfc 0000000000000001 0000000000000000 0000000000000001
[  129.638710] 7b20: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
[  129.638713] 7b40: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
[  129.638717] [<ffffff80085223c4>] dsi_dcs_wr+0x164/0x1d8
[  129.638721] [<ffffff8008508924>] LCD_panel_init+0xa4/0xf8
[  129.638726] [<ffffff80084ec8a4>] disp_lcd_panel_open+0x8c/0x1b0
[  129.638729] [<ffffff80084ef134>] disp_lcd_fake_enable+0x13c/0x280
[  129.638732] [<ffffff80084d7bb0>] resume_work_0+0x40/0x48
[  129.638737] [<ffffff80080c9828>] process_one_work+0x150/0x4b0
[  129.638741] [<ffffff80080c9cc4>] worker_thread+0x13c/0x4c8
[  129.638745] [<ffffff80080d06cc>] kthread+0xec/0x100
[  129.638750] [<ffffff80080834f0>] ret_from_fork+0x10/0x20
[  129.638755] Code: 9100077b 53003c42 38606b00 6b0202df (381ff360)
[  129.638758] ---[ end trace 01b5a9f2bd9c98f5 ]---
[  129.638761] Kernel panic - not syncing: Fatal exception
[  129.638763] SMP: stopping secondary CPUs
[  129.641937] Kernel Offset: disabled
[  129.641943] Memory Limit: none
[  131.633854] ---[ end Kernel panic - not syncing: Fatal exception
[  131.640554] sunxi dump enabled
[  131.643955] dump regs done
[  131.647012] flush cache done
[  131.650235] crashdump enter
NOTICE:  sunxi_usb_dev_register
NOTICE:  sunxi_usb_main_loop
weak:otg_phy_config
                   NOTICE:  usb init ok
[38]HELLO! BOOT0 is starting!

pstore 缓存内核崩溃日志

【FAQ979】 Tina使用ramoops存放崩溃日志

FAQ979

全志轻量级日志永久转存方案依赖于内核原生的 pstore 文件系统,当内核奔溃时,自动把日志转存到flash的pstore分区中,并在开机后以文件形式呈现到用户空间。 对于某些客户,由于产品分区的调整,把存储在mtd的pstore分区关闭了。后期无法ota分区表,但是又需要抓取kernel panic的日志,那可以通过pstore把日志转存到ram中, 但存在一个问题:这种做法对于某些重启会掉电的产品不适用,只对重启不掉电产品形态适用。

执行以下命令,设置kernel panic一秒后重启,以及触发kernel panic:

echo 1 > /proc/sys/kernel/panic

echo c > /proc/sysrq-trigger

查看日志 cat sys/fs/pstore/console-ramoops-0