Android Stability Issues

Freeze - stuck on boot animation - stuck on logo

Posted by LXG on December 26, 2024

A133, Android 13: intermittent boot failure (stuck on the boot animation)

Error log


1970-01-01 08:00:13.561   352-352   iorapd                  pid-352                              W  Reconnect to package manager service: 1 times
1970-01-01 08:00:13.566   352-352   ServiceManager          pid-352                              I  Waiting for service 'package_native' on '/dev/binder'...
1970-01-01 08:00:16.139   309-468   libc++abi               system_server                        E  terminating with uncaught exception of type NSt3__112system_errorE: condition_variable timed_wait failed: Invalid argument
1970-01-01 08:00:16.139   309-468   libc                    system_server                        A  Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 468 (Thread-9), pid 309 (system_server)
1970-01-01 08:00:18.614   352-352   ServiceManager          iorapd                               W  Service package_native didn't start. Returning NULL
1970-01-01 08:00:18.614   352-352   iorapd                  iorapd                               E  Cannot get package manager service!
1970-01-01 08:00:18.860   320-320   PackageManager          system_server                        I  Fix for b/169414761 is applied
1970-01-01 08:00:18.860   320-320   PackageManagerTiming    system_server                        V  create package manager took to complete: 2907ms
1970-01-01 08:00:18.940   320-320   SystemServerTiming      system_server                        V  StartPackageManagerService took to complete: 2989ms
1970-01-01 08:00:19.615   352-352   iorapd                  iorapd                               W  Reconnect to package manager service: 2 times

2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A  *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A  Build fingerprint: 'WIF/BE/BE:13/TQ2A.230405.003.B2/eng.lxg.20241219.183158:userdebug/dev-keys'
2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A  Revision: '0'
2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A  ABI: 'arm'
2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A  Timestamp: 1970-01-01 08:00:15.955879757+0800
2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A  Process uptime: 4s
2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A  Cmdline: system_server
2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A  pid: 320, tid: 465, name: Thread-9  >>> system_server <<<
2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A  uid: 1000
2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A  signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A  Abort message: 'terminating with uncaught exception of type NSt3__112system_errorE: condition_variable timed_wait failed: Invalid argument'
2024-12-19 19:05:33.158   473-473   DEBUG                   crash_dump32                         A      r0  00000000  r1  000001d1  r2  00000006  r3  ba666ff8
2024-12-19 19:05:33.159   473-473   DEBUG                   crash_dump32                         A      r4  ba667008  r5  ba666ff0  r6  00000140  r7  0000016b
2024-12-19 19:05:33.159   473-473   DEBUG                   crash_dump32                         A      r8  00000000  r9  ffffffff  r10 ba666ff8  r11 7fffffff
2024-12-19 19:05:33.159   473-473   DEBUG                   crash_dump32                         A      ip  000001d1  sp  ba666fd8  lr  ea7a6ddf  pc  ea7a6df2
2024-12-19 19:05:33.159   473-473   DEBUG                   crash_dump32                         A  backtrace:
2024-12-19 19:05:33.159   473-473   DEBUG                   crash_dump32                         A        #00 pc 00062df2  /apex/com.android.runtime/lib/bionic/libc.so (abort+138) (BuildId: a2eff9b4b43a846ff70c420d25bbcd39)
2024-12-19 19:05:33.159   473-473   DEBUG                   crash_dump32                         A        #01 pc 0001f359  /apex/com.android.os.statsd/lib/libstatspull.so (abort_message+92) (BuildId: 649822716a4e4e9612385129366d0b3d)
2024-12-19 19:05:33.159   473-473   DEBUG                   crash_dump32                         A        #02 pc 0001f6bd  /apex/com.android.os.statsd/lib/libstatspull.so (demangling_terminate_handler()+120) (BuildId: 649822716a4e4e9612385129366d0b3d)
2024-12-19 19:05:33.159   473-473   DEBUG                   crash_dump32                         A        #03 pc 0001f57f  /apex/com.android.os.statsd/lib/libstatspull.so (std::__terminate(void (*)())+2) (BuildId: 649822716a4e4e9612385129366d0b3d)
2024-12-19 19:05:33.159   473-473   DEBUG                   crash_dump32                         A        #04 pc 0001f52f  /apex/com.android.os.statsd/lib/libstatspull.so (std::terminate()+46) (BuildId: 649822716a4e4e9612385129366d0b3d)
2024-12-19 19:05:33.159   473-473   DEBUG                   crash_dump32                         A        #05 pc 0001f3c9  /apex/com.android.os.statsd/lib/libstatspull.so (__clang_call_terminate+4) (BuildId: 649822716a4e4e9612385129366d0b3d)

2024-12-19 19:05:33.923   352-352   iorapd                  iorapd                               W  Reconnect to package manager service: 1 times
2024-12-19 19:05:33.924   352-352   ServiceManager          iorapd                               I  Waiting for service 'package_native' on '/dev/binder'...
1970-01-01 08:00:20.827   352-352   binder                  iorapd                               I  352:352 transaction failed 29189/-22, size 100-0 line 3188

iorapd is the I/O prefetch daemon; it monitors and optimizes app startup speed.

Google issue

terminating with uncaught exception of type NSt3__112system_errorE

Analysis:

  1. The log shows that system_server aborts with: libc++abi: terminating with uncaught exception of type NSt3__112system_errorE: condition_variable timed_wait failed: Invalid argument.
  2. The failing code is runTimers() in android/frameworks/base/services/incremental/ServiceWrappers.cpp.
  3. Analysis and experiments suggest that the value of “nextJobTs” passed to “mCondition.wait_until” is too large. After dividing the initial value of “static constexpr TimePoint kInfinityTs” by 2, the problem no longer occurs.
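One way to see why halving kInfinityTs could help is to model the deadline arithmetic with plain 64-bit nanosecond counts. The sketch below is a simplified model, not libc++'s actual code: it assumes that wait_until first turns the absolute deadline into a remaining duration and later adds that duration to a slightly newer "now". The name roundTripOverflows and the sample values are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest value a signed 64-bit nanosecond count can hold.
constexpr std::int64_t kMaxNs = std::numeric_limits<std::int64_t>::max();

// Returns true if "deadline -> remaining duration -> absolute time again"
// would exceed kMaxNs. The check is written so that it never overflows itself.
inline bool roundTripOverflows(std::int64_t untilNs,
                               std::int64_t nowAtEntryNs,
                               std::int64_t nowAtSyscallNs) {
    assert(nowAtEntryNs >= 0 && untilNs >= nowAtEntryNs);
    assert(nowAtSyscallNs >= nowAtEntryNs);
    // Remaining time as seen when the wait starts (safe: both are non-negative).
    const std::int64_t remainingNs = untilNs - nowAtEntryNs;
    // Checked form of: nowAtSyscallNs + remainingNs > kMaxNs
    return remainingNs > kMaxNs - nowAtSyscallNs;
}
```

In this model, a deadline equal to the maximum overflows as soon as any time passes between the two clock reads, for example roundTripOverflows(kMaxNs, 1000, 1001) is true, while roundTripOverflows(kMaxNs / 2, 1000, 1001) is false because half the range leaves ample headroom. That is consistent with the experiment, but it does not prove that this is the exact path taken inside libc++.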

Looking for a fix in the AOSP source


diff --git a/android/frameworks/base/services/incremental/ServiceWrappers.cpp b/android/frameworks/base/services/incremental/ServiceWrappers.cpp
index ce3d51483d..2a2fa7adf5 100644
--- a/android/frameworks/base/services/incremental/ServiceWrappers.cpp
+++ b/android/frameworks/base/services/incremental/ServiceWrappers.cpp
@@ -315,11 +315,22 @@ private:
         std::unique_lock lock(mMutex);
         for (;;) {
             const TimePoint nextJobTs = mJobs.empty() ? kInfinityTs : mJobs.begin()->when;
-            mCondition.wait_until(lock, nextJobTs, [this, oldNextJobTs = nextJobTs]() {
+            auto conditionPredicate = [this, oldNextJobTs = nextJobTs]() {
                 const auto now = Clock::now();
                 const auto newFirstJobTs = !mJobs.empty() ? mJobs.begin()->when : kInfinityTs;
                 return newFirstJobTs <= now || newFirstJobTs < oldNextJobTs || !mRunning;
-            });
+            };
+            // libcxx's implementation of wait_until() recalculates the 'until' time into
+            // the wait duration and then goes back to the absolute timestamp when calling
+            // pthread_cond_timedwait(); this back-and-forth calculation sometimes loses
+            // the 'infinity' value because enough time passes in between, and instead
+            // passes incorrect timestamp into the syscall, causing a crash.
+            // Mitigating it by explicitly calling the non-timed wait here.
+            if (mJobs.empty()) {
+                mCondition.wait(lock, conditionPredicate);
+            } else {
+                mCondition.wait_until(lock, nextJobTs, conditionPredicate);
+            }
             if (!mRunning) {
                 return;
             }
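The same idea can be tried outside of AOSP. The following is a minimal, self-contained sketch of the pattern used in the patch: an untimed wait() when the job queue is empty, and wait_until() only when a real deadline exists. The class TimedJobRunner and its addJob() method are hypothetical names for illustration; this is not the code in ServiceWrappers.cpp.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a timed-job runner; illustrative only.
class TimedJobRunner {
public:
    using Clock = std::chrono::steady_clock;
    using TimePoint = Clock::time_point;
    using Job = std::function<void()>;

    TimedJobRunner() : mThread([this] { runTimers(); }) {}

    ~TimedJobRunner() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mRunning = false;
        }
        mCondition.notify_all();
        mThread.join();
    }

    // Schedules `job` to run on the worker thread after the given delay.
    void addJob(std::chrono::milliseconds after, Job job) {
        assert(job && "job must be callable");
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mJobs.emplace(Clock::now() + after, std::move(job));
        }
        mCondition.notify_all();
    }

private:
    void runTimers() {
        std::unique_lock<std::mutex> lock(mMutex);
        while (mRunning) {
            if (mJobs.empty()) {
                // No deadline: untimed wait, so no "infinite" timestamp is needed.
                mCondition.wait(lock, [this] { return !mJobs.empty() || !mRunning; });
                continue;
            }
            const TimePoint nextJobTs = mJobs.begin()->first;
            if (Clock::now() < nextJobTs) {
                // Real deadline: wake on timeout, shutdown, or an earlier job.
                mCondition.wait_until(lock, nextJobTs, [this, nextJobTs] {
                    return !mRunning || mJobs.empty() || mJobs.begin()->first < nextJobTs;
                });
                continue;  // re-evaluate the queue after waking
            }
            Job job = std::move(mJobs.begin()->second);
            mJobs.erase(mJobs.begin());
            lock.unlock();  // run the job without holding the lock
            job();
            lock.lock();
        }
    }

    std::mutex mMutex;
    std::condition_variable mCondition;
    std::multimap<TimePoint, Job> mJobs;
    bool mRunning = true;
    std::thread mThread;  // declared last so the other members exist before it starts
};
```

A caller would construct a TimedJobRunner, call addJob(std::chrono::milliseconds(10), ...) and let the object go out of scope to stop the worker. Because the idle path never passes a timestamp to the condition variable, it avoids the "infinity" value entirely, which is the design choice the AOSP comment describes.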