Context
A consumer social app with more than 1M weekly active users was preparing a major release. Minutes before submission, monitoring showed a sharp increase in crashes on the latest build. Release risk was high, and the team needed a safe answer fast.
Symptoms
The crash-free users rate dropped, and launch time regressed on older devices. Internal reports were noisy, but the user-facing impact was clear enough to halt the release.
- EXC_BAD_ACCESS crashes clustered around media handling.
- Crash reports were more common on low-memory devices.
- Launch regressions appeared after a recent image pipeline change.
Investigation
We correlated the crash spike with recent changes and narrowed the issue to an unsafe decode path introduced to improve feed performance.
Primary signals:
- crash spike aligned with build 4.18.0
- image decoding happened on a background queue with weak lifecycle guards
- memory pressure caused the decode path to race the launch sequence
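The last two signals describe a lifecycle problem: a background decode can finish after the screen or launch state it was started for has moved on. A common guard is a generation token. Each request captures a token, and the result is applied only if that token is still current. The sketch below assumes results are applied on the main queue. DecodeRequestGuard and its methods are hypothetical names, not the team's actual code.

```swift
import Foundation

/// Hypothetical sketch: hands out generation tokens so a background decode
/// can tell whether the request that started it is still current.
final class DecodeRequestGuard {
    private let lock = NSLock()
    private var generation = 0

    /// Starts a new request; any earlier token becomes stale.
    func begin() -> Int {
        lock.lock(); defer { lock.unlock() }
        generation += 1
        return generation
    }

    /// Invalidates every outstanding token (e.g. on teardown or memory warning).
    func cancelAll() {
        lock.lock(); defer { lock.unlock() }
        generation += 1
    }

    /// True only if `token` came from the most recent begin() and nothing
    /// has been cancelled since.
    func isCurrent(_ token: Int) -> Bool {
        lock.lock(); defer { lock.unlock() }
        return token == generation
    }
}
```

In use, the decode work would capture the token returned by begin(). The main-queue completion would check isCurrent(token) before assigning the image, and teardown code would call cancelAll(), so a late decode is simply discarded.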
Decision
We did not roll back the entire release. Instead, we scoped a smaller fix that removed the unstable path, restored safe defaults, and deferred the performance experiment to a later build.
Decision drivers:
- high user impact and visible launch instability
- root cause was isolated enough to fix without large code churn
- release window was tight, so low-risk changes mattered more than broad cleanup
Fix
The hotfix focused on safety first:
- replaced the unsafe decode path with incremental decoding
- added memory warning handling and cache limits
- moved non-critical media work out of the launch path
- added breadcrumbs to improve future crash triage
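A simple way to move non-critical media work out of the launch path is to hold it until launch completes. The sketch below uses a hypothetical DeferredLaunchWork type. It assumes the app calls launchDidFinish() once the first screen is interactive. In a real app the released work would probably be dispatched to a background queue rather than run inline.

```swift
import Foundation

/// Hypothetical sketch: holds non-critical work (e.g. media prefetch) until
/// the app signals that launch is complete.
final class DeferredLaunchWork {
    private let lock = NSLock()
    private var pending: [() -> Void] = []
    private var launchFinished = false

    /// Runs `work` immediately if launch is done; otherwise queues it.
    func enqueue(_ work: @escaping () -> Void) {
        lock.lock()
        if launchFinished {
            lock.unlock()
            work()
        } else {
            pending.append(work)
            lock.unlock()
        }
    }

    /// Call once, after the first screen is interactive. Queued work runs
    /// in the order it was enqueued, outside the lock.
    func launchDidFinish() {
        lock.lock()
        launchFinished = true
        let toRun = pending
        pending.removeAll()
        lock.unlock()
        toRun.forEach { $0() }
    }
}
```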
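The image source options turned off both eager decoding and caching of decoded image data:

```swift
import ImageIO

// Don't keep a decoded copy of the image in memory, and don't force
// decoding when the image is created.
let options: [CFString: Any] = [
    kCGImageSourceShouldCache: false,
    kCGImageSourceShouldCacheImmediately: false
]
```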
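The cache limits and memory-warning handling from the fix list can be sketched as a small LRU cache bounded by total byte cost. DecodedImageCache is a hypothetical type, not the app's implementation. On iOS, NSCache with totalCostLimit is a common alternative, and handleMemoryWarning() would be called from an observer of UIApplication.didReceiveMemoryWarningNotification. The eviction logic is written out here so its behavior is explicit and deterministic.

```swift
/// Hypothetical sketch: an LRU cache bounded by total byte cost, with a hook
/// that drops everything under memory pressure. Not thread-safe; assumes all
/// access happens on a single queue.
final class DecodedImageCache<Key: Hashable, Value> {
    private var entries: [Key: (value: Value, cost: Int)] = [:]
    private var recency: [Key] = []   // least recently used first
    private(set) var totalCost = 0
    let costLimit: Int

    init(costLimit: Int) { self.costLimit = costLimit }

    func value(for key: Key) -> Value? {
        guard let entry = entries[key] else { return nil }
        touch(key)
        return entry.value
    }

    func insert(_ value: Value, for key: Key, cost: Int) {
        guard cost <= costLimit else { return }   // never cache oversized items
        if let old = entries[key] { totalCost -= old.cost }
        entries[key] = (value: value, cost: cost)
        totalCost += cost
        touch(key)
        // Evict least recently used entries until back under budget.
        while totalCost > costLimit, let oldest = recency.first {
            recency.removeFirst()
            if let evicted = entries.removeValue(forKey: oldest) {
                totalCost -= evicted.cost
            }
        }
    }

    /// Call from the memory-warning handler.
    func handleMemoryWarning() {
        entries.removeAll()
        recency.removeAll()
        totalCost = 0
    }

    private func touch(_ key: Key) {
        recency.removeAll { $0 == key }
        recency.append(key)
    }
}
```

Clearing the whole cache on a memory warning is the bluntest policy. Trimming to a fraction of costLimit is a gentler option when re-decoding is expensive.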
Outcome
Within 72 hours, the hotfix shipped. Stability recovered, and performance improved enough to restore the team's confidence in the release.
- crash-free users returned to 98.6%
- launch time improved from 2.31s to 1.42s
- the team shipped without widening the change set
Lessons
The fastest fix is not always the safest one. Clear signals, tight scope, and business-aware prioritization gave the team a better result than a full rollback or a rushed rewrite.