1. [P1] Treat remap failures as resume failures — if replaceSubagentRunAfterSteer
returns false, do NOT clear abortedLastRun, increment failed count.
2. [P2] Count scan-level exceptions as retryable failures — set result.failed > 0
in the outer catch block so scheduleOrphanRecovery retry logic triggers.
3. [P2] Persist resumed-session dedupe across recovery retries — accept
resumedSessionKeys as a parameter; scheduleOrphanRecovery lifts the Set to
its own scope and passes it through retries.
4. [Greptile] Use typed config accessors instead of raw structural cast for TLS
check in lifecycle.ts.
5. [Greptile] Forward gateway.reload.deferralTimeoutMs to deferGatewayRestartUntilIdle
in scheduleGatewaySigusr1Restart so user-configured value is not silently ignored.
6. [Greptile] Same as #4 — already addressed by the typed config fix.
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Addresses Codex review feedback — if recovery fails (e.g. gateway
still booting), retries up to 3 times with exponential backoff
(5s → 10s → 20s) before giving up.
- Remove unrelated pnpm-lock.yaml changes
- Move abortedLastRun flag clearing to AFTER successful resume
(prevents permanent session loss on transient gateway failures)
- Use dynamic import for orphan recovery module to avoid startup
memory overhead
- Add test assertion that flag is preserved on resume failure
Closes#47711
After a SIGUSR1 gateway reload aborts in-flight subagent LLM calls, the gateway now scans for orphaned sessions and sends a synthetic resume message to restart their work. Also makes the deferral timeout configurable via gateway.reload.deferralTimeoutMs (default: 5 minutes, up from 90s).
Some Windows locales/versions emit 'Last Result' instead of 'Last Run Result' in schtasks output, causing gateway status to falsely report 'Runtime: unknown'. Fall back to the shorter key when the canonical key is absent.