Fixes#46185.
Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm test -- extensions/line/src/markdown-to-line.test.ts src/tts/prepare-text.test.ts
Note: `pnpm check` currently fails on unchanged `extensions/microsoft/speech-provider.test.ts` lines 108 and 139 on the rebased base, outside this PR diff.
Verified:
- pnpm test -- extensions/microsoft/speech-provider.test.ts extensions/microsoft/tts.test.ts
Notes:
- Rebases and refactor-port completed onto current main.
- No required GitHub checks were reported for this branch at merge time.
Co-authored-by: Extra Small <littleshuai.bot@gmail.com>
Verified:
- ui: pnpm test -- --run src/ui/markdown.test.ts
- local full gate relaxed for this run; no required GitHub checks reported on the branch
Co-authored-by: jnuyao <2928523+jnuyao@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
When re-splitting CJK-heavy segments at chunking.tokens, check whether the
slice boundary falls on a high surrogate (0xD800–0xDBFF) and if so extend
by one code unit to keep the pair intact. Prevents producing broken
surrogate halves for CJK Extension B+ characters (U+20000+).
Add test verifying no lone surrogates appear when splitting lines of
surrogate-pair characters with an odd token budget.
Addresses third-round Codex P2 review comment.
- Two-pass line splitting: first slice at maxChars (unchanged for Latin),
then re-split only CJK-heavy segments at chunking.tokens. This preserves
the original ~800-char segments for ASCII lines while keeping CJK chunks
within the token budget.
- Narrow surrogate-pair adjustment to CJK Extension B+ range (D840–D87E)
only, so emoji surrogate pairs are not affected. Mixed CJK+emoji text
is now handled consistently regardless of composition.
- Add tests: emoji handling (2), Latin backward-compat long-line (1).
Addresses Codex P1 (oversized CJK segments) and P2s (Latin over-splitting,
emoji surrogate inconsistency).
- Use code-point length instead of UTF-16 length in estimateStringChars()
so that CJK Extension B+ surrogate pairs (U+20000+) are counted as 1
character, not 2 (fixes ~25% overestimate for rare characters).
- Change long-line split step from maxChars to chunking.tokens so that
CJK lines are sliced into token-budget-sized segments instead of
char-budget-sized segments that produce ~4x oversized chunks.
- Add tests for both fixes: surrogate-pair handling and long CJK line
splitting.
Addresses review feedback from Greptile and Codex bots.
The QMD memory system uses a fixed 4:1 chars-to-tokens ratio for chunk
sizing, which severely underestimates CJK (Chinese/Japanese/Korean) text
where each character is roughly 1 token. This causes oversized chunks for
CJK users, degrading vector search quality and wasting context window space.
Changes:
- Add shared src/utils/cjk-chars.ts module with CJK-aware character
counting (estimateStringChars) and token estimation helpers
- Update chunkMarkdown() in src/memory/internal.ts to use weighted
character lengths for chunk boundary decisions and overlap calculation
- Replace hardcoded estimateTokensFromChars in the context report
command with the shared utility
- Add 13 unit tests for the CJK estimation module and 5 new tests for
CJK-aware memory chunking behavior
Backward compatible: pure ASCII/Latin text behavior is unchanged.
Closes#39965
Related: #40216
Webhook channels (LINE, Zalo, Nextcloud Talk, BlueBubbles) are
incorrectly flagged as stale-socket during quiet periods because
snapshot.mode is always undefined, making the mode !== "webhook"
guard in evaluateChannelHealth dead code.
Add mode: "webhook" to each webhook plugin's describeAccount and
propagate described.mode in getRuntimeSnapshot.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
collectStatusIssues previously checked account.channelAccessToken directly,
but this field is stripped by projectSafeChannelAccountSnapshotFields for
security. This caused 'openclaw status' to always report WARN even when the
token is valid and the LINE provider starts successfully.
Use account.configured instead, which is already computed by
buildChannelAccountSnapshot and correctly reflects whether credentials
are present. This is consistent with how other channels (e.g. Telegram)
implement their status checks.
Fixes#45693
Pad both buffers to equal length before constant-time comparison.
Key fix: call timingSafeEqual unconditionally and store the result
before the && length check, ensuring the constant-time comparison
always runs regardless of buffer lengths. This avoids JavaScript's
&& short-circuit evaluation which would skip timingSafeEqual on
length mismatches, preserving the timing side-channel.
Changes:
- Pad hash and signature buffers to maxLen before comparison
- Store timingSafeEqual result before combining with length check
- Add explanatory comment about the short-circuit avoidance