mirror of https://github.com/openclaw/openclaw.git
When re-splitting CJK-heavy segments at chunking.tokens, check whether the slice boundary falls on a high surrogate (0xD800–0xDBFF) and, if so, extend the slice by one code unit to keep the pair intact. This prevents broken surrogate halves for CJK Extension B+ characters (U+20000+). Add a test verifying that no lone surrogates appear when splitting lines of surrogate-pair characters with an odd token budget. Addresses a third-round Codex P2 review comment.

| Name |
|---|
| .. |
| clawdbot |
| memory-host-sdk |
| moltbot |
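
The surrogate-pair handling described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `splitSurrogateSafe`, `isHighSurrogate`, and `hasLoneSurrogate` are hypothetical names, and the sketch assumes the budget is counted in UTF-16 code units and that "the slice boundary falls on a high surrogate" means the last code unit of the slice is a high surrogate. How `chunking.tokens` maps to code units in the real implementation is not shown in the source.

```typescript
// Hypothetical helper: true when a UTF-16 code unit is a high (leading) surrogate.
function isHighSurrogate(codeUnit: number): boolean {
  return codeUnit >= 0xd800 && codeUnit <= 0xdbff;
}

// Hypothetical splitter: cuts `text` into slices of at most `maxUnits` UTF-16
// code units, extending a slice by one unit when it would end on a high surrogate.
function splitSurrogateSafe(text: string, maxUnits: number): string[] {
  if (!Number.isInteger(maxUnits) || maxUnits < 1) {
    throw new RangeError("maxUnits must be a positive integer");
  }
  const slices: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxUnits, text.length);
    // If the slice ends on a high surrogate, its low half would land in the
    // next slice; extend by one code unit so the pair stays together.
    if (end < text.length && isHighSurrogate(text.charCodeAt(end - 1))) {
      end += 1;
    }
    slices.push(text.slice(start, end));
    start = end;
  }
  return slices;
}

// Hypothetical check used for testing: true when `s` contains an unpaired surrogate.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return true;
      i++; // skip the low half of a well-formed pair
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return true; // low surrogate with no preceding high surrogate
    }
  }
  return false;
}

// Usage: five U+20000 characters (two code units each) with an odd budget of 3.
const sampleLine = "\u{20000}".repeat(5);
const sampleSlices = splitSurrogateSafe(sampleLine, 3);
console.log(sampleSlices.map((s) => s.length));
console.log(sampleSlices.some(hasLoneSurrogate));
```

Extending the slice, rather than shrinking it, means a slice may exceed the budget by one code unit. That matches the behavior the commit message describes; a stricter budget would need to back off by one unit instead.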