openclaw/extensions/memory-core/src
buyitsydney 4b69c6d3f1 fix(memory): add CJK/Kana/Hangul support to MMR tokenize() for diversity detection
The tokenize() function only matched [a-z0-9_]+ patterns, so it returned
an empty set for CJK-only text. Jaccard similarity involving such text was
therefore degenerate (0 against any non-empty set, 1 between two empty
sets), which effectively disabled MMR diversity detection for CJK content.
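A minimal sketch of the failure mode described above. It assumes the old
tokenizer lowercased its input and collected [a-z0-9_]+ matches into a set,
and that the Jaccard helper treats two empty sets as identical.
tokenizeAsciiOnly() and jaccard() are hypothetical names used only for
illustration. They are not the module's actual exports.

```typescript
// Hypothetical reconstruction of the pre-fix tokenizer (assumption: input is
// lowercased first). Only ASCII word runs survive, so CJK-only text yields {}.
function tokenizeAsciiOnly(text: string): Set<string> {
  const matches = text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
  return new Set(matches);
}

// Jaccard similarity |A ∩ B| / |A ∪ B|.
// Assumption: two empty sets are treated as identical (similarity 1).
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  a.forEach((token) => {
    if (b.has(token)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return intersection / union;
}

const zh1 = tokenizeAsciiOnly("记忆系统的检索结果");
const zh2 = tokenizeAsciiOnly("今天天气很好");
console.log(zh1.size, zh2.size); // 0 0
console.log(jaccard(zh1, zh2)); // 1: unrelated passages look identical
console.log(jaccard(zh1, tokenizeAsciiOnly("memory search"))); // 0
```

Under the empty-set convention assumed here, any two CJK passages score 1
and any CJK passage scores 0 against ASCII text. The similarity therefore
says nothing about actual overlap.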

Add support for:
- CJK Unified Ideographs (U+4E00–U+9FFF) and Extension A (U+3400–U+4DBF)
- Hiragana (U+3040–U+309F) and Katakana (U+30A0–U+30FF)
- Hangul Syllables (U+AC00–U+D7AF) and Jamo (U+1100–U+11FF)
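The ranges above can be checked per character with a small lookup table.
This is a sketch, assuming the test is applied to the first code point of
a single-character string. CJK_RANGES and isCjkChar() are hypothetical
names, not the module's actual code.

```typescript
// Unicode ranges from the list above, as inclusive [start, end] code points.
const CJK_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x4e00, 0x9fff], // CJK Unified Ideographs
  [0x3400, 0x4dbf], // CJK Unified Ideographs Extension A
  [0x3040, 0x309f], // Hiragana
  [0x30a0, 0x30ff], // Katakana
  [0xac00, 0xd7af], // Hangul Syllables
  [0x1100, 0x11ff], // Hangul Jamo
];

// Returns true when the first code point of `ch` falls in one of the ranges.
// An empty string has no code point, so it returns false.
function isCjkChar(ch: string): boolean {
  const cp = ch.codePointAt(0);
  if (cp === undefined) return false;
  return CJK_RANGES.some(([start, end]) => cp >= start && cp <= end);
}

console.log(isCjkChar("記")); // true  (U+8A18, CJK Unified Ideographs)
console.log(isCjkChar("か")); // true  (U+304B, Hiragana)
console.log(isCjkChar("カ")); // true  (U+30AB, Katakana)
console.log(isCjkChar("한")); // true  (U+D55C, Hangul Syllables)
console.log(isCjkChar("a")); // false
```

A table keeps each range next to its block name. A single regex character
class expresses the same test more compactly.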

Characters are extracted as unigrams, and bigrams are generated only
from characters that are adjacent in the original text (no spurious
bigrams across ASCII boundaries).

Fixes #28000
2026-03-28 09:19:52 +05:30
memory fix(memory): add CJK/Kana/Hangul support to MMR tokenize() for diversity detection 2026-03-28 09:19:52 +05:30
cli.runtime.ts refactor: remove memory-core runtime barrel 2026-03-27 02:54:23 +00:00
cli.test.ts refactor: remove memory-core runtime barrel 2026-03-27 02:54:23 +00:00
cli.ts refactor: extract memory host sdk package 2026-03-27 02:49:33 +00:00
cli.types.ts
flush-plan.ts refactor: extract memory host sdk package 2026-03-27 02:49:33 +00:00
prompt-section.ts refactor: extract memory host sdk package 2026-03-27 02:49:33 +00:00
runtime-provider.ts refactor: remove memory-core runtime barrel 2026-03-27 02:54:23 +00:00
tools.citations.test.ts
tools.citations.ts refactor: extract memory host sdk package 2026-03-27 02:49:33 +00:00
tools.runtime.ts refactor: remove memory-core runtime barrel 2026-03-27 02:54:23 +00:00
tools.shared.ts refactor: remove memory-core runtime barrel 2026-03-27 02:54:23 +00:00
tools.test-helpers.ts refactor: move bundled plugin policy into manifests 2026-03-27 16:40:27 +00:00
tools.test.ts
tools.ts refactor: extract memory host sdk package 2026-03-27 02:49:33 +00:00