openclaw/extensions/memory-core
buyitsydney 4b69c6d3f1 fix(memory): add CJK/Kana/Hangul support to MMR tokenize() for diversity detection
The tokenize() function only matched [a-z0-9_]+ patterns, so it returned an
empty set for CJK-only text. Jaccard similarity involving such text was
therefore degenerate: 0 against any text with ASCII tokens, or 1 between two
CJK-only texts (two empty sets). This effectively disabled MMR diversity
detection for CJK content.
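The following is a minimal sketch of the failure mode. It assumes an ASCII-only tokenizer and a Jaccard helper that treats two empty sets as identical (score 1), as the message above describes. `tokenizeAsciiOnly` and `jaccard` are hypothetical names and are not the memory-core source.

```typescript
// Hypothetical reconstruction of the pre-fix behaviour (not the actual source).
function tokenizeAsciiOnly(text: string): Set<string> {
  // Only runs of [a-z0-9_] become tokens, so CJK characters are dropped.
  return new Set<string>(text.toLowerCase().match(/[a-z0-9_]+/g) ?? []);
}

// Jaccard similarity |A ∩ B| / |A ∪ B|. Two empty sets score 1 here,
// an assumed convention for avoiding 0/0.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  a.forEach((token) => {
    if (b.has(token)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

// Two different CJK-only texts look identical...
console.log(jaccard(tokenizeAsciiOnly("東京の天気"), tokenizeAsciiOnly("大阪の天気"))); // 1
// ...while any ASCII token on one side makes them look unrelated.
console.log(jaccard(tokenizeAsciiOnly("東京 ok"), tokenizeAsciiOnly("大阪"))); // 0
```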

Add support for:
- CJK Unified Ideographs (U+4E00–U+9FFF, U+3400–U+4DBF)
- Hiragana (U+3040–U+309F) and Katakana (U+30A0–U+30FF)
- Hangul Syllables (U+AC00–U+D7AF) and Jamo (U+1100–U+11FF)

Characters are extracted as unigrams, and bigrams are generated only
from characters that are adjacent in the original text (no spurious
bigrams across ASCII boundaries).

Fixes #28000
2026-03-28 09:19:52 +05:30
src fix(memory): add CJK/Kana/Hangul support to MMR tokenize() for diversity detection 2026-03-28 09:19:52 +05:30
api.ts refactor: move bundled plugin policy into manifests 2026-03-27 16:40:27 +00:00
index.test.ts refactor: move memory engine behind plugin adapters 2026-03-27 00:47:01 +00:00
index.ts refactor: move memory engine behind plugin adapters 2026-03-27 00:47:01 +00:00
openclaw.plugin.json refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
package.json chore: bump version metadata to 2026.3.27 2026-03-28 02:00:22 +00:00
runtime-api.ts refactor: move bundled plugin policy into manifests 2026-03-27 16:40:27 +00:00