Commit Graph

21 Commits

Author SHA1 Message Date
ImLukeF 78b48735fa
test: restore fetch stubs in embedding suites 2026-04-01 20:53:16 +11:00
ImLukeF 101c31f5e1
test: harden ci-sensitive unit suites 2026-04-01 20:53:16 +11:00
Vincent Koc ab4ddff7f1
feat(memory): add per-agent QMD extra collections for cross-agent session search (#58211)
* feat(memory): add per-agent qmd extra collections

* test(config): cover qmd extra collections schema outputs

* docs(config): refresh qmd extra collections baseline

* docs(config): regenerate qmd extra collections baselines

* docs(config): clarify qmd extra collection naming
2026-03-31 17:08:18 +09:00
Vincent Koc 075645f5cb
fix(memory): use explicit qmd snippet line metadata (#58181)
* fix(memory): preserve qmd snippet line metadata

* Memory/QMD: preserve snippet span with partial line metadata
2026-03-31 17:05:53 +09:00
Altay 910134b702 fix(memory): stabilize qmd collection scoping 2026-03-30 22:41:21 +03:00
Vincent Koc b7de04f23f
fix(memory): preserve shared qmd collection names (#57628)
* fix(memory): preserve shared qmd collection names

* fix(memory): canonicalize qmd path containment
2026-03-30 19:29:35 +09:00
Vincent Koc 8623c28f1d
fix(memory): warn when qmd binary is missing (#57467)
* fix(memory): warn when qmd binary is missing

* fix(memory): avoid probing cached qmd managers

* docs(memory): clarify qmd doctor probe behavior

* fix(memory): probe qmd from agent workspace
2026-03-30 13:44:41 +09:00
Vincent Koc b7d59f7831
fix(memory): export archived qmd session transcripts (#57446)
* fix(memory): export archived qmd session transcripts

* test(memory): separate qmd session listing describe
2026-03-30 12:50:21 +09:00
Gustavo Madeira Santana 313fdf5adf
Memory Host: make backend-config path tests portable 2026-03-29 23:41:42 -04:00
Vincent Koc da35718cb2
fix(memory): add qmd mcporter search tool override (#57363)
* fix(memory): add qmd mcporter search tool override

* fix(memory): tighten qmd search tool override guards

* chore(config): drop generated docs baselines from qmd pr

* fix(memory): keep explicit qmd query override on v2 args

* docs(changelog): normalize qmd search tool attribution

* fix(memory): reuse v1 qmd tool after query fallback
2026-03-30 12:07:32 +09:00
Amine Harch el korane 219d4f03bd
fix: wire memorySearch.extraPaths to QMD indexing (#57315)
* fix: wire memorySearch.extraPaths to QMD indexing

The 'agents.defaults.memorySearch.extraPaths' config field was documented
to add extra directories to the memory index, but the paths were never
actually passed to the QMD backend. Only 'memory.qmd.paths' worked.

This fix reads extraPaths from the memorySearch config and maps them
to QMD custom path collections, so users can simply configure:

  memorySearch:
    extraPaths:
      - odd-vault
      - /Users/odd/workspace
      - /Users/odd/docs

And have those directories indexed alongside the default memory files.

Closes #57302

* fix: handle per-agent memorySearch.extraPaths overrides + add tests

- Read per-agent overrides from agents.list[].memorySearch.extraPaths
- Agent-specific overrides take priority over defaults
- Falls back to defaults when agent has no overrides
- Added 3 test cases for the feature

* fix: merge defaults + agent overrides instead of replacing

* fix: remove any types from tests, fix merge behavior assertion

* fix(memory): merge qmd extra path collections

* fix(memory): normalize qmd extra path resolution

* fix(memory): type qmd extra path merge

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-03-30 08:58:42 +09:00
Peter Steinberger 8e0ab35b0e
refactor(plugins): decouple bundled plugin runtime loading 2026-03-29 09:10:38 +01:00
Tak Hoffman 3ce48aff66
Memory: add configurable FTS5 tokenizer for CJK text support (openclaw#56707)
Verified:
- pnpm build
- pnpm check
- pnpm test -- extensions/memory-core/src/memory/manager-search.test.ts packages/memory-host-sdk/src/host/query-expansion.test.ts
- pnpm test -- extensions/memory-core/src/memory/index.test.ts -t "reindexes when extraPaths change"
- pnpm test -- src/config/schema.base.generated.test.ts
- pnpm test -- src/media-understanding/image.test.ts
- pnpm test

Co-authored-by: Mitsuyuki Osabe <24588751+carrotRakko@users.noreply.github.com>
2026-03-28 20:53:29 -05:00
AaronLuo00 f8547fcae4 fix: guard fine-split against breaking UTF-16 surrogate pairs
When re-splitting CJK-heavy segments at chunking.tokens, check whether the
slice boundary falls on a high surrogate (0xD800–0xDBFF) and if so extend
by one code unit to keep the pair intact.  Prevents producing broken
surrogate halves for CJK Extension B+ characters (U+20000+).

Add test verifying no lone surrogates appear when splitting lines of
surrogate-pair characters with an odd token budget.

Addresses third-round Codex P2 review comment.
2026-03-29 10:22:43 +09:00
AaronLuo00 3b95aa8804 fix: address second-round review — Latin backward compat and emoji consistency
- Two-pass line splitting: first slice at maxChars (unchanged for Latin),
  then re-split only CJK-heavy segments at chunking.tokens. This preserves
  the original ~800-char segments for ASCII lines while keeping CJK chunks
  within the token budget.

- Narrow surrogate-pair adjustment to CJK Extension B+ range (D840–D87E)
  only, so emoji surrogate pairs are not affected. Mixed CJK+emoji text
  is now handled consistently regardless of composition.

- Add tests: emoji handling (2), Latin backward-compat long-line (1).

Addresses Codex P1 (oversized CJK segments) and P2s (Latin over-splitting,
emoji surrogate inconsistency).
2026-03-29 10:22:43 +09:00
AaronLuo00 a5147d4d88 fix: address bot review — surrogate-pair counting and CJK line splitting
- Use code-point length instead of UTF-16 length in estimateStringChars()
  so that CJK Extension B+ surrogate pairs (U+20000+) are counted as 1
  character, not 2 (fixes ~25% overestimate for rare characters).

- Change long-line split step from maxChars to chunking.tokens so that
  CJK lines are sliced into token-budget-sized segments instead of
  char-budget-sized segments that produce ~4x oversized chunks.

- Add tests for both fixes: surrogate-pair handling and long CJK line
  splitting.

Addresses review feedback from Greptile and Codex bots.
2026-03-29 10:22:43 +09:00
AaronLuo00 971ecabe80 fix(memory): account for CJK characters in QMD memory chunking
The QMD memory system uses a fixed 4:1 chars-to-tokens ratio for chunk
sizing, which severely underestimates CJK (Chinese/Japanese/Korean) text
where each character is roughly 1 token. This causes oversized chunks for
CJK users, degrading vector search quality and wasting context window space.

Changes:
- Add shared src/utils/cjk-chars.ts module with CJK-aware character
  counting (estimateStringChars) and token estimation helpers
- Update chunkMarkdown() in src/memory/internal.ts to use weighted
  character lengths for chunk boundary decisions and overlap calculation
- Replace hardcoded estimateTokensFromChars in the context report
  command with the shared utility
- Add 13 unit tests for the CJK estimation module and 5 new tests for
  CJK-aware memory chunking behavior

Backward compatible: pure ASCII/Latin text behavior is unchanged.

Closes #39965
Related: #40216
2026-03-29 10:22:43 +09:00
Peter Steinberger 4c27c90fc2 refactor: finish moving provider runtime into extensions 2026-03-27 05:38:58 +00:00
Peter Steinberger 64bf80d4d5 refactor: move provider runtime into extensions 2026-03-27 05:38:58 +00:00
Peter Steinberger eebce9e9c7
refactor: move memory host into sdk package 2026-03-27 04:12:04 +00:00
Peter Steinberger bd6c7969ea
refactor: extract memory host sdk package 2026-03-27 02:49:33 +00:00