Function buildEmbedText

buildEmbedText(
    chunks: readonly { breadcrumb?: string; text: string }[],
    i: number,
    opts: { contextChars: number; docTitle?: string },
): string
v2.15.0: context-prefixed embedding text builder ("late-chunking-style" context windowing). Prepends the document title and heading breadcrumb, then appends a tail of the previous chunk, the chunk itself, and a head of the next chunk, all bounded so the result stays within the multilingual model's 128-token context budget.

Why: short standalone chunks ("Use Adam β=0.9, β=0.999") embed identically across documents, losing the surrounding context that disambiguates them. Adding roughly 50-100 characters of neighboring text, plus the document title and breadcrumb, gives the bi-encoder enough signal to keep such chunks semantically separate across documents. Per Chroma (2024) and Jina AI's late chunking blog, the typical gain is +2-5 NDCG@10, with no new dependencies.

Returns the concatenated text. When contextChars ≤ 0, returns the legacy v2.1.0 form (just breadcrumb + chunk text), preserving bit-for-bit behavior for users who don't opt in.

v3.8.0-rc.6 ARCH-1: moved here from server.ts to break a circular import.
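
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in embed-pipeline.ts: the name `buildEmbedTextSketch`, the newline separator, the out-of-range check, and the reading of `contextChars` as a per-neighbor character cap are all assumptions, and the sketch does not enforce the 128-token budget. The sample chunks and document title are made up.

```typescript
// Hypothetical sketch of context-prefixed embedding text; not the library's implementation.
type Chunk = { breadcrumb?: string; text: string };

function buildEmbedTextSketch(
    chunks: readonly Chunk[],
    i: number,
    opts: { contextChars: number; docTitle?: string },
): string {
    const chunk = chunks[i];
    if (chunk === undefined) {
        // Assumption: an out-of-range index is treated as a caller error.
        throw new RangeError(`chunk index ${i} is out of range`);
    }

    // Legacy form: breadcrumb + chunk text only (the "\n" separator is assumed).
    if (opts.contextChars <= 0) {
        return chunk.breadcrumb ? `${chunk.breadcrumb}\n${chunk.text}` : chunk.text;
    }

    // Assumption: contextChars caps each neighbor's contribution separately.
    const n = opts.contextChars;
    const prevTail = chunks[i - 1]?.text.slice(-n) ?? "";
    const nextHead = chunks[i + 1]?.text.slice(0, n) ?? "";

    // Order: title, breadcrumb, previous tail, chunk, next head; empty parts are dropped.
    const parts = [opts.docTitle, chunk.breadcrumb, prevTail, chunk.text, nextHead];
    return parts
        .filter((p): p is string => typeof p === "string" && p.length > 0)
        .join("\n");
}

// Made-up sample data for illustration.
const sampleChunks: Chunk[] = [
    { breadcrumb: "Training > Optimizer", text: "We train for 10 epochs." },
    { breadcrumb: "Training > Optimizer", text: "Use Adam β=0.9, β=0.999" },
    { breadcrumb: "Training > Optimizer", text: "Learning rate is 3e-4." },
];

// Opted out: only breadcrumb + chunk text.
const legacyText = buildEmbedTextSketch(sampleChunks, 1, { contextChars: 0 });

// Opted in: title, breadcrumb, and up to 10 characters from each neighbor.
const windowedText = buildEmbedTextSketch(sampleChunks, 1, {
    contextChars: 10,
    docTitle: "Optimizer notes",
});

console.log(legacyText);
console.log(windowedText);
```

With `contextChars` set to 0 the neighbors and the title are ignored entirely, which is what keeps the opt-out path equal to the legacy output. A real implementation would also need to trim the combined text against the model's token budget, which this sketch leaves out.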
Parameters
- chunks: readonly { breadcrumb?: string; text: string }[]
- i: number
- opts: { contextChars: number; docTitle?: string }
Returns string
Defined in embed-pipeline.ts:66