Extract text from a PDF page-by-page, with optional page-range slicing
and metadata.
Image-only (scanned) PDFs report has_text: false; agents should detect
this and route the file through ocrPdf for OCR. pdfjs-dist is an optional
dependency and is loaded lazily, so markdown-only users pay no cost for it.
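A minimal sketch of the lazy-loading pattern, assuming a runtime where dynamic import() is available. The helper name lazyOptional is hypothetical and not part of this library. It memoizes the import promise so the optional package is loaded at most once, on first use, and turns a failed import into a readable error.

```typescript
// Hypothetical helper (not this library's API): defer loading an optional
// dependency until first use, and share one in-flight/finished load.
function lazyOptional<T>(name: string, load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load().catch((err: unknown) => {
        // Don't memoize failures; a later call retries the import.
        cached = undefined;
        throw new Error(`optional dependency "${name}" could not be loaded: ${String(err)}`);
      });
    }
    return cached;
  };
}
```

In a PDF code path this might be wired up as `const loadPdfjs = lazyOptional("pdfjs-dist", () => import("pdfjs-dist"));` and awaited only inside the PDF reader, so callers that never touch PDFs never evaluate the import.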
Out-of-range values in the pages slice argument are clamped rather than
raising an error (matching Array.prototype.slice semantics).
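For reference, this is the Array.prototype.slice behavior the clamping mirrors, applied to a stand-in array of per-page strings (hypothetical data). How readPdf maps its pages tuple onto slice bounds is not specified here, so the sketch shows only the underlying semantics.

```typescript
// Stand-in for the per-page text of a 3-page PDF (hypothetical data).
const pageTexts: string[] = ["page one", "page two", "page three"];

// slice never throws on out-of-range bounds; it clamps them instead.
const tail = pageTexts.slice(1, 99);  // end past the last page is clamped → ["page two", "page three"]
const none = pageTexts.slice(10, 20); // start past the end → []
const last2 = pageTexts.slice(-2);    // negative start counts back from the end → ["page two", "page three"]
const empty = pageTexts.slice(2, 1);  // start >= end → []

console.log(tail, none, last2, empty);
```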
Returns
A ReadPdfResult containing per-page text, the joined full text,
metadata (when requested), and the original total_page_count.
Throws
If path is empty, the file is missing or excluded,
or pdfjs-dist is not installed.
Throws
If path resolves outside the vault.
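A minimal sketch of the kind of containment check behind the path errors above, assuming a Node.js runtime and a filesystem directory as the vault root. resolveInsideVault is a hypothetical name, not this library's API. It resolves the requested path against the root and rejects it if the relative path from the root climbs out through "..", or is absolute (a different drive on Windows).

```typescript
import * as path from "node:path";

// Hypothetical helper (not this library's API): resolve a vault-relative
// path and reject anything that escapes the vault root.
function resolveInsideVault(vaultRoot: string, relPath: string): string {
  if (relPath.trim() === "") throw new Error("path must not be empty");
  const root = path.resolve(vaultRoot);
  const full = path.resolve(root, relPath);
  const rel = path.relative(root, full);
  // Compare against ".." + separator so a file literally named "..notes.pdf" is still allowed.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path resolves outside the vault: ${relPath}`);
  }
  return full;
}
```

This check is purely lexical and does not follow symlinks; a stricter variant would compare fs.realpath results for the root and the target.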
Example
// Read pages 1-5 of a long paper
const r = await readPdf(vault, {
  path: "Papers/2024-rag-survey.pdf",
  pages: [1, 5],
  include_metadata: true,
});
if (!r.has_text) console.log("Scanned PDF: try ocrPdf()");
console.log(r.metadata?.title, r.full_text.slice(0, 200));
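The has_text check in the example can be extended into a text-layer-first, OCR-fallback routine. This is a sketch under stated assumptions: the reader and OCR functions are injected with simplified, hypothetical signatures (path in, result out), because readPdf actually takes (vault, options) and ocrPdf's signature is not documented in this section. Only has_text and full_text are taken from the docs above.

```typescript
// Minimal result shape assumed for this sketch (subset of ReadPdfResult).
interface TextResult {
  has_text: boolean;
  full_text: string;
}

// Hypothetical adapter signatures; wrap the real readPdf / ocrPdf to fit them.
type ReadFn = (path: string) => Promise<TextResult>;
type OcrFn = (path: string) => Promise<string>;

// Use the embedded text layer when present; fall back to OCR only for
// image-only (scanned) PDFs, and report which route produced the text.
async function extractText(
  path: string,
  read: ReadFn,
  ocr: OcrFn,
): Promise<{ text: string; source: "text-layer" | "ocr" }> {
  const r = await read(path);
  if (r.has_text) return { text: r.full_text, source: "text-layer" };
  return { text: await ocr(path), source: "ocr" };
}
```

Injecting the two functions keeps the routing logic testable with fakes and avoids guessing at either function's real parameters.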