Not all agents can read text off a scanned page; this extracts it as layout-aware markdown, tables included.
Extract text from scanned PDFs and images as layout-aware markdown, including tables. Powered by Mistral OCR. Priced per page.
Extract layout-aware text and tables from a scanned PDF or image.
Optional 0-indexed page numbers to process. Defaults to all pages.
Preferred output style. The response always includes per-page markdown.
Pre-signed GET URL of the scanned PDF or image.
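Because page numbers are 0-indexed, page 1 of a document is index 0. A minimal sketch of converting the 1-based page numbers shown in a PDF viewer before sending them; the helper name to_zero_indexed is hypothetical and not part of the Faro API.

```python
def to_zero_indexed(page_numbers):
    """Convert 1-based page numbers to the 0-indexed values described above."""
    indices = []
    for n in page_numbers:
        if n < 1:
            raise ValueError(f"page numbers start at 1, got {n}")
        indices.append(n - 1)
    # Drop duplicates and sort so the request is predictable.
    return sorted(set(indices))


print(to_zero_indexed([1, 2, 5]))  # [0, 1, 4]
```

Leaving the page list out entirely processes every page, which is also what gets billed.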
curl -X POST "https://skill.askfaro.com/skills/ocr/run" \
  -H "Authorization: Bearer faro_<your_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "intent": {
      "prompt": "Extract the text from this scanned PDF"
    }
  }'
askfaro describe ocr/extract
Install the CLI with pip install askfaro-cli, then run askfaro auth login.
Turn scanned PDFs and photographed documents into accurate, layout-aware markdown (including tables), using Mistral OCR. Faro proxies the call; your file never touches the Faro worker.
POST /uploads/presign on faro-api, PUT your file, and pass the get_url as input_url.
The response contains a pages array; each page has markdown text. The number of pages returned is the number of pages billed.
1.5 credits per page processed. The response reports the page count, so the charge scales with document length. (The exact response shape will be confirmed against the live API once the key is provisioned.)
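The billing rule above can be sketched as follows. This assumes the response is a JSON object with a pages array whose items carry a markdown field; since the exact response shape is not yet confirmed, the key names and the helper summarize_response are assumptions, not the documented API.

```python
CREDITS_PER_PAGE = 1.5  # rate stated in this document


def summarize_response(response):
    """Join per-page markdown and compute the credits billed.

    Assumed (unconfirmed) shape: {"pages": [{"markdown": "..."}, ...]}
    """
    pages = response.get("pages", [])
    markdown = "\n\n".join(page.get("markdown", "") for page in pages)
    # Pages returned equals pages billed.
    credits = CREDITS_PER_PAGE * len(pages)
    return markdown, credits


sample = {"pages": [{"markdown": "# Invoice"}, {"markdown": "| a | b |"}]}
text, credits = summarize_response(sample)
print(credits)  # 3.0
```

For example, a 40-page scan would cost 60 credits at this rate.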
Use pdf-tools/extract_text for PDFs that already have a text layer (cheaper, no
per-page vendor fee). Use this for scanned or photographed documents, or when you
need high-fidelity tables as markdown.
The default request treats the input as a document URL. Image-only inputs may
require the image_url document type; this will be confirmed in final testing
once the key is provisioned.
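One way a client might choose between the two document types is by file extension. This is a sketch only: image_url is the name used above, document_url is an assumed name for the default type, and guess_document_type is a hypothetical helper. How Faro expects the type to be passed is not confirmed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".tif", ".tiff"}


def guess_document_type(input_url):
    """Return "image_url" for common image extensions, else "document_url"."""
    # Inspect only the URL path so pre-signed query strings are ignored.
    suffix = PurePosixPath(urlparse(input_url).path).suffix.lower()
    return "image_url" if suffix in IMAGE_EXTENSIONS else "document_url"


print(guess_document_type("https://example.com/scan.JPG?sig=abc"))  # image_url
```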