Getting Started with Document Extraction

Quick start guide to extracting structured data from PDFs, invoices, and purchase orders with dataextractor.io.

Last updated: April 8, 2026

Sign up & log in

Create a free account to get started. The Free plan includes 10 documents per month and the full extraction pipeline, and no credit card is required.

Upload your first document

From the Extractor page, upload a PDF such as an invoice, purchase order, receipt, or statement. Files up to 25 MB are supported. You can also paste the URL of a publicly accessible PDF.
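
If you upload many files, a quick local check can catch oversized or non-PDF files before they reach the Extractor page. The sketch below is not part of dataextractor.io. The helper name check_pdf is hypothetical, and it assumes the 25 MB limit means 25 × 1024 × 1024 bytes, which the service may count differently.

```python
from pathlib import Path

# Assumption: "25 MB" is treated as 25 MiB; the service may count differently.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_pdf(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks uploadable."""
    file = Path(path)
    if not file.is_file():
        return [f"{path}: file not found"]

    problems = []
    if file.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append(f"{path}: larger than 25 MB")
    with file.open("rb") as handle:
        # PDF files normally begin with the marker "%PDF-".
        if handle.read(5) != b"%PDF-":
            problems.append(f"{path}: does not look like a PDF")
    return problems
```

Calling check_pdf("invoice.pdf") returns an empty list when the file exists, is within the assumed size limit, and starts with the PDF marker.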

AI detects the schema

The classifier inspects the document, identifies the document type, and proposes an extraction schema with the fields it expects to find. Review the suggested fields and approve, edit, or remove them.
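
To make the review step concrete, here is a minimal sketch of a proposed schema and a reviewer's decisions modeled as plain data. The Field class, the review function, and the example field names are all hypothetical and only illustrate the approve, edit, and remove choices; they do not reflect the product's internal format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    name: str          # hypothetical field identifier
    description: str   # what the extractor should look for


# A schema a classifier might propose for an invoice (illustrative values only).
proposed = [
    Field("invoice_number", "Unique identifier printed on the invoice"),
    Field("total", "Amount due"),
    Field("fax_number", "Vendor fax number"),
]


def review(schema, edits=None, remove=()):
    """Apply a reviewer's decisions: drop unwanted fields, edit descriptions, keep the rest."""
    edits = edits or {}
    kept = [f for f in schema if f.name not in remove]
    return [
        replace(f, description=edits[f.name]) if f.name in edits else f
        for f in kept
    ]


approved = review(
    proposed,
    edits={"total": "Amount due, including tax"},
    remove={"fax_number"},
)
```

Here invoice_number is approved unchanged, total is edited, and fax_number is removed.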

Extract values

Run extraction. Each field is extracted using its own per-field prompt, and each value comes with a bounding box showing where it was found on the source PDF. Review the values, correct anything wrong, and save the result as ground truth.
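
The following sketch shows one way to think about an extracted value and how corrections turn a set of values into ground truth. The Extracted class, the (x0, y0, x1, y1) box layout, and the sample values are assumptions made for illustration, not the service's actual output format.

```python
from dataclasses import dataclass


@dataclass
class Extracted:
    field: str
    value: str
    page: int
    # Assumed layout: (x0, y0, x1, y1) in page coordinates.
    box: tuple[float, float, float, float]


def to_ground_truth(extracted, corrections):
    """Merge reviewer corrections over extracted values to form the ground truth."""
    truth = {item.field: item.value for item in extracted}
    unknown = set(corrections) - set(truth)
    if unknown:
        raise KeyError(f"corrections for fields that were not extracted: {sorted(unknown)}")
    truth.update(corrections)
    return truth


# Illustrative values only.
results = [
    Extracted("invoice_number", "INV-1042", page=1, box=(72.0, 90.0, 160.0, 104.0)),
    Extracted("total", "1,280.00", page=1, box=(430.0, 640.0, 500.0, 654.0)),
]

# The reviewer spots transposed digits in the total and corrects them.
ground_truth = to_ground_truth(results, {"total": "1,208.00"})
```

Values that were already correct pass through unchanged, so the ground truth always covers every extracted field.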

Improve with the GEPA loop

Click Improve to run the GEPA learning loop. The agent compares the extracted values against the ground truth, rewrites the per-field prompts to fix the errors, and re-extracts. Repeat until accuracy reaches the level you need.
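
The shape of that loop (compare, rewrite, re-extract, repeat) can be sketched in a few lines. This is not the GEPA implementation: the extract and rewrite_prompt callables, the exact-match accuracy measure, and the toy stand-ins are all assumptions chosen so the example runs on its own.

```python
def field_errors(extracted, ground_truth):
    """Return the fields whose extracted value differs from the ground truth."""
    return {
        name: (extracted.get(name), expected)
        for name, expected in ground_truth.items()
        if extracted.get(name) != expected
    }


def improve(prompts, extract, rewrite_prompt, ground_truth, target=1.0, max_rounds=5):
    """Re-extract and rewrite failing prompts until accuracy reaches target or rounds run out."""
    if not ground_truth:
        raise ValueError("ground truth must contain at least one field")
    prompts = dict(prompts)  # work on a copy so the caller's prompts are untouched
    accuracy = 0.0
    for _ in range(max_rounds):
        extracted = extract(prompts)
        errors = field_errors(extracted, ground_truth)
        accuracy = 1 - len(errors) / len(ground_truth)
        if accuracy >= target:
            break
        for name, (got, expected) in errors.items():
            prompts[name] = rewrite_prompt(prompts.get(name, ""), got, expected)
    return prompts, accuracy


# Toy stand-ins so the sketch runs end to end.
truth = {"total": "1,208.00", "invoice_number": "INV-1042"}


def fake_extract(prompts):
    # Pretend the "total" prompt only works once it mentions tax.
    total = "1,208.00" if "tax" in prompts["total"] else "1,100.00"
    return {"total": total, "invoice_number": "INV-1042"}


def fake_rewrite(prompt, got, expected):
    return prompt + " Include tax."


final_prompts, final_accuracy = improve(
    {"total": "Find the total.", "invoice_number": "Find the invoice number."},
    fake_extract,
    fake_rewrite,
    truth,
)
```

Only the prompts for failing fields are rewritten, so fields that already match the ground truth keep their original prompts from round to round.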