Getting Started with Document Extraction
Quick start guide to extracting structured data from PDFs, invoices, and purchase orders with dataextractor.io.
Last updated: April 8, 2026
Sign up & log in
Create a free account to get started. The Free plan includes 10 documents per month and the full extraction pipeline, with no credit card required.
Upload your first document
From the Extractor page, upload a PDF such as an invoice, purchase order, receipt, or statement. Files up to 25 MB are supported. You can also paste the URL of a publicly accessible PDF.
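Before uploading, it can save a round trip to check a file locally. The sketch below is not part of dataextractor.io; the function name and the reading of the 25 MB limit as 25 × 1024 × 1024 bytes are assumptions for illustration. It only checks the file size and the standard %PDF- signature at the start of the file.

```python
from pathlib import Path
from typing import List

# Assumption: the 25 MB limit is read as 25 * 1024 * 1024 bytes.
# Adjust this constant if the service counts megabytes differently.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_pdf_for_upload(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable.

    Hypothetical local helper, not a dataextractor.io API.
    """
    p = Path(path)
    if not p.is_file():
        return [f"{path} is not a file"]

    problems = []
    size = p.stat().st_size
    if size > max_bytes:
        problems.append(f"file is {size} bytes, over the {max_bytes}-byte limit")

    # PDF files begin with the signature "%PDF-" followed by a version number.
    with p.open("rb") as fh:
        if fh.read(5) != b"%PDF-":
            problems.append("file does not start with the %PDF- signature")
    return problems
```

An empty result means the file passed both checks; it does not guarantee that the service will accept or correctly parse the document.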
AI detects the schema
The classifier inspects the document, identifies the document type, and proposes an extraction schema with the fields it expects to find. Review the suggested fields and approve, edit, or remove them.
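To make the review step concrete, here is a minimal sketch of approving, editing, and removing proposed fields, modeled as plain Python data. The ProposedField class, its attributes, and review_schema are hypothetical names chosen for illustration; the schema format dataextractor.io actually uses is not described here.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ProposedField:
    """Hypothetical shape of one field in a proposed schema."""
    name: str
    type: str
    description: str = ""


def review_schema(
    proposed: Iterable[ProposedField],
    edits: Optional[Dict[str, dict]] = None,
    removals: Iterable[str] = (),
) -> List[ProposedField]:
    """Keep untouched fields as approved, apply edits, and drop removed fields."""
    edits = edits or {}
    removed = set(removals)
    reviewed = []
    for field in proposed:
        if field.name in removed:
            continue  # reviewer removed this field
        if field.name in edits:
            field = replace(field, **edits[field.name])  # reviewer edited it
        reviewed.append(field)  # otherwise approved unchanged
    return reviewed


proposed = [
    ProposedField("invoice_number", "string"),
    ProposedField("total", "string"),
    ProposedField("fax_number", "string"),
]
schema = review_schema(
    proposed,
    edits={"total": {"type": "number", "description": "Grand total incl. tax"}},
    removals=["fax_number"],
)
```

In this example the reviewer approves invoice_number as proposed, changes the type of total, and removes fax_number.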
Extract values
Run extraction. Each field is extracted with its own per-field prompt, and each value is linked to a bounding box on the source PDF. Review the values, correct any that are wrong, and save the result as ground truth.
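One way to picture the result of this step is a list of extracted fields, each with a value and a bounding box, plus a set of reviewer corrections that together form the ground truth. The class names, the coordinate layout of the box, and build_ground_truth are all assumptions made for this sketch, not the product's data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical box: page index plus corner coordinates on that page."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class ExtractedField:
    """Hypothetical extraction result for one schema field."""
    name: str
    value: Optional[str]
    box: Optional[BoundingBox]


def build_ground_truth(
    extracted: List[ExtractedField],
    corrections: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Use the reviewer's correction where one exists, else the extracted value."""
    return {f.name: corrections.get(f.name, f.value) for f in extracted}


extracted = [
    ExtractedField("invoice_number", "INV-0042", BoundingBox(0, 400, 60, 520, 80)),
    ExtractedField("total", "1,280.00", BoundingBox(0, 430, 700, 520, 720)),
]
# The reviewer fixes the total and accepts the invoice number as extracted.
ground_truth = build_ground_truth(extracted, {"total": "1280.00"})
```

Values the reviewer leaves untouched are treated as confirmed, which is why careful review matters before saving.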
Improve with the GEPA loop
Click Improve to run the GEPA learning loop. The agent compares the extracted values against the ground truth, rewrites the per-field prompts to fix the errors, and re-extracts. Repeat until the accuracy meets your needs.
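The control flow described above (compare, rewrite, re-extract, repeat) can be sketched as a simple loop. This is not the GEPA algorithm itself or the product's implementation: extract and rewrite are hypothetical callables supplied by the caller, accuracy is assumed to be the fraction of ground-truth fields matched exactly, and the target and round limit are illustrative defaults.

```python
from typing import Callable, Dict, Set, Tuple

Values = Dict[str, str]   # field name -> value
Prompts = Dict[str, str]  # field name -> per-field prompt


def field_accuracy(extracted: Values, truth: Values) -> float:
    """Fraction of ground-truth fields whose extracted value matches exactly."""
    if not truth:
        return 1.0
    correct = sum(1 for name, value in truth.items() if extracted.get(name) == value)
    return correct / len(truth)


def improve(
    prompts: Prompts,
    truth: Values,
    extract: Callable[[Prompts], Values],
    rewrite: Callable[[Prompts, Set[str]], Prompts],
    target: float = 0.95,
    max_rounds: int = 5,
) -> Tuple[Prompts, float, int]:
    """Re-extract with rewritten prompts until the target accuracy or round limit."""
    extracted = extract(prompts)
    accuracy = field_accuracy(extracted, truth)
    rounds = 0
    while accuracy < target and rounds < max_rounds:
        # Only fields that disagree with the ground truth need new prompts.
        wrong = {n for n, v in truth.items() if extracted.get(n) != v}
        prompts = rewrite(prompts, wrong)
        extracted = extract(prompts)
        accuracy = field_accuracy(extracted, truth)
        rounds += 1
    return prompts, accuracy, rounds
```

The round limit keeps the loop from running indefinitely when a field cannot be fixed by prompt changes alone, for example when the ground truth itself contains an error.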