Document Extraction API Reference

REST API reference for the dataextractor.io document extraction API, covering authentication, datasets, extraction, and webhooks.

Last updated: April 8, 2026

Authentication

All API requests require an API key, sent as a Bearer token in the Authorization header. Generate a key from the Developer page in your account. Keys are scoped to your account and can be revoked at any time.
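
As an illustration of the header format, here is a minimal Python sketch that builds (but does not send) an authenticated request using only the standard library. The host in BASE_URL and the build_request helper are assumptions made for this example; this reference does not state the API host.

```python
import urllib.request

# Assumption: the API host is not stated in this reference.
BASE_URL = "https://dataextractor.io"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Return a GET request carrying the Bearer token. Nothing is sent."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Replace YOUR_API_KEY with a key generated on the Developer page.
req = build_request("/api/v1/datasets", "YOUR_API_KEY")
```

Passing req to urllib.request.urlopen would perform the call. Keeping the key in an environment variable rather than in source code makes it easier to rotate after a revocation.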

List datasets

GET /api/v1/datasets — list all datasets. Filter by customer_id, dataset_type, is_verified, or search query. Returns paginated results.
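
The filters above can be sketched as query parameters. The example below only builds the URL. The parameter names customer_id, dataset_type, and is_verified come from this reference; the host, the sample values, and the "true" encoding for booleans are assumptions, and the name of the search parameter is not given here, so it is left out.

```python
from urllib.parse import urlencode


def datasets_url(base_url: str, **filters) -> str:
    """Build the list-datasets URL, skipping filters that are None."""
    params = {key: value for key, value in filters.items() if value is not None}
    query = urlencode(params)
    return f"{base_url}/api/v1/datasets" + (f"?{query}" if query else "")


# Assumptions: the host, the sample customer ID, and "true" as the boolean form.
url = datasets_url(
    "https://dataextractor.io",
    customer_id="cust_123",
    dataset_type=None,
    is_verified="true",
)
```

Dropping None values lets callers pass every filter unconditionally and have only the ones that are set appear in the query string.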

Get a dataset

GET /api/v1/datasets/{id} — return a single dataset with its extracted fields, line items, and ground truth.
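
A minimal sketch of fetching one dataset by ID, again with the Python standard library. The host, the helper names, and the assumption that the response body is JSON are not confirmed by this reference; the ID is percent-encoded so that unusual characters cannot alter the path.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the API host is not stated in this reference.
BASE_URL = "https://dataextractor.io"


def dataset_url(dataset_id: str) -> str:
    """Return the URL for a single dataset, percent-encoding the ID."""
    return f"{BASE_URL}/api/v1/datasets/{quote(str(dataset_id), safe='')}"


def get_dataset(dataset_id: str, api_key: str) -> dict:
    """Fetch one dataset. Assumes the API responds with a JSON object."""
    req = urllib.request.Request(
        dataset_url(dataset_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so a caller would typically wrap get_dataset in a try/except to handle a missing dataset or a revoked key.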

Webhooks

Subscribe to extraction.completed and matching.completed events. Configure a webhook URL from the Integrations page. Payloads include the dataset ID and a signed URL to fetch the extracted data.
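
A receiving endpoint might parse the payload roughly as follows. This is a sketch under stated assumptions: the payload is assumed to be JSON, and the field names "event", "dataset_id", and "data_url" are hypothetical, since this reference names the payload contents but not their keys.

```python
import json

# The two event names documented above.
HANDLED_EVENTS = {"extraction.completed", "matching.completed"}


def parse_webhook(body: bytes):
    """Return (event, dataset_id, data_url), or None for unhandled events.

    The keys "event", "dataset_id", and "data_url" are hypothetical.
    """
    payload = json.loads(body)
    event = payload.get("event")
    if event not in HANDLED_EVENTS:
        return None
    return event, payload["dataset_id"], payload["data_url"]


# Sample payload invented for illustration only.
sample = json.dumps({
    "event": "extraction.completed",
    "dataset_id": "ds_1",
    "data_url": "https://example.com/signed",
}).encode()
result = parse_webhook(sample)
```

Returning None for unknown events lets the endpoint acknowledge them without acting, which keeps the handler tolerant of event types added later.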