dataextractor.io

Integrations Guide

Connect dataextractor.io with SAP, Salesforce, Odoo, NetSuite, and custom HTTP endpoints.

Last updated: April 13, 2026

ERP connectors

Pre-built connectors are available for SAP S/4HANA, SAP Business One, Salesforce, Odoo, and Microsoft Dynamics. NetSuite is in beta. Each connector handles authentication, data mapping, and bidirectional sync — extracted invoice or PO data is pushed into the ERP without a custom integration layer.

Connectors are configured from the Integrations page in your account. Each connector requires the ERP endpoint URL and credentials (API key or OAuth). Once configured, extractions are automatically synced to the ERP after human review and approval in the Review & Edit panel.

For ERPs not on the pre-built list, use the Custom HTTP connector to push extracted data to any REST endpoint. Define the payload shape using a JSON template with field name placeholders, and the connector substitutes the extracted values at sync time.
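The substitution step can be pictured with a short sketch. The `{{field_name}}` placeholder syntax, the field names, and the `render_template` helper are all assumptions made for illustration; the connector's actual template syntax is not specified here and may differ.

```python
import json
import re

# Hypothetical placeholder syntax: {{field_name}}. The real connector may differ.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template, extracted):
    """Recursively replace placeholders in a JSON-like template with extracted values."""
    if isinstance(template, dict):
        return {key: render_template(value, extracted) for key, value in template.items()}
    if isinstance(template, list):
        return [render_template(item, extracted) for item in template]
    if isinstance(template, str):
        whole = PLACEHOLDER.fullmatch(template)
        if whole:
            # A string that is only a placeholder keeps the value's original type.
            return extracted[whole.group(1)]
        # Placeholders embedded in longer text are substituted as strings.
        return PLACEHOLDER.sub(lambda m: str(extracted[m.group(1)]), template)
    return template


# Illustrative template and extraction result (field names are made up).
template = {
    "vendor": "{{supplier_name}}",
    "reference": "INV-{{invoice_number}}",
    "amount": "{{total}}",
}
extracted = {"supplier_name": "Acme GmbH", "invoice_number": "1042", "total": 199.5}

payload = render_template(template, extracted)
print(json.dumps(payload))
```

In this sketch a placeholder with no matching extracted field raises `KeyError`, which surfaces a template mistake at sync time instead of sending an incomplete payload.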

Webhooks

Webhooks push real-time notifications to your system when extraction events occur. Supported event types are extraction.completed, extraction.failed, matching.completed, and ground_truth.saved.

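Webhook payloads are signed with HMAC-SHA256. Every request includes an X-Dataextractor-Signature header containing the hex-encoded HMAC of the raw request body, computed with your webhook secret. Verify this signature on every incoming request before processing the payload, and compute it over the raw bytes as received rather than a re-serialized copy of the parsed JSON. A valid signature confirms that the event came from dataextractor.io and that the body was not altered in transit. It does not by itself prevent replay of a captured request; deduplicate on the event_id field to guard against that.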
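A minimal verification sketch using only the Python standard library follows. It assumes the header carries just the hex digest with no prefix, and that hex case should be ignored; the function name and the sample secret are illustrative.

```python
import hashlib
import hmac


def signature_is_valid(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check the X-Dataextractor-Signature value against the raw request body."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = signature_header.strip().lower()
    # compare_digest takes time independent of where the inputs differ,
    # which avoids leaking the expected signature through timing.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


# Example with a made-up secret and body.
secret = "example-webhook-secret"
body = b'{"event_id": "evt_123"}'
good_signature = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

print(signature_is_valid(body, good_signature, secret))         # True
print(signature_is_valid(body + b" ", good_signature, secret))  # False
```

Pass the body exactly as it arrived on the wire. Many web frameworks parse JSON eagerly, so read the raw bytes before any parsing step.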

Webhook delivery is retried up to five times with exponential backoff if your endpoint returns a non-2xx status code or does not respond within 10 seconds. Because a retry can deliver the same event more than once, use the event_id field in the payload as an idempotency key. Configure webhook endpoints from the Integrations page, where you can register multiple endpoints and filter which event types each one receives.
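One way to handle retries safely is to record each event_id after successful processing and acknowledge duplicates without reprocessing them. The sketch below assumes event_id is a top-level field of the JSON payload; the `WebhookProcessor` class, the `type` field, and the in-memory set are illustrative only.

```python
import json


class WebhookProcessor:
    """Processes each event at most once, keyed on the payload's event_id."""

    def __init__(self, handler):
        self._handler = handler
        # In-memory set for illustration only. A real deployment would use a
        # durable store, such as a database table with a unique constraint.
        self._seen = set()

    def process(self, raw_body: bytes) -> int:
        """Return the HTTP status code to send back to the sender."""
        event = json.loads(raw_body)
        event_id = event["event_id"]
        if event_id in self._seen:
            # Duplicate delivery from a retry: acknowledge, do not reprocess.
            return 200
        try:
            self._handler(event)
        except Exception:
            # A non-2xx response asks the sender to retry this event later.
            return 500
        # Record the ID only after success so failed attempts are retried.
        self._seen.add(event_id)
        return 200


handled = []
processor = WebhookProcessor(handled.append)
body = json.dumps({"event_id": "evt_123", "type": "extraction.completed"}).encode()

processor.process(body)
processor.process(body)  # simulated retry of the same event
print(len(handled))  # prints 1
```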

REST API

The REST API provides programmatic access to all extraction and management functionality. Authenticate with a Bearer token from the Developer page. All endpoints return JSON and use standard HTTP status codes.

Three integration patterns are common. To poll for results, upload a document, store the returned dataset ID, then call GET /api/v1/datasets/{id} every few seconds until status is complete. For event-driven architectures, use webhooks instead of polling: the webhook payload includes the dataset ID, so you can fetch the full dataset on receipt. For bulk ingestion, upload documents in batches, using the customer_id parameter to tag each document with its source, then retrieve all results for a customer in a single list call.
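The polling pattern can be sketched as follows. The base URL is left as a parameter because it is not given here. The sketch also assumes the response body is a JSON object with a top-level `status` field whose finished value is the string "complete". The helper names, polling interval, and timeout are illustrative.

```python
import json
import time
import urllib.request


def make_fetcher(base_url, token):
    """Build a function that calls GET /api/v1/datasets/{id} with a Bearer token."""
    def fetch_dataset(dataset_id):
        request = urllib.request.Request(
            f"{base_url}/api/v1/datasets/{dataset_id}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)
    return fetch_dataset


def wait_for_dataset(fetch_dataset, dataset_id, interval_seconds=3.0,
                     timeout_seconds=120.0, sleep=time.sleep):
    """Poll until the dataset reports status "complete" or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        dataset = fetch_dataset(dataset_id)
        if dataset.get("status") == "complete":
            return dataset
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dataset {dataset_id} not complete after {timeout_seconds}s")
        sleep(interval_seconds)
```

A call might look like `wait_for_dataset(make_fetcher(base_url, token), dataset_id)`. Taking the fetch function as an argument keeps the loop testable without network access, and the timeout stops a stuck or failed extraction from polling forever.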

See the API Reference for the complete endpoint list, all query parameters, request and response schemas, and error codes.

ERP catalog matching deep dive

ERP catalog matching runs after extraction and compares extracted line item data — SKU codes, product descriptions, unit prices — against your ERP catalog to identify the canonical record for each extracted line.

Matching uses fuzzy string comparison, not exact equality. A supplier invoice that lists "Blue Widget 500ml" will match against the ERP record "Widget, Blue, 500mL" despite differences in capitalisation, word order, and punctuation. The matching confidence score reflects how closely the extracted text matches the catalog entry. High-confidence matches are accepted automatically; low-confidence matches are flagged for human review before the line is posted to the ERP.
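To make the idea concrete, here is a small sketch of fuzzy description matching using Python's standard `difflib`. It is not the matching algorithm dataextractor.io uses, which is not described here. The normalization steps, the `best_match` helper, and the 0.85 auto-accept threshold are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(description: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so word order is ignored."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return " ".join(sorted(tokens))


def match_confidence(extracted: str, catalog_entry: str) -> float:
    """Similarity in [0, 1] between two normalized descriptions."""
    return SequenceMatcher(None, normalize(extracted), normalize(catalog_entry)).ratio()


def best_match(extracted, catalog, auto_accept_threshold=0.85):
    """Return (entry, confidence, auto_accepted) for the closest catalog entry."""
    scored = [(match_confidence(extracted, entry), entry) for entry in catalog]
    confidence, entry = max(scored, key=lambda pair: pair[0])
    return entry, confidence, confidence >= auto_accept_threshold


catalog = ["Widget, Blue, 500mL", "Widget, Red, 250mL"]
entry, confidence, auto_accepted = best_match("Blue Widget 500ml", catalog)
print(entry, confidence, auto_accepted)  # Widget, Blue, 500mL 1.0 True
```

After normalization both descriptions become "500ml blue widget", so the score is 1.0 despite the differences in capitalisation, word order, and punctuation. A line whose best score falls below the threshold would be routed to human review.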

To improve matching accuracy, sync your ERP catalog regularly from the Integrations page. Catalogs with complete records — including alternative descriptions and known supplier aliases — produce significantly higher automatic match rates. You can add manual aliases for frequently mismatched products without modifying the ERP record itself.

Authentication & security

All API communication is encrypted in transit using HTTPS. API keys are hashed before storage — we cannot recover a key if it is lost, only revoke it and issue a replacement.

In production, webhook endpoints must use HTTPS. Plain HTTP endpoints are permitted only in test mode, for local development. For ERP connectors, credentials are encrypted at rest using AES-256. Connector credentials are never returned in API responses; they can only be updated or deleted, not retrieved.

If your ERP or internal webhook endpoint requires IP allowlisting, contact us for the list of dataextractor.io outbound IP addresses. These addresses are stable and change only with prior notice to affected customers.

Testing your integration

Before going live, verify your integration end-to-end in a staging environment.

For webhook integrations, use a tool like webhook.site or ngrok to expose a local endpoint and inspect the raw payloads. Trigger a test extraction from the UI and confirm the webhook fires with the correct event type and dataset ID. Check that your signature verification logic accepts the payload before writing any downstream processing.

For ERP connector integrations, use a document you have already reviewed and verified. Push the extraction to your ERP staging environment and confirm the field mapping is correct before enabling the connector against production. Pay particular attention to currency fields — ensure the numeric format and currency code match what your ERP expects.
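A currency check of this kind can be automated in a staging test. The sketch below assumes your ERP expects a plain decimal string with no thousands separators and a three-letter uppercase currency code. The function name and the two-decimal default are illustrative, and some currencies use a different number of decimal places.

```python
import re
from decimal import Decimal, InvalidOperation

# Shape check only: three uppercase letters, as in ISO 4217 codes.
CURRENCY_CODE = re.compile(r"[A-Z]{3}")


def check_money_field(amount: str, currency: str, max_decimals: int = 2):
    """Return a list of problems found; an empty list means the field looks fine."""
    problems = []
    try:
        value = Decimal(amount)
    except InvalidOperation:
        problems.append(f"amount {amount!r} is not a plain decimal number")
    else:
        if not value.is_finite():
            problems.append(f"amount {amount!r} is not a finite number")
        elif -value.as_tuple().exponent > max_decimals:
            problems.append(f"amount {amount!r} has more than {max_decimals} decimal places")
    if not CURRENCY_CODE.fullmatch(currency):
        problems.append(f"currency {currency!r} is not a three-letter uppercase code")
    return problems


print(check_money_field("1234.50", "EUR"))        # []
print(len(check_money_field("1,234.50", "eur")))  # 2
```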

For API integrations, the test API key available from the Developer page operates against a sandboxed environment with a fixed set of test documents. Use the test key for automated integration tests so that test runs do not consume production extraction credits.
