Extract Data from Files Step
Use AI-powered extraction to pull structured data from uploaded documents such as invoices, contracts, or forms.
The Extract Data from Files step uses AI to read uploaded documents and pull out specific fields you define. For each field, you write a natural-language prompt describing what to extract, and the step returns the extracted values as mappable fields for later steps.
Files
The step processes one or more file configurations, each targeting a different document.
| name | type | required | description |
|---|---|---|---|
| name | string | yes | A descriptive name for this document configuration. |
| extractionMethod | string | yes | MANUAL for explicit field definition, AUTOMATIC for AI-assisted extraction. |
| file | string | yes | The document to process. Use a mapping placeholder to reference a file uploaded in a previous step (e.g. {{Upload Step.Document}}). |
| fields | array | yes | Fields to extract from the document. See Fields below. |
| customInstruction | string | no | Optional guidance to improve extraction accuracy (e.g. "Focus on the header section"). |
| testFile | string | no | UUID of a test file for validating extraction. Only available when extractionMethod is AUTOMATIC. |
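The required column in the table above can be checked before a configuration is submitted. The following is a minimal sketch in Python, not part of the product: the helper name `missing_required_keys` is hypothetical, and it assumes the configuration is available as a plain dictionary shaped like the JSON in the Examples section.

```python
# Keys marked "required: yes" in the Files table.
REQUIRED_FILE_KEYS = ("name", "extractionMethod", "file", "fields")


def missing_required_keys(file_config: dict) -> list[str]:
    """Return the required keys that are absent or null in one file configuration."""
    return [
        key for key in REQUIRED_FILE_KEYS
        if file_config.get(key) is None
    ]


# Example: a configuration that omits the "fields" array.
config = {
    "name": "Invoice Document",
    "extractionMethod": "MANUAL",
    "file": "{{`Upload Step`.`Document`}}",
}
print(missing_required_keys(config))  # ['fields']
```

An empty result means every required key is present; the check does not inspect the values themselves.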
Extraction methods
- MANUAL: You define each field explicitly. Use this when you know exactly what to extract and the document has a consistent layout.
- AUTOMATIC: AI assists with field detection. Supports a testFile to validate extraction accuracy before processing real documents.
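The rule that testFile applies only to AUTOMATIC extraction can be expressed as a small pre-flight check. This is a sketch under stated assumptions: `check_extraction_method` is a hypothetical helper, and how the platform itself reacts to a testFile on a MANUAL configuration is not documented here.

```python
# The two extraction methods named in the documentation.
VALID_METHODS = {"MANUAL", "AUTOMATIC"}


def check_extraction_method(file_config: dict) -> list[str]:
    """Collect problems with extractionMethod and testFile for one file configuration."""
    problems = []
    method = file_config.get("extractionMethod")
    if method not in VALID_METHODS:
        problems.append(f"unknown extractionMethod: {method!r}")
    # testFile is documented as available only with AUTOMATIC.
    if file_config.get("testFile") is not None and method != "AUTOMATIC":
        problems.append("testFile is only available when extractionMethod is AUTOMATIC")
    return problems


# A MANUAL configuration that carries a testFile is flagged.
print(check_extraction_method({
    "extractionMethod": "MANUAL",
    "testFile": "723e4567-e89b-12d3-a456-426614174006",
}))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a configuration at once.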
Fields
Each field tells the step what piece of information to extract.
| name | type | required | description |
|---|---|---|---|
| id | string | yes | Unique identifier for this field (UUID format). |
| name | string | yes | Name for the extracted data point (used for mapping). |
| prompt | string | yes | Natural language instruction describing what to extract. |
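Because each field needs a unique id in UUID format, field entries are convenient to generate programmatically. The sketch below uses Python's standard `uuid` module; `make_field` is a hypothetical helper, and the use of version 4 (random) UUIDs generated on the client is an assumption, since the documentation only requires UUID format.

```python
import json
import uuid


def make_field(name: str, prompt: str) -> dict:
    """Build one field entry with a freshly generated UUID as its id."""
    # Assumption: any valid UUID is accepted; uuid4() produces a random one.
    return {"id": str(uuid.uuid4()), "name": name, "prompt": prompt}


fields = [
    make_field("Invoice Number", "Extract the invoice number from the document"),
    make_field("Total Amount", "Extract the total amount from the invoice"),
]

# Serialize to the JSON shape used for the "fields" array.
print(json.dumps(fields, indent=2))
```

The ids differ on every run, so store the generated configuration if later steps map to these fields.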
Examples
Let's look at manual extraction first. The example below pulls three fields from an invoice uploaded in a previous step.
{
"files": [
{
"name": "Invoice Document",
"extractionMethod": "MANUAL",
"file": "{{`Upload Step`.`Document`}}",
"fields": [
{
"id": "123e4567-e89b-12d3-a456-426614174000",
"name": "Invoice Number",
"prompt": "Extract the invoice number from the document"
},
{
"id": "223e4567-e89b-12d3-a456-426614174001",
"name": "Total Amount",
"prompt": "Extract the total amount from the invoice"
},
{
"id": "323e4567-e89b-12d3-a456-426614174002",
"name": "Invoice Date",
"prompt": "Extract the invoice date in MM/DD/YYYY format"
}
],
"customInstruction": "Focus on the header section of the document where invoice details are typically listed"
}
]
}
Here's the same step using automatic extraction with a test file to validate accuracy before running on real documents.
{
"files": [
{
"name": "Resume Document",
"extractionMethod": "AUTOMATIC",
"file": "{{`Upload Step`.`Resume`}}",
"fields": [
{
"id": "423e4567-e89b-12d3-a456-426614174003",
"name": "Candidate Name",
"prompt": "Extract the candidate full name"
},
{
"id": "523e4567-e89b-12d3-a456-426614174004",
"name": "Email Address",
"prompt": "Extract the email address"
},
{
"id": "623e4567-e89b-12d3-a456-426614174005",
"name": "Years of Experience",
"prompt": "Extract total years of professional experience"
}
],
"testFile": "723e4567-e89b-12d3-a456-426614174006",
"customInstruction": "Use the work history section to calculate total experience"
}
]
}
You can also process multiple files in a single step. Here's an example that extracts from both an invoice and a receipt simultaneously.
{
"files": [
{
"name": "Invoice Document",
"extractionMethod": "MANUAL",
"file": "{{`Upload Step`.`Invoice`}}",
"fields": [
{
"id": "123e4567-e89b-12d3-a456-426614174000",
"name": "Invoice Number",
"prompt": "Extract the invoice number from the document"
},
{
"id": "223e4567-e89b-12d3-a456-426614174001",
"name": "Total Amount",
"prompt": "Extract the total amount from the invoice"
}
],
"customInstruction": null
},
{
"name": "Receipt Document",
"extractionMethod": "AUTOMATIC",
"file": "{{`Upload Step`.`Receipt`}}",
"fields": [
{
"id": "323e4567-e89b-12d3-a456-426614174002",
"name": "Vendor Name",
"prompt": "Extract the vendor or merchant name"
},
{
"id": "423e4567-e89b-12d3-a456-426614174003",
"name": "Purchase Date",
"prompt": "Extract the purchase date"
}
],
"testFile": "523e4567-e89b-12d3-a456-426614174004",
"customInstruction": null
}
]
}
