Extract Data from Files Step
Extracts structured data from documents using AI-powered field extraction. Use this step to automatically pull specific information from uploaded files (e.g., invoices, contracts, forms) based on natural language prompts.
Top-level properties
| name | type | required | constraints | description |
|---|---|---|---|---|
| files | array | yes | items: file, minItems: 1 | Array of file configurations, each defining extraction for a specific document |
File
Each file object defines extraction configuration for a document. The extractionMethod property discriminates between manual and automatic extraction modes.
| name | type | required | constraints | description |
|---|---|---|---|---|
| name | string | yes | | Descriptive name for the document configuration |
| extractionMethod | string | yes | enum: MANUAL, AUTOMATIC | Discriminator controlling extraction behavior |
| file | string | yes | supports template mapping syntax {{StepName.FieldName}} | Document file path or reference from a previous step |
| fields | array | yes | items: field, minItems: 1 | List of fields to extract from the document |
| customInstruction | string | no | | Optional guidance to improve extraction accuracy |
| testFile | string | no | format: uuid; only available when extractionMethod is AUTOMATIC | Reference to a test file for validating extraction |
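The constraints in the table above can be checked before a configuration is submitted. The following is a minimal sketch, not the platform's own validator: the function name `validate_file_config` and the error messages are hypothetical, and the checks cover only what the table states (required properties, the extractionMethod enum, the minimum of one field, and testFile being a UUID that is allowed only with AUTOMATIC).

```python
import uuid

# Values taken from the table above; the helper itself is a hypothetical illustration.
ALLOWED_METHODS = {"MANUAL", "AUTOMATIC"}
REQUIRED_KEYS = ("name", "extractionMethod", "file", "fields")


def validate_file_config(cfg: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    for key in REQUIRED_KEYS:
        if key not in cfg:
            errors.append(f"missing required property: {key}")

    method = cfg.get("extractionMethod")
    if method is not None and method not in ALLOWED_METHODS:
        errors.append("extractionMethod must be MANUAL or AUTOMATIC")

    fields = cfg.get("fields")
    if fields is not None and (not isinstance(fields, list) or len(fields) < 1):
        errors.append("fields must be an array with at least one item")

    test_file = cfg.get("testFile")
    if test_file is not None:
        if method != "AUTOMATIC":
            errors.append("testFile is only available when extractionMethod is AUTOMATIC")
        try:
            # uuid.UUID is lenient (it also accepts UUIDs without hyphens).
            uuid.UUID(str(test_file))
        except ValueError:
            errors.append("testFile must be a UUID")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a file configuration at once.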
Extraction Method: MANUAL
Manual extraction allows explicit field definition without automated field detection.
- Use when you know exactly what fields to extract
- Best for structured documents with consistent layouts
- Provides fine-grained control over extraction logic
Extraction Method: AUTOMATIC
Automatic extraction uses AI to extract fields, with optional test file support.
- Use when you want AI-assisted field extraction
- Supports testing extraction patterns with testFile
- Validate extraction accuracy against sample documents
Field
Each field defines what data to extract from the document.
| name | type | required | constraints | description |
|---|---|---|---|---|
| id | string | yes | format: uuid | Unique identifier for the field |
| name | string | yes | | Field name for the extracted data |
| prompt | string | yes | | Natural language instruction describing what to extract |
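Because each field needs a UUID-formatted id, it can be convenient to generate field objects programmatically. The sketch below assumes that any well-formed UUID is acceptable as an id (the table only states the format); the helper name `make_field` is hypothetical.

```python
import uuid


def make_field(name: str, prompt: str) -> dict:
    """Build a field object with the three required properties from the table above."""
    return {
        "id": str(uuid.uuid4()),  # assumption: a freshly generated random UUID is a valid id
        "name": name,
        "prompt": prompt,
    }


field = make_field("Invoice Number", "Extract the invoice number from the document")
print(field["name"], field["id"])
```

If ids need to stay stable across edits to a configuration, generate them once and store them rather than calling the helper each time.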
Example (JSON)
Example with manual extraction:
{
  "files": [
    {
      "name": "Invoice Document",
      "extractionMethod": "MANUAL",
      "file": "{{`Upload Step`.`Document`}}",
      "fields": [
        {
          "id": "123e4567-e89b-12d3-a456-426614174000",
          "name": "Invoice Number",
          "prompt": "Extract the invoice number from the document"
        },
        {
          "id": "223e4567-e89b-12d3-a456-426614174001",
          "name": "Total Amount",
          "prompt": "Extract the total amount from the invoice"
        },
        {
          "id": "323e4567-e89b-12d3-a456-426614174002",
          "name": "Invoice Date",
          "prompt": "Extract the invoice date in MM/DD/YYYY format"
        }
      ],
      "customInstruction": "Focus on the header section of the document where invoice details are typically listed"
    }
  ]
}
Example with automatic extraction and test file:
{
  "files": [
    {
      "name": "Resume Document",
      "extractionMethod": "AUTOMATIC",
      "file": "{{`Previous Step`.`Resume`}}",
      "fields": [
        {
          "id": "423e4567-e89b-12d3-a456-426614174003",
          "name": "Candidate Name",
          "prompt": "Extract the candidate full name"
        },
        {
          "id": "523e4567-e89b-12d3-a456-426614174004",
          "name": "Email Address",
          "prompt": "Extract the email address"
        },
        {
          "id": "623e4567-e89b-12d3-a456-426614174005",
          "name": "Years of Experience",
          "prompt": "Extract total years of professional experience"
        }
      ],
      "testFile": "723e4567-e89b-12d3-a456-426614174006",
      "customInstruction": "Use the test file to validate extraction accuracy before processing production documents"
    }
  ]
}
Example with multiple files:
{
  "files": [
    {
      "name": "Invoice Document",
      "extractionMethod": "MANUAL",
      "file": "{{`Upload Step`.`Invoice`}}",
      "fields": [
        {
          "id": "123e4567-e89b-12d3-a456-426614174000",
          "name": "Invoice Number",
          "prompt": "Extract the invoice number from the document"
        },
        {
          "id": "223e4567-e89b-12d3-a456-426614174001",
          "name": "Total Amount",
          "prompt": "Extract the total amount from the invoice"
        }
      ],
      "customInstruction": null
    },
    {
      "name": "Receipt Document",
      "extractionMethod": "AUTOMATIC",
      "file": "{{`Upload Step`.`Receipt`}}",
      "fields": [
        {
          "id": "323e4567-e89b-12d3-a456-426614174002",
          "name": "Vendor Name",
          "prompt": "Extract the vendor or merchant name"
        },
        {
          "id": "423e4567-e89b-12d3-a456-426614174003",
          "name": "Purchase Date",
          "prompt": "Extract the purchase date"
        }
      ],
      "testFile": "523e4567-e89b-12d3-a456-426614174004",
      "customInstruction": null
    }
  ]
}
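A configuration like the ones above can also be assembled and inspected in code. This is a rough sketch under stated assumptions: `step_ref` and `summarize` are hypothetical helper names, and the backtick-quoted reference form is copied from the examples on this page rather than from a formal grammar.

```python
import json


def step_ref(step_name: str, field_name: str) -> str:
    """Build a mapping reference in the backtick-quoted form used by the examples."""
    return "{{`" + step_name + "`.`" + field_name + "`}}"


def summarize(config_json: str) -> dict:
    """Map each file configuration's name to the names of the fields it extracts."""
    config = json.loads(config_json)
    return {f["name"]: [field["name"] for field in f["fields"]] for f in config["files"]}


config = {
    "files": [
        {
            "name": "Invoice Document",
            "extractionMethod": "MANUAL",
            "file": step_ref("Upload Step", "Invoice"),
            "fields": [
                {
                    "id": "123e4567-e89b-12d3-a456-426614174000",
                    "name": "Invoice Number",
                    "prompt": "Extract the invoice number from the document",
                }
            ],
        }
    ]
}

# Serialize to JSON, then read it back to list what each file extracts.
print(summarize(json.dumps(config)))  # {'Invoice Document': ['Invoice Number']}
```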
