Extract Data from Files Step

Extracts structured data from documents using AI-powered field extraction. Use this step to automatically pull specific information from uploaded files (e.g., invoices, contracts, forms) based on natural language prompts.

Top-level properties

| name | type | required | constraints | description |
| --- | --- | --- | --- | --- |
| files | array | yes | items: file, minItems: 1 | Array of file configurations, each defining extraction for a specific document |

File

Each file object defines extraction configuration for a document. The extractionMethod property discriminates between manual and automatic extraction modes.

| name | type | required | constraints | description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Descriptive name for the document configuration |
| extractionMethod | string | yes | enum: MANUAL, AUTOMATIC | Discriminator controlling extraction behavior |
| file | string | yes | supports template mapping syntax {{StepName.FieldName}} | Document file path or reference from previous step |
| fields | array | yes | items: field, minItems: 1 | List of fields to extract from the document |
| customInstruction | string | no | | Optional guidance to improve extraction accuracy |
| testFile | string | no | format: uuid; only available when extractionMethod is AUTOMATIC | Reference to a test file for validating extraction |
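The constraints in this table can be checked before a configuration is submitted. The following is a minimal sketch in Python, not the product's own validator: the function names `validate_file` and `is_uuid` and the error messages are hypothetical, and only the rules listed in the table are checked.

```python
import uuid

EXTRACTION_METHODS = {"MANUAL", "AUTOMATIC"}
REQUIRED = ("name", "extractionMethod", "file", "fields")


def is_uuid(value):
    """Return True if value parses as a UUID string.

    Note: uuid.UUID is lenient and also accepts unhyphenated hex strings.
    """
    try:
        uuid.UUID(value)
        return True
    except (ValueError, TypeError, AttributeError):
        return False


def validate_file(cfg):
    """Return a list of problems found in one file object (empty if none)."""
    errors = [f"missing required property: {key}" for key in REQUIRED if key not in cfg]

    method = cfg.get("extractionMethod")
    if method is not None and method not in EXTRACTION_METHODS:
        errors.append(f"extractionMethod must be MANUAL or AUTOMATIC, got {method!r}")

    fields = cfg.get("fields")
    if fields is not None and (not isinstance(fields, list) or len(fields) < 1):
        errors.append("fields must be an array with at least one item")

    # testFile is optional, but the table restricts it to AUTOMATIC mode.
    test_file = cfg.get("testFile")
    if test_file is not None:
        if method != "AUTOMATIC":
            errors.append("testFile is only available when extractionMethod is AUTOMATIC")
        if not is_uuid(test_file):
            errors.append("testFile must be a UUID")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a file object at once.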

Extraction Method: MANUAL

Manual extraction allows explicit field definition without automated field detection.

  • Use when you know exactly what fields to extract
  • Best for structured documents with consistent layouts
  • Provides fine-grained control over extraction logic

Extraction Method: AUTOMATIC

Automatic extraction uses AI to extract the configured fields and optionally accepts a test file for checking the results.

  • Use when you want AI-assisted field extraction
  • Supports testing extraction patterns with testFile
  • Lets you validate extraction accuracy against sample documents

Field

Each field defines what data to extract from the document.

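| name | type | required | constraints | description |
| --- | --- | --- | --- | --- |
| id | string | yes | format: uuid | Unique identifier for the field |
| name | string | yes | | Field name for the extracted data |
| prompt | string | yes | | Natural language instruction describing what to extract |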
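A field object can be assembled programmatically. The sketch below is illustrative only: `make_field` is a hypothetical helper, and it assumes that a client-generated random UUID is acceptable as the id, which this reference does not state.

```python
import uuid


def make_field(name, prompt, field_id=None):
    """Build a field object; generate a random UUID when no id is given."""
    if not name or not prompt:
        raise ValueError("name and prompt are both required")
    return {
        "id": field_id or str(uuid.uuid4()),  # assumption: client-side ids are allowed
        "name": name,
        "prompt": prompt,
    }


field = make_field("Invoice Number", "Extract the invoice number from the document")
```

Passing `field_id` explicitly keeps an existing identifier stable when a configuration is regenerated.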

Examples (JSON)

Example with manual extraction:

{
  "files": [
    {
      "name": "Invoice Document",
      "extractionMethod": "MANUAL",
      "file": "{{`Upload Step`.`Document`}}",
      "fields": [
        {
          "id": "123e4567-e89b-12d3-a456-426614174000",
          "name": "Invoice Number",
          "prompt": "Extract the invoice number from the document"
        },
        {
          "id": "223e4567-e89b-12d3-a456-426614174001",
          "name": "Total Amount",
          "prompt": "Extract the total amount from the invoice"
        },
        {
          "id": "323e4567-e89b-12d3-a456-426614174002",
          "name": "Invoice Date",
          "prompt": "Extract the invoice date in MM/DD/YYYY format"
        }
      ],
      "customInstruction": "Focus on the header section of the document where invoice details are typically listed"
    }
  ]
}
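The file value in this example wraps the step name and the field name in backticks, which the examples appear to use for names containing spaces. A minimal sketch of a helper that produces that form is shown here. The name `template_ref` is hypothetical, and the quoting rule is inferred from the examples rather than documented.

```python
def template_ref(step_name, field_name):
    """Format a backtick-quoted reference like {{`Step`.`Field`}}."""
    for part in (step_name, field_name):
        # Assumption: names cannot be empty or contain a backtick themselves.
        if not part or "`" in part:
            raise ValueError(f"invalid name: {part!r}")
    return "{{`" + step_name + "`.`" + field_name + "`}}"


print(template_ref("Upload Step", "Document"))  # {{`Upload Step`.`Document`}}
```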

Example with automatic extraction and test file:

{
  "files": [
    {
      "name": "Resume Document",
      "extractionMethod": "AUTOMATIC",
      "file": "{{`Previous Step`.`Resume`}}",
      "fields": [
        {
          "id": "423e4567-e89b-12d3-a456-426614174003",
          "name": "Candidate Name",
          "prompt": "Extract the candidate full name"
        },
        {
          "id": "523e4567-e89b-12d3-a456-426614174004",
          "name": "Email Address",
          "prompt": "Extract the email address"
        },
        {
          "id": "623e4567-e89b-12d3-a456-426614174005",
          "name": "Years of Experience",
          "prompt": "Extract total years of professional experience"
        }
      ],
      "testFile": "723e4567-e89b-12d3-a456-426614174006",
      "customInstruction": "Use the test file to validate extraction accuracy before processing production documents"
    }
  ]
}

Example with multiple files:

{
  "files": [
    {
      "name": "Invoice Document",
      "extractionMethod": "MANUAL",
      "file": "{{`Upload Step`.`Invoice`}}",
      "fields": [
        {
          "id": "123e4567-e89b-12d3-a456-426614174000",
          "name": "Invoice Number",
          "prompt": "Extract the invoice number from the document"
        },
        {
          "id": "223e4567-e89b-12d3-a456-426614174001",
          "name": "Total Amount",
          "prompt": "Extract the total amount from the invoice"
        }
      ],
      "customInstruction": null
    },
    {
      "name": "Receipt Document",
      "extractionMethod": "AUTOMATIC",
      "file": "{{`Upload Step`.`Receipt`}}",
      "fields": [
        {
          "id": "323e4567-e89b-12d3-a456-426614174002",
          "name": "Vendor Name",
          "prompt": "Extract the vendor or merchant name"
        },
        {
          "id": "423e4567-e89b-12d3-a456-426614174003",
          "name": "Purchase Date",
          "prompt": "Extract the purchase date"
        }
      ],
      "testFile": "523e4567-e89b-12d3-a456-426614174004",
      "customInstruction": null
    }
  ]
}
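With several files in one step, it can be useful to confirm that field ids do not collide. The Field table describes id as a unique identifier but does not say whether uniqueness is per file or per step. The sketch below assumes step-wide uniqueness, and `duplicate_field_ids` is a hypothetical helper.

```python
import json
from collections import Counter


def duplicate_field_ids(step):
    """Return field ids that occur more than once anywhere in the step."""
    counts = Counter(
        field["id"] for file_cfg in step["files"] for field in file_cfg["fields"]
    )
    return sorted(fid for fid, n in counts.items() if n > 1)


# Trimmed-down step configuration with the same shape as the examples above.
step = json.loads("""
{
  "files": [
    {"name": "Invoice Document", "extractionMethod": "MANUAL",
     "file": "{{`Upload Step`.`Invoice`}}",
     "fields": [{"id": "123e4567-e89b-12d3-a456-426614174000",
                 "name": "Invoice Number",
                 "prompt": "Extract the invoice number from the document"}]},
    {"name": "Receipt Document", "extractionMethod": "AUTOMATIC",
     "file": "{{`Upload Step`.`Receipt`}}",
     "fields": [{"id": "323e4567-e89b-12d3-a456-426614174002",
                 "name": "Vendor Name",
                 "prompt": "Extract the vendor or merchant name"}]}
  ]
}
""")

print(duplicate_field_ids(step))  # []
```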