Extract Data from Files Step

Use AI-powered extraction to pull structured data from uploaded documents such as invoices, contracts, or forms.

The Extract Data from Files step uses AI to read uploaded documents and pull out specific fields you define. For each file, you write a natural language prompt describing what to extract, and the step returns the extracted values as mappable fields for later steps.

Files

The step processes one or more file configurations, each targeting a different document.

| name | type | required | description |
|------|------|----------|-------------|
| name | string | yes | A descriptive name for this document configuration. |
| extractionMethod | string | yes | MANUAL for explicit field definition, AUTOMATIC for AI-assisted extraction. |
| file | string | yes | The document to process. Use a mapping placeholder to reference a file uploaded in a previous step (e.g. {{Upload Step.Document}}). |
| fields | array | yes | Fields to extract from the document. See Fields below. |
| customInstruction | string | no | Optional guidance to improve extraction accuracy (e.g. "Focus on the header section"). |
| testFile | string | no | UUID of a test file for validating extraction. Only available when extractionMethod is AUTOMATIC. |

Extraction methods

  • MANUAL: You define each field explicitly. Use this when you know exactly what to extract and the document has a consistent layout.
  • AUTOMATIC: AI assists with field detection. Supports a testFile to validate extraction accuracy before processing real documents.
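
The constraints described so far (required keys, the two extraction methods, and testFile being limited to AUTOMATIC) can be checked before a configuration is saved. The sketch below is one possible client-side check in Python. The function name validate_file_config and the error messages are hypothetical and not part of the product. It covers only the rules stated on this page.

```python
# Hypothetical client-side check; not part of the product's API.
ALLOWED_METHODS = {"MANUAL", "AUTOMATIC"}
REQUIRED_KEYS = ("name", "extractionMethod", "file", "fields")


def validate_file_config(config: dict) -> list[str]:
    """Return a list of problems found in one file configuration."""
    problems = []

    # Every key marked as required in the Files table must be present.
    for key in REQUIRED_KEYS:
        if config.get(key) is None:
            problems.append(f"missing required key: {key}")

    method = config.get("extractionMethod")
    if method is not None and method not in ALLOWED_METHODS:
        problems.append(f"unknown extractionMethod: {method}")

    # testFile is documented as available only with AUTOMATIC extraction.
    if config.get("testFile") and method != "AUTOMATIC":
        problems.append("testFile is only available when extractionMethod is AUTOMATIC")

    return problems


manual_with_test_file = {
    "name": "Invoice Document",
    "extractionMethod": "MANUAL",
    "file": "{{`Upload Step`.`Document`}}",
    "fields": [
        {
            "id": "123e4567-e89b-12d3-a456-426614174000",
            "name": "Invoice Number",
            "prompt": "Extract the invoice number from the document",
        }
    ],
    "testFile": "723e4567-e89b-12d3-a456-426614174006",
}

# Reports the testFile problem, because the method is MANUAL.
print(validate_file_config(manual_with_test_file))
```

Returning a list of problems instead of raising on the first one lets a caller show every issue at once.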

Fields

Each field tells the step what piece of information to extract.

| name | type | required | description |
|------|------|----------|-------------|
| id | string | yes | Unique identifier for this field (UUID format). |
| name | string | yes | Name for the extracted data point (used for mapping). |
| prompt | string | yes | Natural language instruction describing what to extract. |
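
If you assemble field definitions programmatically, the id can be generated rather than typed by hand. The following is a minimal Python sketch. The helper make_field is hypothetical, and the choice of a random version-4 UUID is an assumption, since this page only says the id must be in UUID format.

```python
import uuid


def make_field(name: str, prompt: str) -> dict:
    """Build one field entry with a freshly generated UUID as its id."""
    return {
        "id": str(uuid.uuid4()),  # random version-4 UUID in canonical string form
        "name": name,
        "prompt": prompt,
    }


field = make_field("Invoice Number", "Extract the invoice number from the document")
print(field["name"], field["id"])
```

Each call produces a different id, so two fields built this way will not collide in practice.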

Examples

Let's look at manual extraction first. The example below pulls three fields from an invoice uploaded in a previous step.

{
  "files": [
    {
      "name": "Invoice Document",
      "extractionMethod": "MANUAL",
      "file": "{{`Upload Step`.`Document`}}",
      "fields": [
        {
          "id": "123e4567-e89b-12d3-a456-426614174000",
          "name": "Invoice Number",
          "prompt": "Extract the invoice number from the document"
        },
        {
          "id": "223e4567-e89b-12d3-a456-426614174001",
          "name": "Total Amount",
          "prompt": "Extract the total amount from the invoice"
        },
        {
          "id": "323e4567-e89b-12d3-a456-426614174002",
          "name": "Invoice Date",
          "prompt": "Extract the invoice date in MM/DD/YYYY format"
        }
      ],
      "customInstruction": "Focus on the header section of the document where invoice details are typically listed"
    }
  ]
}

The next example uses automatic extraction on a resume and sets a test file so you can validate accuracy before running on real documents.

{
  "files": [
    {
      "name": "Resume Document",
      "extractionMethod": "AUTOMATIC",
      "file": "{{`Upload Step`.`Resume`}}",
      "fields": [
        {
          "id": "423e4567-e89b-12d3-a456-426614174003",
          "name": "Candidate Name",
          "prompt": "Extract the candidate full name"
        },
        {
          "id": "523e4567-e89b-12d3-a456-426614174004",
          "name": "Email Address",
          "prompt": "Extract the email address"
        },
        {
          "id": "623e4567-e89b-12d3-a456-426614174005",
          "name": "Years of Experience",
          "prompt": "Extract total years of professional experience"
        }
      ],
      "testFile": "723e4567-e89b-12d3-a456-426614174006",
      "customInstruction": "Use the work history section to calculate total experience"
    }
  ]
}

You can also process multiple files in a single step. The example below extracts from both an invoice and a receipt.

{
  "files": [
    {
      "name": "Invoice Document",
      "extractionMethod": "MANUAL",
      "file": "{{`Upload Step`.`Invoice`}}",
      "fields": [
        {
          "id": "123e4567-e89b-12d3-a456-426614174000",
          "name": "Invoice Number",
          "prompt": "Extract the invoice number from the document"
        },
        {
          "id": "223e4567-e89b-12d3-a456-426614174001",
          "name": "Total Amount",
          "prompt": "Extract the total amount from the invoice"
        }
      ],
      "customInstruction": null
    },
    {
      "name": "Receipt Document",
      "extractionMethod": "AUTOMATIC",
      "file": "{{`Upload Step`.`Receipt`}}",
      "fields": [
        {
          "id": "323e4567-e89b-12d3-a456-426614174002",
          "name": "Vendor Name",
          "prompt": "Extract the vendor or merchant name"
        },
        {
          "id": "423e4567-e89b-12d3-a456-426614174003",
          "name": "Purchase Date",
          "prompt": "Extract the purchase date"
        }
      ],
      "testFile": "523e4567-e89b-12d3-a456-426614174004",
      "customInstruction": null
    }
  ]
}
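
When a step holds several file configurations, it is easy to reuse a field id by accident while copying entries between them. The Python sketch below scans a step for repeated ids. It assumes ids should be unique across the whole step, which this page does not state explicitly, and the function name duplicate_field_ids is hypothetical.

```python
import json
from collections import Counter


def duplicate_field_ids(step: dict) -> list[str]:
    """Return field ids that appear more than once across all files in the step."""
    counts = Counter(
        field["id"]
        for file_config in step.get("files", [])
        for field in file_config.get("fields", [])
    )
    return sorted(field_id for field_id, n in counts.items() if n > 1)


# A shortened version of the multi-file configuration, one field per file.
step = json.loads("""
{
  "files": [
    {
      "name": "Invoice Document",
      "extractionMethod": "MANUAL",
      "file": "{{`Upload Step`.`Invoice`}}",
      "fields": [
        {"id": "123e4567-e89b-12d3-a456-426614174000", "name": "Invoice Number",
         "prompt": "Extract the invoice number from the document"}
      ]
    },
    {
      "name": "Receipt Document",
      "extractionMethod": "AUTOMATIC",
      "file": "{{`Upload Step`.`Receipt`}}",
      "fields": [
        {"id": "323e4567-e89b-12d3-a456-426614174002", "name": "Vendor Name",
         "prompt": "Extract the vendor or merchant name"}
      ]
    }
  ]
}
""")

print(duplicate_field_ids(step))  # prints [] because the two ids differ
```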