AI Models for Insurance Claims Intake: Extracting Facts From Photos, Emails, and Forms

Claims teams are trying to turn messy submissions into usable intake files without making adjusters hunt through every email, photo, form, and estimate by hand. The hard part is not finding the “smartest” model. It is designing a claim submission workflow that extracts facts, shows where each fact came from, asks better follow-up questions, and keeps the final judgment in human hands.

Last reviewed: 2026-04-23. The provider capabilities referenced below change frequently, so verify the linked sources before quoting them in a contract, RFP, or cost plan.

A new loss file usually starts with scattered evidence: a first notice of loss form, claimant emails, smartphone photos, repair estimates, receipts, police reports, adjuster notes, and policy identifiers. The AI system should turn that material into a structured intake draft, not a coverage decision, and every extracted field should carry a source reference that a reviewer can open.

Extract The Facts, Not The Final Decision

The first AI job is fact organization: date of loss, location, policy number, named insured, claimant contact, damaged property, uploaded documents, visible damage, stated sequence of events, and missing fields. That is different from deciding coverage, liability, fraud, reserve amount, or denial language.

The NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers, adopted on December 4, 2023, is a useful boundary marker because it names claim management and fraud detection as insurance lifecycle uses of AI and says regulators may ask for governance and documentation.[1] For intake design, that means the output should be inspectable: what fact was extracted, where it came from, what confidence was assigned, and what human or rules-based step used it.

This matters in a real FNOL automation flow because the first draft often becomes the adjuster’s starting point. A practical schema should include claim_id, evidence_item_id, fact_type, fact_value, source_type, source_location, claimant_statement, model_confidence, needs_reviewer, and missing_information. If the system says “rear bumper damage,” the source should be the photo ID and crop or attachment page, not a free-floating summary sentence.
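The schema above can be sketched as a small record type. This is a minimal illustration, not a production model: the field names come from the list above, while the field types, the example values, and the helper `has_openable_source` are assumptions made for the example.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class ExtractedFact:
    """One intake fact tied to the evidence item that supplied it."""
    claim_id: str
    evidence_item_id: str
    fact_type: str             # e.g. "visible_damage", "date_of_loss"
    fact_value: str
    source_type: str           # e.g. "photo", "fnol_form", "email"
    source_location: str       # e.g. a crop box or "page 2, incident date field"
    claimant_statement: bool   # assumed boolean: True if the claimant said it
    model_confidence: float    # assumed range 0.0 to 1.0
    needs_reviewer: bool
    missing_information: Optional[str] = None


def has_openable_source(fact: ExtractedFact) -> bool:
    """Hypothetical check: a reviewer needs both an item ID and a location."""
    return bool(fact.evidence_item_id.strip() and fact.source_location.strip())


# Illustrative values only; the IDs and crop format are made up.
bumper = ExtractedFact(
    claim_id="CLM-0001",
    evidence_item_id="photo-003",
    fact_type="visible_damage",
    fact_value="rear bumper damage",
    source_type="photo",
    source_location="crop:x=120,y=340,w=480,h=260",
    claimant_statement=False,
    model_confidence=0.91,
    needs_reviewer=False,
)
record = asdict(bumper)  # plain dict, ready to serialize or store
```

Keeping the source fields mandatory in the type means a summary sentence with no evidence pointer cannot be constructed by accident.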

Use provider features for structure, not for authority. OpenAI function calling and Structured Outputs can constrain responses to a JSON schema, and Anthropic documents tool use for structured calls.[2][3][4] Those features reduce malformed output, but they do not make the extracted fact true.
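A short sketch of that distinction: a schema can guarantee shape, and a separate check has to confirm that the cited evidence actually exists in the claim packet. The schema below is a hypothetical, reduced version of the intake schema, and `accept_fact` is an illustrative helper, not part of any provider SDK. The provider call itself is omitted.

```python
import json

# Hypothetical JSON Schema that could be supplied to a provider's
# structured-output or tool-use feature. Field names follow the article.
FACT_SCHEMA = {
    "type": "object",
    "properties": {
        "evidence_item_id": {"type": "string"},
        "fact_type": {"type": "string"},
        "fact_value": {"type": "string"},
        "source_location": {"type": "string"},
        "model_confidence": {"type": "number"},
    },
    "required": [
        "evidence_item_id",
        "fact_type",
        "fact_value",
        "source_location",
        "model_confidence",
    ],
    "additionalProperties": False,
}


def accept_fact(raw_json: str, known_evidence_ids: set) -> dict:
    """Check shape first, then grounding. Raises ValueError on either failure."""
    fact = json.loads(raw_json)
    missing = [key for key in FACT_SCHEMA["required"] if key not in fact]
    if missing:
        raise ValueError(f"schema violation, missing keys: {missing}")
    # A well-formed object can still cite evidence that was never uploaded.
    if fact["evidence_item_id"] not in known_evidence_ids:
        raise ValueError(
            f"well-formed but ungrounded: unknown item {fact['evidence_item_id']!r}"
        )
    return fact
```

The second check is the one that schema enforcement cannot do for you: it depends on the claim packet, not on the shape of the response.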

Give every unit of work a stable ID before any asynchronous processing begins. In backlog runs, file order is a bad join key: attachments are retried, batches complete out of order, and one failed record can shift every later result onto the wrong claim. Use request-level IDs (custom_id in the OpenAI and Anthropic batch APIs, recordId in Amazon Bedrock batch inference) so every result can be joined back to the right claim and evidence item.
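A minimal reconciliation sketch, assuming each batch result is a dict that carries a `custom_id` key and that the submitter kept a map from that ID to the claim and evidence item. The function name and the ID format are hypothetical.

```python
def reconcile(submitted: dict, results: list) -> dict:
    """Join batch results back to submitted work by ID, never by position.

    submitted maps custom_id -> (claim_id, evidence_item_id).
    results is a list of dicts, each expected to carry "custom_id".
    """
    matched = {}
    unknown = []      # IDs we never submitted
    duplicates = []   # IDs returned more than once
    for result in results:
        custom_id = result.get("custom_id")
        if custom_id not in submitted:
            unknown.append(custom_id)
        elif custom_id in matched:
            duplicates.append(custom_id)
        else:
            matched[custom_id] = result
    # Anything submitted but not returned must be retried or escalated.
    missing = [cid for cid in submitted if cid not in matched]
    return {
        "matched": matched,
        "missing": missing,
        "unknown": unknown,
        "duplicates": duplicates,
    }


# Results arrive out of order, one is absent, one is unrecognized.
submitted = {
    "CLM-1:photo-1": ("CLM-1", "photo-1"),
    "CLM-1:email-1": ("CLM-1", "email-1"),
    "CLM-2:form-1": ("CLM-2", "form-1"),
}
results = [
    {"custom_id": "CLM-2:form-1", "output": "..."},
    {"custom_id": "CLM-1:photo-1", "output": "..."},
    {"custom_id": "CLM-9:scan-4", "output": "..."},
]
report = reconcile(submitted, results)
```

Only the `matched` entries should move toward the claim file; the other three lists are work items for a retry or an investigation.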

Model routing should come before model preference. Use AI Models to compare modalities, context window sizes, public benchmark scores, and pricing signals, then test finalists against your own redacted claim packets. Public scores help shortlist tools; they do not measure whether a system correctly separates a claimant statement from a repair estimate.

Claim workflow moment | Recommended route | Engineering rule
Claimant is waiting in a portal or chat for the next question. | Synchronous text-and-image call. | Do not batch. Return a short, source-aware next question and keep the full extraction draft for reviewer view.
Nightly backlog has thousands of claim emails and attachments that adjusters will review tomorrow. | Batch extraction. | Use batch only when the operating window allows delayed completion. Reconcile every output by ID before it touches the claim file.[5][6]
One shared prompt prefix contains policy wording, extraction rules, and examples that repeat across many files. | Prompt or context caching. | Cache stable instructions, but keep claim-specific evidence outside the reusable prefix so source review remains clean.
A cloud workflow already stores evidence in governed buckets and can wait for large asynchronous jobs. | Cloud batch inference. | Keep storage, region, retention, and access-control rules in the same architecture review as model quality.[7][8][9]
The intake assistant is producing customer-facing copy. | Synchronous call with approved templates. | Let the system identify the gap, but let reviewed wording control deadlines, promises, escalation language, and disclaimers.
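The routing rules in the table can be sketched as a small decision function. This is one possible encoding, not a definitive policy: the flag names, the `Route` enum, and the order of the checks are assumptions. Caching is left out because it can be combined with any of these routes rather than chosen instead of them.

```python
from enum import Enum


class Route(Enum):
    SYNC = "synchronous text-and-image call"
    SYNC_TEMPLATE = "synchronous call with approved templates"
    BATCH = "batch extraction"
    CLOUD_BATCH = "cloud batch inference"


def choose_route(
    *,
    claimant_waiting: bool,
    customer_facing: bool,
    delay_acceptable: bool,
    governed_cloud_storage: bool,
) -> Route:
    """Pick a processing route from workflow facts, not from model preference."""
    # Customer-facing copy always goes through reviewed templates.
    if customer_facing:
        return Route.SYNC_TEMPLATE
    # Never batch while someone is waiting or the window cannot absorb delay.
    if claimant_waiting or not delay_acceptable:
        return Route.SYNC
    # Offline work: prefer the batch service next to the governed evidence store.
    if governed_cloud_storage:
        return Route.CLOUD_BATCH
    return Route.BATCH


nightly = choose_route(
    claimant_waiting=False,
    customer_facing=False,
    delay_acceptable=True,
    governed_cloud_storage=False,
)
```

Keyword-only arguments make each call site read like a row of the table, which helps when the routing rules are reviewed by non-engineers.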

Use Photos And Documents Carefully

Images and attachments can contain high-value intake facts, but the system should describe observable evidence rather than infer the claim outcome. “Photo shows cracked windshield” is an intake fact. “Road debris caused the crack” is a causal conclusion unless another source supports it.

Vision-capable workflows are now normal across major providers, but each route has different operational constraints. The architecture question is whether the evidence needs a live answer, a batch answer, or a reviewer-only draft. For adjuster intake, that decision usually matters more than the brand name on the endpoint.

  • Cite the source for each fact. Example: “date_of_loss = 2026-04-16” should point to the FNOL form field, email sentence, or adjuster note that supplied it.
  • Separate claimant statements from attachment evidence. Example: “claimant says water entered through roof” and “contractor estimate lists ceiling drywall replacement” should be two facts with two sources.
  • Flag unclear media instead of guessing. Example: if a VIN, license plate, serial number, receipt total, or repair line item is cut off, the intake output should request a retake or a cleaner document.
  • Keep page-level and image-level references. Example: “police report attached” is weaker than “police report, page 2, incident date field.”
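The "flag instead of guessing" rule can be sketched with a VIN check. The example assumes the standard 17-character VIN format, which excludes the letters I, O, and Q; the function name and the shape of the returned dict are hypothetical.

```python
import re

# Standard 17-character VIN alphabet: digits and capitals except I, O, Q.
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")


def vin_fact_or_retake(raw_text: str, evidence_item_id: str) -> dict:
    """Return a VIN fact only when the reading is complete; otherwise ask again."""
    candidate = raw_text.strip().upper()
    if VIN_PATTERN.fullmatch(candidate):
        return {
            "fact_type": "vin",
            "fact_value": candidate,
            "evidence_item_id": evidence_item_id,
            "needs_reviewer": False,
            "missing_information": None,
        }
    # Do not pad, truncate, or guess characters from a cut-off image.
    return {
        "fact_type": "vin",
        "fact_value": None,
        "evidence_item_id": evidence_item_id,
        "needs_reviewer": True,
        "missing_information": "VIN unreadable or cut off; request a retake",
    }


complete = vin_fact_or_retake("1hgcm82633a004352", "photo-007")
cut_off = vin_fact_or_retake("1HGCM82633A0", "photo-008")
```

A format check only shows that the reading is complete, not that it is the right vehicle, so the evidence item ID stays attached either way.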

For PDFs, emails, and scans, the safest pattern is a two-pass intake. First, classify document type and split evidence items: FNOL form, estimate, invoice, receipt, police report, photo set, adjuster note. Second, extract facts from each item into the same schema. That keeps a repair estimate from being treated like a claimant statement and makes reviewer overrides easier to audit.
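The two-pass pattern can be sketched as follows. Both passes are stand-ins: a real system would call a model or a document classifier, while this example uses hypothetical keyword rules for pass one and only attaches labels in pass two, so the control flow is visible without any provider call.

```python
# Hypothetical keyword rules standing in for a real classifier.
RULES = [
    ("police_report", ("incident report", "reporting officer")),
    ("estimate", ("estimate", "parts and labor")),
    ("claimant_email", ("from:", "subject:")),
]
# Assumed set of document types whose content is the claimant's own account.
CLAIMANT_AUTHORED = {"claimant_email", "fnol_form"}


def classify(text: str) -> str:
    """Pass 1: assign a document type, or 'unclassified' when no rule matches."""
    lowered = text.lower()
    for doc_type, keywords in RULES:
        if any(keyword in lowered for keyword in keywords):
            return doc_type
    return "unclassified"


def two_pass_intake(claim_id: str, items: dict) -> list:
    """items maps evidence_item_id -> raw text. Returns labeled draft records."""
    typed = {item_id: classify(text) for item_id, text in items.items()}
    drafts = []
    for item_id, text in items.items():
        doc_type = typed[item_id]
        # Pass 2 placeholder: a real extractor would pull facts from `text`.
        drafts.append({
            "claim_id": claim_id,
            "evidence_item_id": item_id,
            "source_type": doc_type,
            "claimant_statement": doc_type in CLAIMANT_AUTHORED,
            "needs_reviewer": doc_type == "unclassified",
            "raw_text": text,
        })
    return drafts


drafts = two_pass_intake("CLM-0001", {
    "email-1": "From: pat@example.com\nSubject: Leak\nMy neighbor thinks the pipe froze.",
    "est-1": "Repair estimate: ceiling drywall replacement, parts and labor",
    "misc-1": "IMG_2041",
})
```

Because the source type is fixed before extraction runs, the estimate can never be stored as a claimant statement, and anything the classifier cannot place goes to a reviewer instead of being guessed.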

Common failure modes are surprisingly plain. A roof claim may turn “water stain visible in bedroom ceiling photo” into “storm damage confirmed.” A vehicle estimate may pull the shop address as the loss location. A claimant email that says “my neighbor thinks the pipe froze” may be stored as an observed cause instead of a third-party statement. These errors are not solved by a larger context window; they are solved by better evidence labels, stricter source typing, and reviewer-visible citations.

Do not hide provider storage rules from the architecture review. Some batch services retain request and response data for operational windows, and claim packets can contain personal, medical, financial, and location data. Route production submissions only through providers, regions, and retention terms your legal and security teams have approved.[6]

Create Better Follow-Up Questions

A strong intake workflow does more than summarize what arrived. It should turn missing fields into precise follow-up questions ranked by how much each gap blocks claim progress. “Please upload any missing documents” is not useful. “Please upload a photo of the vehicle’s rear-left quarter panel and the repair estimate page that shows parts and labor” is useful.

Use a three-level missing-information queue. Level 1 blocks assignment, such as missing policy number, date of loss, claimant contact, or loss location. Level 2 blocks estimate review, such as missing repair estimate, proof of ownership, police report, or extra photos. Level 3 improves quality but should not slow the file, such as optional narrative detail that an adjuster can collect later.
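A minimal sketch of the three-level queue, using the example fields from the paragraph above. The enum names, the snake_case field keys, and the alphabetical tie-break within a level are assumptions for illustration.

```python
from enum import IntEnum


class BlockingLevel(IntEnum):
    BLOCKS_ASSIGNMENT = 1
    BLOCKS_ESTIMATE_REVIEW = 2
    QUALITY_ONLY = 3


# Field-to-level map taken from the examples in the text.
LEVEL_BY_FIELD = {
    "policy_number": BlockingLevel.BLOCKS_ASSIGNMENT,
    "date_of_loss": BlockingLevel.BLOCKS_ASSIGNMENT,
    "claimant_contact": BlockingLevel.BLOCKS_ASSIGNMENT,
    "loss_location": BlockingLevel.BLOCKS_ASSIGNMENT,
    "repair_estimate": BlockingLevel.BLOCKS_ESTIMATE_REVIEW,
    "proof_of_ownership": BlockingLevel.BLOCKS_ESTIMATE_REVIEW,
    "police_report": BlockingLevel.BLOCKS_ESTIMATE_REVIEW,
    "extra_photos": BlockingLevel.BLOCKS_ESTIMATE_REVIEW,
    "narrative_detail": BlockingLevel.QUALITY_ONLY,
}


def missing_queue(present_fields: set) -> list:
    """Return (level, field) pairs for every gap, most blocking first."""
    gaps = [
        (level, name)
        for name, level in LEVEL_BY_FIELD.items()
        if name not in present_fields
    ]
    # Tuples sort by level first, then by field name within a level.
    return sorted(gaps)


queue = missing_queue(
    {"policy_number", "claimant_contact", "loss_location", "repair_estimate"}
)
```

In this example the missing date of loss sorts to the front of the queue, ahead of the police report and the optional narrative detail.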

The system should produce follow-up questions as structured data, not prose alone. A good object includes question_text, recipient, reason, blocking_level, source_gap, and evidence_needed. If a roof damage claim includes ceiling photos but no date when the leak was first noticed, the generated question should ask for that date and explain that it is needed to complete the loss timeline.
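The question object described above might look like this for the roof-leak case. The field names come from the text; the types, the example wording, and the choice of blocking level 1 are assumptions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FollowUpQuestion:
    question_text: str
    recipient: str        # e.g. "claimant", "adjuster", "repair_shop"
    reason: str           # shown to the reviewer, and to the recipient if approved
    blocking_level: int   # 1, 2, or 3; assumed to follow the three-level queue
    source_gap: str       # what the packet has and what it lacks
    evidence_needed: str


leak_date = FollowUpQuestion(
    question_text="On what date did you first notice the leak?",
    recipient="claimant",
    reason="The date is needed to complete the loss timeline.",
    blocking_level=1,  # assumed: treated as part of establishing date of loss
    source_gap="Ceiling photos are present; no first-noticed date was given.",
    evidence_needed="Date the leak was first noticed",
)

# Structured payload for a task queue; prose can be rendered from it later.
payload = json.dumps(asdict(leak_date), indent=2)
```

Storing the reason and the source gap next to the wording lets a reviewer see why the question exists before deciding whether to send it.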

Batch is useful for follow-up generation when the answer is not needed immediately. For example, a nightly job can read all newly uploaded packets, generate missing-information tasks, and send only high-confidence questions to the adjuster queue. Synchronous calls are better when the claimant is still in the intake session and the next question can prevent another email cycle.

In review logs, the most useful override patterns are not just “right” or “wrong.” Track whether the adjuster changed the recipient, downgraded the blocking level, merged duplicate questions, rejected a question as insensitive, or added a missing evidence type. Those actions tell you whether the extraction failed, the workflow ranking failed, or the customer-facing wording failed.
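One way to turn those override actions into a diagnosis is a simple tally. The mapping from each action to a failure category below is an assumption made for illustration; a team would set its own mapping after reading real review logs.

```python
from collections import Counter

# Assumed mapping from reviewer action to the stage that most likely failed.
FAILURE_BY_OVERRIDE = {
    "changed_recipient": "workflow_ranking",
    "downgraded_blocking_level": "workflow_ranking",
    "merged_duplicate_questions": "workflow_ranking",
    "rejected_as_insensitive": "customer_wording",
    "added_missing_evidence_type": "extraction",
}


def failure_summary(override_log: list) -> Counter:
    """Count overrides per failure category; unknown actions stay visible."""
    return Counter(
        FAILURE_BY_OVERRIDE.get(action, "unmapped") for action in override_log
    )


summary = failure_summary([
    "downgraded_blocking_level",
    "rejected_as_insensitive",
    "changed_recipient",
    "added_missing_evidence_type",
    "reopened_file",  # not in the mapping, so it is counted as "unmapped"
])
```

Keeping an explicit "unmapped" bucket prevents new reviewer behaviors from silently disappearing from the report.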

Protect The Customer Experience

A claim usually arrives after a loss, so intake copy has to be useful without sounding evasive. The system can draft plain-language status text, but a human-reviewed template should control promises, deadlines, denial language, and escalation wording.

Reviewer control is part of the customer experience. The adjuster should see the extracted fact, the source, and the reason a follow-up question was proposed. If the reviewer changes a fact or suppresses a question, store that override as evaluation data for the next comparison.

Public benchmarks are weak proxies for claims extraction because they do not test source match rate, claimant-versus-document separation, or whether a missing-information request would waste a customer’s time. Build a domain eval set from redacted claim packets instead. Include easy files, messy files, duplicates, low-quality photos, conflicting dates, partial estimates, and emails where the claimant repeats something another person told them.

A compact eval framework can start with 50 to 100 redacted files across auto, property, and injury-adjacent scenarios. Score exact field accuracy, source citation accuracy, false missing-information requests, reviewer override rate, JSON/schema failure rate, latency by route, batch expiration or error rate, and total cost per completed claim packet. Add a separate “harmful question” check for wording that sounds accusatory, asks for already supplied evidence, or implies a coverage position too early.
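Two of those scores, exact field accuracy and source citation accuracy, can be computed with a few lines. This sketch assumes each packet is represented as a dict from field name to a (value, source) pair and that both metrics are measured against the gold fields; the function name and the sample data are hypothetical.

```python
def score_packet(predicted: dict, gold: dict) -> dict:
    """Compare predicted (value, source) pairs against reviewer-approved gold."""
    if not gold:
        raise ValueError("gold packet has no fields to score")
    value_hits = 0
    source_hits = 0
    for field_name, (gold_value, gold_source) in gold.items():
        # A field the model never produced counts as a miss on both metrics.
        pred_value, pred_source = predicted.get(field_name, (None, None))
        value_hits += pred_value == gold_value
        source_hits += pred_source == gold_source
    total = len(gold)
    return {
        "field_accuracy": value_hits / total,
        "source_citation_accuracy": source_hits / total,
    }


gold = {
    "date_of_loss": ("2026-04-16", "fnol:field_3"),
    "loss_location": ("12 Elm St", "email:line_4"),
    "policy_number": ("P-100", "fnol:field_1"),
}
predicted = {
    "date_of_loss": ("2026-04-16", "fnol:field_3"),
    # Shop address pulled from the estimate instead of the loss location.
    "loss_location": ("Main St Auto Body", "estimate:page_1"),
    # Right value, wrong citation.
    "policy_number": ("P-100", "email:line_2"),
}
scores = score_packet(predicted, gold)
```

Scoring the two metrics separately matters because a correct value with the wrong citation still sends the reviewer to the wrong page.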

A practical launch gate is simple: no extracted fact enters the claim system without a source reference, no low-confidence fact bypasses human review, no generated question is sent to a claimant without policy-approved wording, and no asynchronous output is accepted unless every result is reconciled by ID.
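The four gate conditions can be expressed as one check that returns a list of violations. This is a sketch under stated assumptions: the 0.8 confidence floor, the dict keys (`reviewed`, `approved_template_id`), and the function name are all hypothetical.

```python
CONFIDENCE_FLOOR = 0.8  # assumed threshold; set from your own eval results


def launch_gate_violations(
    facts: list, questions: list, submitted_ids: list, returned_ids: list
) -> list:
    """Return human-readable violations; an empty list means the gate passes."""
    violations = []
    for fact in facts:
        if not fact.get("source_location"):
            violations.append(f"fact {fact['fact_type']}: no source reference")
        if fact["model_confidence"] < CONFIDENCE_FLOOR and not fact.get("reviewed"):
            violations.append(f"fact {fact['fact_type']}: low confidence, unreviewed")
    for question in questions:
        if question["recipient"] == "claimant" and not question.get("approved_template_id"):
            violations.append("claimant question without approved wording")
    # Symmetric difference catches both missing and unexpected results.
    unreconciled = set(submitted_ids) ^ set(returned_ids)
    if unreconciled:
        violations.append(f"unreconciled IDs: {sorted(unreconciled)}")
    return violations


problems = launch_gate_violations(
    facts=[
        {"fact_type": "visible_damage", "source_location": "", "model_confidence": 0.95},
        {"fact_type": "date_of_loss", "source_location": "fnol:field_3", "model_confidence": 0.4},
    ],
    questions=[{"recipient": "claimant", "question_text": "When did it start?"}],
    submitted_ids=["a", "b"],
    returned_ids=["a"],
)
```

Returning every violation, rather than stopping at the first, gives the reviewer one complete list to clear before the packet moves on.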

Decision Framework

Use live calls when the answer changes the claimant’s next screen or the adjuster’s current review. Use batch when the work is offline and can be reconciled before tomorrow’s queue. Use caching for stable instructions, not for claim-specific facts. Above all, treat document extraction as evidence review: every field needs a source, every uncertain field needs a reviewer path, and every follow-up question needs a clear reason.

FAQ

What is the best AI model for insurance claims intake?
The best choice is the smallest reliable route that supports the needed documents, photos, schema control, source citations, and review workflow. Test it on redacted claim packets before comparing price.

When should FNOL automation use batch instead of live calls?
Use batch when no claimant or adjuster is waiting and a delayed completion window is acceptable. Use live calls when the output changes the next question, next screen, or current review step.

Can AI decide coverage or liability?
Do not make that the intake job. Intake systems should extract facts, cite sources, identify conflicts, and route gaps. Coverage, liability, fraud handling, and denial language need approved rules, trained staff, and legal or compliance review.

Sources

  1. NAIC Model Bulletin on AI use by insurers: https://content.naic.org/index.php/article/naic-members-approve-model-bulletin-use-ai-insurers
  2. OpenAI function calling guide: https://platform.openai.com/docs/guides/function-calling
  3. OpenAI Structured Outputs guide: https://platform.openai.com/docs/guides/structured-outputs
  4. Anthropic tool use documentation: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
  5. OpenAI Batch API documentation: https://platform.openai.com/docs/guides/batch
  6. Anthropic Message Batches documentation: https://docs.anthropic.com/en/docs/build-with-claude/batch-processing
  7. Vertex AI Gemini batch inference documentation: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini
  8. Amazon Bedrock batch inference documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference.html
  9. Microsoft Azure OpenAI batch documentation: https://learn.microsoft.com/azure/ai-services/openai/how-to/batch