15 production AI agents every small team should build this quarter. Stack, cost model, payback window, and the two canonical agents we would ship first.

SUMMARY

Fifteen AI agents we see small teams ship and stick with in the first 90 days. Every one is in production at a ScubaDev client or in the Ideas library. Every one earns its retainer line item.

We grouped them: support agents that deflect ticket volume, ops and research agents that compress engineering time, and revenue and content agents that create new pipeline. Each comes with a stack opinion, a payback window, and the gotchas we have hit.

If you are picking two to ship first, jump to the Ship first section. The pair we recommend feeds the next three with data.

Below the surface


Fifteen production AI agents. Not fifteen demos. Not fifteen ideas on a slide. Fifteen patterns we have shipped, watched run, and replaced when the model layer shifted. They are grouped by where the load lives in a small team. Pick the two that touch your worst bottleneck and start there.

Figure: The fifteen agents grouped by where the work lives in a small team: support agents, ops and research agents, and revenue and content agents.

By the numbers

The fifteen agent map at a glance

  • Agents in this guide

    15

    Production patterns we ship and operate at ScubaDev clients.

  • Agents to ship first

    2

    The pair that compounds: in-app AI help plus inbox triage.

  • Time to first version

    3 weeks

    The fastest agents in this list reach a reviewed first version inside three weeks.

  • Month-one ROI

    30 days

    The window where the right agent should already pay back its build cost.

02 / How to read

How to read this list

Each agent below carries the same three labels. Use them to spot the right next move for your team.

  • 01

    Effort

    Weeks of engineering work for a typical small team. Momentum tier means roughly 1 to 4 weeks. Depth means 3 to 6 weeks. Deep end means 4 to 10 weeks. If an agent needs more than that, split it into two smaller wins.

  • 02

    Payback window

    How fast the agent pays for itself in recovered team hours or faster revenue. We flag anything over six months. Three agents on this list pay back inside thirty days and most pay back inside sixty; the slower ones have a structural reason.

  • 03

    Stack bias

    Our opinion on what to build with. Most of these run on n8n plus an LLM API plus a small database. The exceptions are agents that need real-time voice, custom UI, or warehouse-scale data access.

03 / Common pattern

What they all have in common

01

Bounded data

Every agent on this list reads from a defined corpus, not the open web. Product docs, your past sent emails, your team calendar, your warehouse schema. Bounded data is what keeps token cost predictable and hallucinations rare.

02

Human in the loop

Every agent escalates. None of them auto-executes the high-stakes call. Human judgment is the moat; the agent runs the routine ninety percent. The agents that try to remove the human entirely are the ones that ship and then silently break.
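A minimal sketch of that escalation rule, assuming each proposed action carries a model-reported confidence score and a high-stakes flag set by your own business rules. The `ProposedAction` type, the `route` function, and the 0.85 threshold are hypothetical illustrations, not a prescribed value.

```python
from dataclasses import dataclass

# Hypothetical threshold: below this confidence, a human makes the call.
CONFIDENCE_FLOOR = 0.85


@dataclass
class ProposedAction:
    description: str
    confidence: float  # model-reported, 0.0 to 1.0 (assumed available)
    high_stakes: bool  # set by business rules, e.g. money movement or deletions


def route(action: ProposedAction) -> str:
    """Return 'auto' for routine work the agent may execute, else 'human'."""
    # High-stakes calls always escalate, no matter how confident the model is.
    if action.high_stakes:
        return "human"
    # Routine work runs automatically only above the confidence floor.
    return "auto" if action.confidence >= CONFIDENCE_FLOOR else "human"
```

Checking the high-stakes flag first means no confidence score can talk the agent into an irreversible action.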

03

Token discipline

Two-stage classification: a cheap model for triage, a capable model for the judgment call. Daily token cost alerts. A rollback path for prompt regressions. The agencies and teams that ship these patterns budget for tokens before they budget for features.
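Two of those habits fit in a few lines. The sketch below assumes the cheap and capable models are plain callables you wire to your own API client; the category names, the per-million-token price, and the daily limit are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

ModelFn = Callable[[str], str]  # stand-in for a call to your LLM API

# Hypothetical categories that justify the capable (more expensive) model.
NEEDS_JUDGMENT = {"billing", "complaint", "legal"}


def two_stage(text: str, cheap: ModelFn, capable: ModelFn) -> tuple[str, Optional[str]]:
    """Stage 1 labels every item cheaply; stage 2 runs only when judgment is needed."""
    category = cheap(text)
    if category in NEEDS_JUDGMENT:
        return category, capable(text)
    return category, None


@dataclass
class TokenBudget:
    daily_limit_usd: float
    spent_usd: float = 0.0
    alerts: list[str] = field(default_factory=list)

    def charge(self, tokens: int, usd_per_million: float) -> None:
        """Record spend and raise one alert the first time the daily limit is crossed."""
        self.spent_usd += tokens / 1_000_000 * usd_per_million
        if self.spent_usd > self.daily_limit_usd and not self.alerts:
            self.alerts.append(
                f"daily token spend ${self.spent_usd:.2f} exceeds ${self.daily_limit_usd:.2f}"
            )
```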

04 / Group A

Support agents

The first seven agents reduce the recurring load that keeps small teams buried: support tickets, phones, mentions, inboxes, refunds, and post-purchase questions.

  1. 01 / In-app AI help that reads your docs

    Users ask questions in the app. The AI answers from your actual product docs. Unanswered questions flow to support.

    Effort: Momentum, 2 to 3 weeks
    Payback: Under 30 days
    Stack: Retrieval over docs + LLM

    It goes first because it adds no new auth surface, keeps the token budget bounded, works from docs that already exist in a format the model can ingest, and shows its ROI in support tickets the following week. For any SaaS with more than 100 users and real documentation, this is agent number one in any serious AI agent development program. We run this pattern on seven client accounts, where support tickets typically drop 40 to 60 percent in the first 90 days.

    Build notes. Use retrieval-augmented generation over your docs, not fine-tuning. Use a cheap model for question classification and a capable model for the answer. Log every unanswered question. That log is the fuel for agent number five below.
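    A minimal sketch of the retrieval step and the unanswered-question log. It swaps in plain word-overlap scoring as a stand-in for the embedding search a production build would use, and it stops at building the prompt rather than calling a model. `DocsHelper` and the 0.3 threshold are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class DocsHelper:
    chunks: dict[str, str]  # doc chunk id -> chunk text
    min_overlap: float = 0.3  # share of question words that must appear in a chunk
    unanswered: list[str] = field(default_factory=list)

    def retrieve(self, question: str, k: int = 2) -> list[str]:
        """Return up to k chunk ids, best match first."""
        q = words(question)
        scored = []
        for chunk_id, text in self.chunks.items():
            overlap = len(q & words(text)) / max(len(q), 1)
            if overlap >= self.min_overlap:
                scored.append((overlap, chunk_id))
        scored.sort(reverse=True)
        return [chunk_id for _, chunk_id in scored[:k]]

    def build_prompt(self, question: str) -> Optional[str]:
        """Ground the answer in the docs, or log the gap and hand off to support."""
        hits = self.retrieve(question)
        if not hits:
            self.unanswered.append(question)  # the log agent 05 clusters later
            return None
        context = "\n\n".join(self.chunks[h] for h in hits)
        return (
            "Answer only from the docs below. If they do not cover the question, say so.\n\n"
            f"{context}\n\nQuestion: {question}"
        )
```

    Returning `None` instead of a guess is the design choice that matters: the question goes to a human and lands in the log at the same time.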

    Read the full idea: In-app AI help that reads your docs →

  2. 02 / Voice-first booking agent

    Answers the phone, reads team calendars, books the right service with the right person, and sends a confirmation text.

    Effort: Depth, 4 to 6 weeks
    Payback: 30 to 60 days
    Stack: Retell or Vapi + Twilio + Claude

    This is the Mermaid Phone pattern. The stack is Retell AI or Vapi for voice, Twilio for telephony, Claude for reasoning, and whatever calendar the client already uses. For any service business that takes calls (salons, med spas, home services, small healthcare practices), this is the highest-ROI agent on the list because it covers most of the work of a part-time receptionist.

    Build notes. Latency is the make-or-break metric. When the gap between the caller finishing a sentence and the agent responding stays under 1.5 seconds, callers stop noticing the agent is AI. Budget engineering time for latency tuning, not feature breadth.
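    One way to hold the team to that number is a per-turn latency budget checked at the 95th percentile. A minimal sketch, assuming you log the seconds each pipeline stage takes per conversational turn; the stage names (`stt`, `llm`, `tts`) and the sample numbers are hypothetical.

```python
from __future__ import annotations

TARGET_SECONDS = 1.5  # caller-noticeable threshold from the build notes


def turn_latency(stages: dict[str, float]) -> float:
    """Total response gap for one turn: the sum of the per-stage timings."""
    return sum(stages.values())


def p95(samples: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def within_budget(turns: list[dict[str, float]]) -> bool:
    return p95([turn_latency(t) for t in turns]) <= TARGET_SECONDS
```

    Tracking the tail rather than the average matters here: a call that feels fine nine times and stalls on the tenth still sounds like a machine.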

    Read the full idea: Voice-first booking agent →

  3. 03 / Brand mention agent with sentiment

    Watches the web and social for your brand. Summarizes daily, flags negative mentions for response.

    Effort: Momentum, 2 to 3 weeks
    Payback: 60 to 90 days
    Stack: n8n + custom classifier

    The trick is not the monitoring. The trick is a sentiment classifier that flags only what matters. Generic sentiment tools produce too many false positives. A brand-specific classifier, built from your past incidents, is where the ROI comes from.

    Build notes. Start by labeling 100 past brand mentions as ignore, respond, or escalate. Use the labeled set as the examples in your classifier prompt. Log the classifier's misses and relabel weekly for the first month.
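    A minimal sketch of turning that labeled set into a few-shot classifier prompt, with a label-balance check so the examples are not all `ignore`. The function names are hypothetical, and the model call itself is left out.

```python
from __future__ import annotations

from collections import Counter

LABELS = ("ignore", "respond", "escalate")


def build_classifier_prompt(labeled: list[tuple[str, str]], mention: str) -> str:
    """Few-shot prompt: past mentions with your labels, then the new mention."""
    for _, label in labeled:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
    examples = "\n\n".join(f"Mention: {text}\nLabel: {label}" for text, label in labeled)
    return (
        "Classify the brand mention as ignore, respond, or escalate, "
        "following the labeled examples from our own history.\n\n"
        f"{examples}\n\nMention: {mention}\nLabel:"
    )


def label_balance(labeled: list[tuple[str, str]]) -> Counter:
    """Count examples per label; a set that is mostly 'ignore' teaches little."""
    return Counter(label for _, label in labeled)
```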

    Read the full idea: Brand mention agent with sentiment →

  4. 04 / Inbox triage agent

    Reads every new message, drafts the reply in your voice, files the rest, flags only what needs you.

    Effort: Momentum, 2 to 3 weeks
    Payback: Under 30 days
    Stack: Two-stage LLM, no auto-send

    This is the highest-ROI agent for operators whose inbox is the bottleneck. Knowledge workers spend roughly 28 percent of their workweek on email. A triage agent that deflects 40 percent of that, backed by a good filing system, buys back about 11 percent of the week: roughly 4.5 hours, or half a day, per person on a 40-hour week.

    Build notes. Voice capture is the hard part. Feed the agent 200 of your past sent emails as its style guide. Use a two-stage classifier: a cheap model for the category, a capable model for the draft. Never auto-send. Every reply goes out only after human approval.
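    A minimal sketch of the approval gate, assuming drafts sit in a queue until a person approves or edits them. The `Draft` and `OutboxGate` names are hypothetical; the point is that `send` has no path that skips approval.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    SENT = "sent"


@dataclass
class Draft:
    thread_id: str
    body: str
    status: Status = Status.PENDING


class OutboxGate:
    """Drafts leave only after human approval. There is deliberately no auto-send."""

    def __init__(self) -> None:
        self.sent: list[Draft] = []

    def approve(self, draft: Draft, edited_body: Optional[str] = None) -> None:
        if edited_body is not None:
            draft.body = edited_body  # keep the human's edits
        draft.status = Status.APPROVED

    def send(self, draft: Draft) -> None:
        if draft.status is not Status.APPROVED:
            raise PermissionError(f"draft for {draft.thread_id} is not approved")
        draft.status = Status.SENT  # also blocks a second send of the same draft
        self.sent.append(draft)  # hand off to your real mail client here
```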

    Read the full idea: Inbox triage agent →

  5. 05 / Knowledge base gap analyzer

    Clusters unanswered tickets into topics the knowledge base does not cover. Drafts the missing articles for approval.

    Effort: Momentum, 2 to 4 weeks
    Payback: 60 to 90 days
    Stack: Embedding clusters + LLM

    This is the back half of agent 01. In-app AI help logs unanswered questions. This agent turns the log into articles. Together they compound: more articles, fewer unanswered questions, less support load, faster feedback loop.

    Build notes. Run it weekly, not in real time. Cluster the unanswered question log by topic embedding. Output is a ranked list of missing articles. A human reviews, the agent drafts, a human publishes.
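    A minimal sketch of the weekly clustering pass, assuming each logged question already has an embedding vector from whatever embedding model you use. It uses simple greedy cosine-similarity grouping as a stand-in for k-means or HDBSCAN; the 0.8 threshold and the toy three-dimensional vectors in any example are hypothetical.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_questions(
    items: list[tuple[str, list[float]]], threshold: float = 0.8
) -> list[list[str]]:
    """Greedy clustering: each question joins the first cluster whose seed is close enough."""
    clusters: list[tuple[list[float], list[str]]] = []
    for question, vec in items:
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(question)
                break
        else:
            clusters.append((vec, [question]))
    # Largest clusters first: the most-asked gaps become the top missing articles.
    return sorted((members for _, members in clusters), key=len, reverse=True)
```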

    Read the full idea: Knowledge base gap analyzer →

  6. 06 / Refund handling agent

    Hears the request, checks policy, issues the refund or escalates, writes the note back to the customer.

    Effort: Momentum, 2 to 4 weeks
    Payback: 30 to 60 days
    Stack: n8n + structured policy rules

    For ecommerce or subscription SaaS, refunds are the highest-friction support interaction. Roughly half of them fall within policy and should be automatic. The other half require human judgment. The refund agent separates the two.

    Build notes. The policy has to exist in structured form, not as a PDF. Set the auto-refund threshold conservatively at launch and raise it as confidence grows. Log every decision with the exact rule it fired on.
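    A minimal sketch of a structured policy, assuming refund requests arrive with the order amount, its age, and the customer's refund history. The rule ids, the 30-day window, and the $50 auto-refund ceiling are hypothetical examples of a conservative launch setting. Rules are checked in order, and the first match decides and is logged.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RefundRequest:
    order_id: str
    amount_usd: float
    days_since_purchase: int
    prior_refunds: int


AUTO_REFUND_MAX_USD = 50.0  # start conservative; raise as confidence grows

# Ordered rules: (rule id, condition, outcome). The first matching rule wins.
RULES: list[tuple[str, Callable[[RefundRequest], bool], str]] = [
    ("R1_outside_30_day_window", lambda r: r.days_since_purchase > 30, "escalate"),
    ("R2_repeat_refunder", lambda r: r.prior_refunds >= 2, "escalate"),
    ("R3_over_auto_limit", lambda r: r.amount_usd > AUTO_REFUND_MAX_USD, "escalate"),
    ("R4_within_policy", lambda r: True, "refund"),
]


def decide(req: RefundRequest, log: list[dict]) -> str:
    for rule_id, condition, outcome in RULES:
        if condition(req):
            log.append({"order_id": req.order_id, "rule": rule_id, "outcome": outcome})
            return outcome
    raise RuntimeError("policy has no catch-all rule")  # unreachable with R4 in place
```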

    Read the full idea: Refund handling agent →

  7. 07 / Post-purchase SMS concierge

    After the order ships, the buyer can text questions about sizing, care, or returns. AI answers in brand voice and escalates real issues.

    Effort: Momentum, 2 to 4 weeks
    Payback: 30 to 60 days
    Stack: Twilio SMS + Claude + Shopify

    This ecommerce-specific agent pays back fast because it catches refund and return intent before it turns into a dispute. The stack is Twilio SMS plus Claude plus whatever order system the client runs.

    Build notes. Opt-in is legally required; in the US, automated texts fall under the TCPA. Send the opt-in request inside the order confirmation email. Keep the agent scoped to post-purchase questions only, with no cross-selling.
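    A minimal sketch of the inbound-message gate, assuming you keep a set of opted-in numbers and have a classifier that maps a text to a topic. The opt-out keywords are a common convention rather than a complete legal list, and your SMS provider may also process them on its own; the topic names and `handle_sms` are hypothetical.

```python
from __future__ import annotations

from typing import Callable

OPT_OUT_WORDS = {"stop", "stopall", "unsubscribe", "cancel", "end", "quit"}
IN_SCOPE_TOPICS = {"sizing", "care", "returns", "shipping"}  # post-purchase only


def handle_sms(
    phone: str, text: str, opted_in: set[str], classify: Callable[[str], str]
) -> str:
    """Return the action for one inbound text: opt_out, drop, answer, or escalate."""
    if text.strip().lower() in OPT_OUT_WORDS:
        opted_in.discard(phone)  # honor the opt-out before anything else
        return "opt_out"
    if phone not in opted_in:
        return "drop"  # no consent on file: the agent does not reply
    topic = classify(text)
    # Anything outside post-purchase scope goes to a person; the agent never cross-sells.
    return "answer" if topic in IN_SCOPE_TOPICS else "escalate"
```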

    Read the full idea: Post-purchase SMS concierge →

05 / Group B

Ops, research, and DevOps agents

These four agents compress the internal work that steals maker time: recruiting screens, scheduling loops, research briefs, and production error triage.

  1. 08 / Recruiter screening agent

    Reads incoming applications, scores them against your hiring rubric, drafts the screening interview.

    Effort: Depth, 3 to 5 weeks
    Payback: 60 to 120 days
    Stack: ATS + Claude + rubric prompt

    For any team hiring more than one role a quarter, the screening agent saves hours of recruiter time per week. It does not take over all of screening: the judgment calls stay human. The first pass through 200 applications is the agent's job.

    Build notes. The rubric must be explicit and rule-based. Vague hiring criteria produce vague agent output. The first deployment usually exposes that the rubric was never written down. That alone is worth the build.
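    A minimal sketch of an explicit, rule-based rubric, assuming the model's job is only to extract yes/no facts from each application and the scoring stays in plain code a recruiter can read. The criteria names and weights are hypothetical. The output lists what is missing so a human makes the judgment call.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: int
    required: bool = False  # a missing hard requirement is flagged for a human


def score(facts: dict[str, bool], rubric: list[Criterion]) -> dict:
    """Score extracted yes/no facts against the written-down rubric."""
    met = [c for c in rubric if facts.get(c.name, False)]
    missing = [c for c in rubric if not facts.get(c.name, False)]
    return {
        "score": sum(c.weight for c in met),
        "max_score": sum(c.weight for c in rubric),
        "missing": [c.name for c in missing],
        "missing_required": [c.name for c in missing if c.required],
    }
```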

    Read the full idea: Recruiter screening agent →

  2. 09 / Candidate-screen scheduler agent

    Coordinates interviewer calendars, candidate availability, room booking, reschedules, and reminders.

    Effort: Momentum, 2 to 3 weeks
    Payback: 60 to 120 days
    Stack: Cal.com or Calendly + LLM

    The scheduler is the small ops fix that prevents recruiter burnout. When interviewers cancel, the agent finds a backup, reschedules, and notifies everyone. When candidates ghost, it follows up, all without a recruiter touching it.

    Build notes. Calendar integration is the long pole. Google Calendar plus Microsoft 365 covers 80 percent of cases. Build for both from day one or you are rewriting the agent at customer two.
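    A minimal sketch of the piece that makes two calendar backends manageable: a small provider interface, plus a slot finder that only ever sees plain busy intervals. `CalendarProvider`, `gather_busy`, and `first_common_slot` are hypothetical names; real Google Calendar and Microsoft 365 adapters would each implement `busy` against their own free/busy APIs.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Optional, Protocol

Interval = tuple[datetime, datetime]


class CalendarProvider(Protocol):
    def busy(self, person: str, start: datetime, end: datetime) -> list[Interval]: ...


def gather_busy(
    people: dict[str, CalendarProvider], start: datetime, end: datetime
) -> dict[str, list[Interval]]:
    """Ask each person's own backend (Google or Microsoft) for busy intervals."""
    return {person: provider.busy(person, start, end) for person, provider in people.items()}


def first_common_slot(
    busy: dict[str, list[Interval]],
    window_start: datetime,
    window_end: datetime,
    duration: timedelta,
    step: timedelta = timedelta(minutes=15),
) -> Optional[datetime]:
    """Earliest start inside the window where nobody has a conflict, or None."""
    intervals = [iv for ivs in busy.values() for iv in ivs]
    start = window_start
    while start + duration <= window_end:
        end = start + duration
        # A slot is free if it ends before or starts after every busy interval.
        if all(end <= b_start or start >= b_end for b_start, b_end in intervals):
            return start
        start += step
    return None
```

    Because the slot finder never knows which backend an interval came from, adding the second provider at customer two means writing one adapter, not rewriting the agent.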

    Read the full idea: Candidate-screen scheduler agent →

  3. 10 / Research brief agent

    Collects sources on a topic, synthesizes findings, exports a formatted brief to Notion or Drive.

    Effort: Momentum, 2 to 3 weeks
    Payback: 30 to 60 days
    Stack: Firecrawl + Perplexity API

    The research agent is the most underrated workflow on this list. Marketing teams use it for competitive briefs, sales teams for account intel, founders for market sizing. Output quality scales with how clearly the operator describes what they want.

    Build notes. Tool use is mandatory. The agent needs web search, page fetch, and a structured output formatter. Anthropic's tool use plus Brave Search plus a Notion or Google Drive export integration is the cleanest stack.
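    A minimal sketch of the tool layer in the shape Anthropic's Messages API expects: each tool has a `name`, a `description`, and a JSON Schema `input_schema`, and each `tool_use` block the model returns is answered with a `tool_result` block carrying the same id. The three tool names and the handler functions are hypothetical; you would back them with Brave Search, a page fetcher, and your Notion or Google Drive export.

```python
from __future__ import annotations

import json
from typing import Any, Callable

TOOLS = [
    {
        "name": "web_search",
        "description": "Search the web and return result titles and URLs.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "fetch_page",
        "description": "Fetch a URL and return the main text of the page.",
        "input_schema": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
    {
        "name": "export_brief",
        "description": "Save the finished brief, with every source URL, to the team workspace.",
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "sections": {"type": "array", "items": {"type": "string"}},
                "sources": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "sections", "sources"],
        },
    },
]


def run_tool_call(
    block: dict[str, Any], handlers: dict[str, Callable[..., Any]]
) -> dict[str, Any]:
    """Execute one tool_use block and wrap the output as a tool_result block."""
    output = handlers[block["name"]](**block["input"])
    return {"type": "tool_result", "tool_use_id": block["id"], "content": json.dumps(output)}
```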

    Read the full idea: Research brief agent →

  4. 11 / Error log triage agent

    Watches the production error stream, scores severity, files the noise, escalates the real signal.

    Effort: Momentum, 2 to 3 weeks
    Payback: 60 to 120 days
    Stack: n8n + Sentry or Datadog

    Every team running production has too many errors and not enough triage capacity. The triage agent turns a noisy Sentry feed into a prioritized inbox. Severity scoring against the system architecture is the value, not the alerting.

    Build notes. Run it on a five-minute cron rather than in real time. Cluster by stack trace fingerprint. Score against system criticality and user impact. Escalate only the top three clusters in each window.
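    A minimal sketch of one five-minute window, assuming each error event carries its top stack frames, the service it came from, and a user id. Severity here is a hypothetical formula, service criticality times distinct users affected; the criticality table is a placeholder for your own architecture map.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict

# Hypothetical criticality weights; derive yours from the system architecture.
CRITICALITY = {"checkout": 5, "auth": 5, "search": 2, "admin": 1}


def fingerprint(frames: list[str]) -> str:
    """Group by the top three frames (file:function) so line-number churn does not split clusters."""
    return hashlib.sha1("|".join(frames[:3]).encode()).hexdigest()[:12]


def triage(events: list[dict], top_n: int = 3) -> list[tuple[int, str, str, int]]:
    """Return (severity, fingerprint, service, event count) for the top clusters."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        groups[fingerprint(event["frames"])].append(event)
    scored = []
    for fp, grouped in groups.items():
        users = len({e["user_id"] for e in grouped})
        service = grouped[0]["service"]
        scored.append((CRITICALITY.get(service, 1) * users, fp, service, len(grouped)))
    scored.sort(reverse=True)
    return scored[:top_n]  # escalate these; everything else is filed as noise
```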

    Read the full idea: Error log triage agent →

06 / Group C

Revenue, content, and data agents

The final four agents push growth forward: lead capture, first drafts, competitive battle cards, and plain-English answers from the warehouse.

  1. 12 / Conversational chatbot lead capture

    Replaces the static contact form with a chat that qualifies on the first reply.

    Effort: Momentum, 1 to 2 weeks
    Payback: Under 30 days
    Stack: Widget + Claude + CRM

    Static lead forms convert poorly. Conversational capture asks the right next question based on what the visitor said, not from a fixed list. In the deployments we measure, conversion rates rise 30 to 60 percent.

    Build notes. The qualifier prompts must be tied to the actual sales playbook. Vague qualification produces vague leads. Train against your last 50 closed deals, not against a generic qualification framework.

    Read the full idea: Conversational chatbot lead capture →

  2. 13 / Brief to first draft writer agent

    Takes a brief and your past writing samples, produces a first draft in your voice.

    Effort: Momentum, 2 to 4 weeks
    Payback: 30 to 60 days
    Stack: Long-context LLM, no fine-tune

    The writer agent delivers the most visible ROI for marketing teams. Drafts that previously took two hours come out in fifteen minutes. The human edit pass is faster too, because the structure is right from the start.

    Build notes. Voice capture from past sent emails plus published content is the foundation. The draft is never the finished asset. Every draft routes to a human edit. Auto-publishing is the failure mode.

    Read the full idea: Brief to first draft writer agent →

  3. 14 / Sales battle card generator

    Pulls competitor reviews, public pricing, and feature pages into a per-deal battle card.

    Effort: Momentum, 2 to 3 weeks
    Payback: 60 to 120 days
    Stack: n8n + scrape + G2 ingest

    For B2B sales teams running competitive deals, the battle card agent compresses hours of research into one document. Inputs: competitor name, deal context. Output: a structured battle card with weaknesses, common objections, and recommended counter-narratives.

    Build notes. Verifiable sources only. Every claim on the battle card cites a public source. Hallucinations on a battle card cost deals. Source verification is the moat.
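    A minimal sketch of that verification step, assuming the generator must return each claim with a source URL and the exact quote that supports it, and that you keep the text of every page you fetched. A claim survives only if its quote actually appears on a fetched page. `Claim` and `verify_claims` are hypothetical names; exact substring matching is a deliberately strict stand-in for fuzzier matching.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    text: str
    source_url: Optional[str]
    evidence: str  # exact quote the model says supports the claim


def verify_claims(
    claims: list[Claim], fetched_pages: dict[str, str]
) -> tuple[list[Claim], list[Claim]]:
    """Split claims into (kept, rejected). Only quotes found on fetched pages survive."""
    kept, rejected = [], []
    for claim in claims:
        page = fetched_pages.get(claim.source_url or "")
        if page is not None and claim.evidence and claim.evidence in page:
            kept.append(claim)
        else:
            rejected.append(claim)  # uncited or unsupported: never reaches the card
    return kept, rejected
```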

    Read the full idea: Sales battle card generator →

  4. 15 / Ask your data conversational agent

    Natural language questions about your warehouse data. The agent writes the SQL and returns the answer.

    Effort: Deep end, 4 to 10 weeks
    Payback: 120+ days
    Stack: Warehouse SQL + LLM

    The data agent democratizes warehouse access. Operators ask questions as if an analyst were on call. The agent writes parameterized SQL, runs it against a read replica, and returns the result with the source query for verification.

    Build notes. Schema documentation is the foundation. The agent is only as good as the table and column descriptions you feed it. Invest a week documenting the schema in plain English before building the agent or it will hallucinate joins.
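    A minimal sketch of two foundations this agent needs: schema descriptions rendered into the prompt, and a guard that only runs single, parameterized SELECT statements and returns the query alongside the rows. It uses SQLite in memory purely so the example runs; the table descriptions are hypothetical, and the string check is a coarse first filter, not a substitute for a read replica and a read-only database role.

```python
import sqlite3

# Hypothetical plain-English schema docs; the agent is only as good as these.
SCHEMA_DOCS = {
    "customers": {"id": "Customer primary key", "region": "Sales region: NA, EMEA, or APAC"},
    "orders": {
        "id": "Order primary key",
        "customer_id": "Foreign key to customers.id",
        "total_usd": "Order total in US dollars, after discounts",
    },
}


def schema_prompt(docs: dict) -> str:
    """Render table and column descriptions as plain text for the model prompt."""
    lines = []
    for table, columns in docs.items():
        lines.append(f"Table {table}:")
        lines.extend(f"  - {column}: {desc}" for column, desc in columns.items())
    return "\n".join(lines)


def run_readonly(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> dict:
    """Run one SELECT with bound parameters; return rows plus the query for verification."""
    statement = sql.strip().rstrip(";")
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError("only a single SELECT statement is allowed")
    rows = conn.execute(statement, params).fetchall()
    return {"query": statement, "params": list(params), "rows": rows}
```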

    Read the full idea: Ask your data conversational agent →

07 / Ship first

Which two to ship first

If you have one quarter and one engineer, ship in-app AI help and inbox triage. Three reasons.

  1. 01

    They address the two universal bottlenecks

    Support load and communication load. Every company has both. Neither requires a data infrastructure investment. Both ship within three weeks. Both show measurable ROI inside the first thirty days.

  2. 02

    They compound

    In-app AI help generates the unanswered question log that feeds agent 05, the knowledge base gap analyzer. Inbox triage generates the voice capture data that feeds agent 13, the first draft writer. Starting with the right two agents seeds the data you need to ship the next three.

  3. 03

    They are the canonical shapes

    RAG over docs and inbox triage are the two cleanest patterns to teach a team. Once your team can ship those, every other agent on this list is a known shape with new data attached.

Field F.A.Q.

FAQ

Which of these 15 is fastest to ship?

A: Agent 12 (conversational chatbot lead capture) at 1 to 2 weeks, followed by Agent 01 (in-app AI help) at 2 to 3 weeks. Both estimates assume a senior engineer with a CLAUDE.md guide and real documentation to point at.

Do I need n8n to build these?

A: No, but it helps. n8n is the orchestration layer we reach for on agents 03, 06, 07, 10, 11, and 14. Pure code works for agents 01, 02, 04, and 05.

Can a non-technical founder ship any of these?

A: With help, yes. Agents 01, 11, and 12 are available as no-code configurations. The rest require real engineering. The AI Automation Playbook has the build plans.

What is the cheapest to run?

A: Agent 11 (error log triage), because it runs on a schedule and uses cheap models. Agents 02 and 15 are the most expensive, because of real-time voice and large-context warehouse queries.

Which of these replaces a human role?

A: None outright. Agents 02 and 08 come closest. The rest shift human time from routine work to judgment calls. That is the right framing. Agents deflect work, they do not replace roles.

What stack do you use for most of these?

A: For orchestration, n8n. For reasoning, Claude (Sonnet for most, Opus for the two or three that need judgment). For retrieval, Pinecone or a small Postgres with pgvector. For voice, Retell AI plus Twilio. For the error log and data agents, direct SQL against the warehouse.

How do you prioritize which two to ship first?

A: Pick the two that touch your biggest bottleneck, then bias toward agents that feed each other. In-app AI help feeds agent 05 with its unanswered question log, and inbox triage feeds agent 13 with voice capture data, which is why we default to recommending those two first.