AI in Web Applications

Using AI to write code is one thing. Shipping AI inside your product is another — keys, latency, streaming, auth, and session memory all land on you.

Last reviewed: June 2026

The patterns below assume Next.js or a similar full-stack JavaScript framework; the principles apply to any backend.

Client vs Server

Run AI on	Good for	Never do
Server	Production chat, RAG, tool calling, billing	—
Client (browser)	Demos, local models via WebGPU	Expose API secret keys
Edge	Low-latency streaming close to users	Heavy RAG without caching

Rule: API keys live on the server only. The browser talks to your API route; your route talks to OpenAI/Anthropic.

Architecture Pattern

Browser → POST /api/chat → Your server → LLM provider
                ↑
         auth, rate limit, logging
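The middle layer of the diagram can be sketched as an ordered list of checks that run before any provider call. The first failing check short-circuits with an HTTP status, so no tokens are spent. This is a framework-agnostic sketch: the `ChatRequest` shape, the check names, and the 8,000-character limit are hypothetical, and logging is reduced to a callback.

```typescript
// Hypothetical request shape after your framework has resolved auth and read the body.
type ChatRequest = { userId: string | null; body: string };
type Rejection = { status: number; message: string };
// A check returns null to pass, or a Rejection to stop the request.
type Check = (req: ChatRequest) => Rejection | null;

const requireUser: Check = (req) =>
  req.userId ? null : { status: 401, message: "Unauthorized" };

// Cheap guard against oversized prompts (the limit is an arbitrary example).
const maxBodyChars =
  (limit: number): Check =>
  (req) =>
    req.body.length <= limit ? null : { status: 413, message: "Payload too large" };

// Runs checks in order; logs and returns the first rejection, or null if all pass.
function runChecks(
  req: ChatRequest,
  checks: Check[],
  log: (line: string) => void
): Rejection | null {
  for (const check of checks) {
    const rejection = check(req);
    if (rejection) {
      log(`rejected status=${rejection.status} user=${req.userId ?? "anonymous"}`);
      return rejection;
    }
  }
  log(`accepted user=${req.userId}`);
  return null;
}

const checks = [requireUser, maxBodyChars(8_000)];
```

Order matters: auth goes first so anonymous traffic is rejected before any per-user work, a rate-limit check slots in as another `Check`, and the provider is called only when `runChecks` returns null.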

Auth: Require a Session Before the LLM Call

Never expose an unauthenticated route to an LLM provider. Every request should verify identity before spending tokens.

// app/api/chat/route.ts
import { auth } from "@/lib/auth";
import { anthropic } from "@ai-sdk/anthropic";
import { streamText } from "ai";

export async function POST(req: Request) {
  // Auth check first — before parsing body or calling LLM
  const session = await auth();
  if (!session?.user) {
    return new Response("Unauthorized", { status: 401 });
  }

  const { messages } = await req.json();

  const result = streamText({
    model: anthropic("claude-sonnet-4-20250514"),
    messages,
    maxTokens: 1024,
  });

  return result.toDataStreamResponse();
}

For paid features, also enforce your billing/quota check before the LLM call — not after.
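One way to check quota before the call is to reserve the worst-case output (the request's `maxTokens`) up front, then settle to the tokens actually used once the response finishes. The sketch below is a hypothetical in-memory ledger that tracks output tokens only; `QuotaLedger` and its method names are illustrative, and a real system would keep balances in your database.

```typescript
// Hypothetical in-memory output-token ledger; replace the Maps with your billing store.
class QuotaLedger {
  private readonly limitPerUser: number;
  private used = new Map<string, number>();
  private reserved = new Map<string, number>();

  constructor(limitPerUser: number) {
    this.limitPerUser = limitPerUser;
  }

  private amount(map: Map<string, number>, userId: string): number {
    return map.get(userId) ?? 0;
  }

  // Call BEFORE the LLM request. Reserves the worst case and returns false
  // if that would push the user past their limit.
  reserve(userId: string, maxTokens: number): boolean {
    const committed = this.amount(this.used, userId) + this.amount(this.reserved, userId);
    if (committed + maxTokens > this.limitPerUser) return false;
    this.reserved.set(userId, this.amount(this.reserved, userId) + maxTokens);
    return true;
  }

  // Call AFTER the response finishes (or fails): release the reservation
  // and record what was actually consumed.
  settle(userId: string, reservedTokens: number, actualTokens: number): void {
    this.reserved.set(userId, this.amount(this.reserved, userId) - reservedTokens);
    this.used.set(userId, this.amount(this.used, userId) + actualTokens);
  }

  remaining(userId: string): number {
    return (
      this.limitPerUser - this.amount(this.used, userId) - this.amount(this.reserved, userId)
    );
  }
}
```

In a route, `reserve` would run right after the auth check (returning 402 or 429 when it fails), and `settle` would run when the stream finishes, using the usage figures your SDK reports.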

Session Memory and Multi-Turn Chat

A stateless API route receives the full conversation history on each request. For short sessions this is fine; for long-running assistants you need to manage history size.
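Why size matters: because every request resends the whole history, the input tokens you pay for grow roughly quadratically with conversation length. As a back-of-the-envelope model, assume every user and assistant message is about t tokens. Turn k sends 2k − 1 messages, so n turns cost t × n² input tokens in total. The hypothetical helper below illustrates that arithmetic and the effect of a message cap; it is not a billing calculator.

```typescript
// Estimates total input tokens across a conversation when history is resent
// on every turn. Assumes every message is ~tokensPerMessage tokens.
function cumulativeInputTokens(
  turns: number,
  tokensPerMessage: number,
  windowMessages = Infinity
): number {
  let total = 0;
  for (let k = 1; k <= turns; k++) {
    // Turn k resends the 2(k - 1) earlier messages plus the new user message,
    // unless a window caps how many are sent.
    const messagesSent = Math.min(2 * (k - 1) + 1, windowMessages);
    total += messagesSent * tokensPerMessage;
  }
  return total;
}
```

At 100 tokens per message, `cumulativeInputTokens(20, 100)` is 40,000 and `cumulativeInputTokens(100, 100)` is 1,000,000. Capping each request at 20 messages, `cumulativeInputTokens(100, 100, 20)` drops to 190,000, because cost becomes linear once the cap is reached.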

Client-managed history (simple)

The useChat hook from @ai-sdk/react holds history in React state. This is the default pattern: history lives in the browser and resets on page reload.

"use client";
import { useChat } from "@ai-sdk/react";

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat({ api: "/api/chat" });

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <input value={input} onChange={handleInputChange} disabled={isLoading} />
      <button type="submit" disabled={isLoading}>Send</button>
    </form>
  );
}

Server-persisted history (production)

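For history that survives page reloads and works across devices, persist messages server-side. The client then sends a session ID and only the newest message instead of the full history (with useChat in AI SDK 4.x, the experimental_prepareRequestBody option can shape that request body):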

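// lib/chat-store.ts: save and load conversation history
import { db } from "@/lib/db";
import type { CoreMessage } from "ai";

// Returns [] for a new session, or null if the session belongs to another user.
export async function loadMessages(
  sessionId: string,
  userId: string
): Promise<CoreMessage[] | null> {
  const row = await db.chatSessions.findUnique({ where: { id: sessionId } });
  if (!row) return [];
  if (row.userId !== userId) return null;
  return JSON.parse(row.messages);
}

// Call only after loadMessages has confirmed ownership: the upsert is keyed by id alone.
export async function saveMessages(
  sessionId: string,
  userId: string,
  messages: CoreMessage[]
) {
  await db.chatSessions.upsert({
    where: { id: sessionId },
    update: { messages: JSON.stringify(messages), updatedAt: new Date() },
    create: { id: sessionId, userId, messages: JSON.stringify(messages) },
  });
}

// app/api/chat/route.ts: load, append, save
import { auth } from "@/lib/auth";
import { loadMessages, saveMessages } from "@/lib/chat-store";
import { anthropic } from "@ai-sdk/anthropic";
import { streamText, type CoreMessage } from "ai";

export async function POST(req: Request) {
  const session = await auth();
  if (!session?.user?.id) return new Response("Unauthorized", { status: 401 });
  const userId = session.user.id;

  const { sessionId, newMessage } = await req.json();
  if (typeof sessionId !== "string" || typeof newMessage?.content !== "string") {
    return new Response("Bad request", { status: 400 });
  }

  // Load history; refuse sessions owned by someone else
  const history = await loadMessages(sessionId, userId);
  if (history === null) return new Response("Forbidden", { status: 403 });

  // Force the user role so a client can't inject "system" or "assistant" turns
  const userMessage: CoreMessage = { role: "user", content: newMessage.content };
  const messages = [...history, userMessage];

  // Send only the last 20 messages (about 10 turns) to control token cost,
  // starting the window on a user turn
  const trimmed = messages.slice(-20);
  while (trimmed.length > 1 && trimmed[0].role !== "user") trimmed.shift();

  const result = streamText({
    model: anthropic("claude-sonnet-4-20250514"),
    messages: trimmed,
    maxTokens: 1024,
    onFinish: async ({ text }) => {
      // Persist the full history (not just the window) plus the assistant reply
      await saveMessages(sessionId, userId, [
        ...messages,
        { role: "assistant", content: text },
      ]);
    },
  });

  return result.toDataStreamResponse();
}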

See Cost, Latency, and Tokens for history truncation patterns.
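The route above caps history by message count. A variant caps by an approximate token budget: keep the newest messages that fit, always keep system messages, and drop any assistant message left at the front. The sketch assumes a rough four-characters-per-token estimate and a simplified `Msg` type with string content; `trimToBudget` and `estimateTokens` are hypothetical names, and accurate counts need your provider's tokenizer or token-counting endpoint.

```typescript
type Msg = { role: "system" | "user" | "assistant"; content: string };

// Rough heuristic (an assumption, not a tokenizer): ~4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keeps the newest non-system messages that fit in `budget` estimated tokens.
// System messages are always kept, and the window opens on a user turn.
function trimToBudget(messages: Msg[], budget: number): Msg[] {
  const system = messages.filter((m) => m.role === "system");
  let remaining =
    budget - system.reduce((sum, m) => sum + estimateTokens(m.content), 0);

  const kept: Msg[] = [];
  // Walk from newest to oldest; stop at the first message that doesn't fit
  // so the kept window stays contiguous.
  for (const m of [...messages].reverse()) {
    if (m.role === "system") continue;
    const cost = estimateTokens(m.content);
    if (cost > remaining) break;
    kept.unshift(m);
    remaining -= cost;
  }

  // Drop leading assistant messages so the window starts with the user.
  while (kept.length > 0 && kept[0]?.role !== "user") kept.shift();

  return [...system, ...kept];
}
```

If even the newest user message does not fit, only the system messages come back, so the caller should reject the request or summarize older turns rather than send an empty conversation.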

Streaming UX

Users abandon chat UIs that make them wait 10 seconds for a full response.

Stream tokens with the Vercel AI SDK's useChat()
Show a typing indicator until the first token arrives
Allow cancellation (AbortController)
Handle errors gracefully; partial responses happen

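// Inside the Chat component: add stop to the fields destructured from useChat
const { messages, input, handleInputChange, handleSubmit, isLoading, stop } =
  useChat({ api: "/api/chat" });

// Cancel button, rendered inside the form JSX; stop() aborts the in-flight request
{isLoading && (
  <button onClick={() => stop()} type="button">Cancel</button>
)}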
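Under the hood, cancellation is an AbortSignal checked while chunks arrive. The sketch below shows the idea without the SDK: it consumes any async stream of text chunks, stops when the signal aborts, and always returns the text received so far, so the UI can keep a partial answer instead of discarding it. `collectStream` and `StreamResult` are hypothetical names; in a real client the chunks would be decoded from a `fetch` response body read with the same signal.

```typescript
type StreamResult = {
  text: string;
  status: "complete" | "aborted" | "error";
  error?: unknown;
};

// Collects chunks until the stream ends, the signal aborts, or an error occurs.
// The text received so far is returned in every case, so partial answers survive.
async function collectStream(
  chunks: AsyncIterable<string>,
  signal: AbortSignal,
  onChunk: (textSoFar: string) => void = () => {}
): Promise<StreamResult> {
  let text = "";
  try {
    for await (const chunk of chunks) {
      // Returning here also closes the underlying iterator.
      if (signal.aborted) return { text, status: "aborted" };
      text += chunk;
      onChunk(text);
    }
    return { text, status: "complete" };
  } catch (error) {
    return { text, status: signal.aborted ? "aborted" : "error", error };
  }
}
```

Separating "aborted" from "error" lets the UI label a cancelled answer differently from one cut off by a network failure, while rendering the partial text in both cases.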

Rate Limiting and Abuse Prevention

Every AI endpoint needs per-user limits:

// lib/rate-limit.ts — in-memory (use Redis / Upstash in production)
const hits = new Map<string, number[]>();

export function checkRateLimit(
  key: string,
  max = 20,
  windowMs = 60_000
): boolean {
  const now = Date.now();
  const recent = (hits.get(key) ?? []).filter((t) => now - t < windowMs);
  if (recent.length >= max) return false;
  recent.push(now);
  hits.set(key, recent);
  return true;
}

// In your route handler, after the auth check and before the LLM call
const userId = session.user.id;
if (!checkRateLimit(userId)) {
  return new Response("Too many requests", { status: 429 });
}

For production, use Upstash Rate Limit or a similar Redis-backed solution that works across serverless instances.
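A 429 is more useful when it tells the client how long to wait. With the sliding-window log used above, the wait is the time until the oldest hit leaves the window. The variant below returns that figure so the route can set a `Retry-After` header in seconds; `checkRateLimitWithRetry` and `hitLog` are hypothetical names, and the same in-memory, per-instance caveat applies.

```typescript
// Per-key timestamps of recent hits (in-memory, so per server instance only).
const hitLog = new Map<string, number[]>();

type RateLimitResult =
  | { allowed: true }
  | { allowed: false; retryAfterSeconds: number };

// Sliding-window log: at most `max` hits per `windowMs` for each key.
// `now` is injectable so the logic can be tested without waiting.
function checkRateLimitWithRetry(
  key: string,
  max = 20,
  windowMs = 60_000,
  now = Date.now()
): RateLimitResult {
  const recent = (hitLog.get(key) ?? []).filter((t) => now - t < windowMs);
  if (recent.length >= max) {
    hitLog.set(key, recent);
    // Timestamps are appended in order, so recent[0] is the oldest hit;
    // a slot frees up when it leaves the window.
    const oldest = recent[0] ?? now;
    const waitMs = oldest + windowMs - now;
    return { allowed: false, retryAfterSeconds: Math.ceil(waitMs / 1000) };
  }
  recent.push(now);
  hitLog.set(key, recent);
  return { allowed: true };
}
```

In the route, a rejected result would become `new Response("Too many requests", { status: 429, headers: { "Retry-After": String(result.retryAfterSeconds) } })`.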

Client-Side ML (Browser)

For on-device inference (TensorFlow.js, transformers.js):

Models are large, so lazy-load them
Privacy-friendly: data stays on the device
Lower quality than large cloud LLMs

Use for classification/embeddings in the browser, not as a replacement for server chat in most products.
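Lazy-loading a browser model in practice means deferring the download until first use and sharing one in-flight load across callers, so two components asking at once do not fetch the model twice. The generic helper below is a sketch; `lazy` is a hypothetical name, and the loader you pass in stands in for your library's setup call (for example a transformers.js `pipeline(...)` call).

```typescript
// Returns a getter that starts `load` on the first call and hands every caller
// the same promise afterwards. A failed load is forgotten so the next call retries.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (!pending) {
      pending = load().catch((err) => {
        pending = null; // allow a retry after a failed download
        throw err;
      });
    }
    return pending;
  };
}
```

Call the getter from the interaction that needs the model (for example, the first time a user opens the feature), not at page load, so visitors who never use it never download it.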

Related Guides

LLM API Route Handler Cheat Sheet: minimal copy-paste route + client
LLM APIs and Tool Calling
Anthropic API for Web Developers
OpenAI API for Web Developers
Security and Prompt Injection
Cost, Latency, and Tokens
LLM Observability and Evals
AI-Assisted Development with React/Next.js

