How We Built an MCP Server for Our Analytics Platform

A technical deep dive into building an MCP server on AWS Lambda, including OAuth 2.0 from scratch, stateless architecture, hybrid search, and the challenges we hit along the way.

February 17, 2026 · By Alex

AI assistants are getting better at using tools. The Model Context Protocol (MCP) is a big part of how: a standard way for AI tools like Claude to connect to external services and pull in real data.

We built an MCP server so AI assistants can directly query your Racoons analytics. Instead of copying numbers from a dashboard into a chat, you can ask Claude to look at your conversion rates, search through past analyses, or count events by UTM campaign, and it just works.

Here's how we built it, what broke, and what we learned.

What we built

The system has a few moving parts:

  • MCP server at mcp.racoons.ai - an Express app running on AWS Lambda via Lambda Web Adapter, using the StreamableHTTP transport
  • OAuth 2.0 server at api.racoons.ai - 6 Lambda functions handling client registration, authorization, token exchange, refresh rotation, revocation, and server metadata discovery
  • Hasura GraphQL as the data layer - the MCP server queries user data through Hasura with per-user row-level permissions
  • Redis for per-tool, per-user rate limiting

The MCP server itself is stateless. Each request gets authenticated via OAuth bearer token, spins up a fresh StreamableHTTPServerTransport with no session tracking, and tears down after the response.

The server exposes 8 tools:

  • projects - Lists all organizations and projects the user has access to
  • list_analyses - Paginated list of analyses for a project, with cursor-based navigation
  • fetch_analysis - Full analysis content including the raw context data used to generate it
  • search - Hybrid semantic + keyword search across analyses
  • search_chat_messages - Search through chat message history
  • list_chat_sessions - Paginated list of chat sessions for a project
  • get_chat_session - Full chat session with all messages
  • event_count - Count events with filters for type, URL, referrer, UTM params, and time range

Every tool returns a URL pointing to the relevant page in the Racoons dashboard, so the AI assistant can link users directly to the source.

Implementing OAuth 2.0 for MCP

The MCP spec requires OAuth 2.0 for authentication. We couldn't use a hosted provider because we needed tight control over scopes, token lifetimes, and how tokens interact with our existing cookie-based auth system. Also, I just like doing stuff myself.

So we built it from scratch. Six Lambdas, a shared @repo/racoons-oauth package, and a lot of RFC reading.

Discovery: how clients find our OAuth server

MCP clients need to discover our OAuth endpoints automatically. This works through two well-known metadata endpoints.

The MCP server itself serves /.well-known/oauth-protected-resource, which tells clients where the authorization server lives:

{
  "resource": "https://mcp.racoons.ai",
  "authorization_servers": ["https://api.racoons.ai"],
  "scopes_supported": ["mcp:read"]
}

The client then fetches /.well-known/oauth-authorization-server from api.racoons.ai, which returns the full server metadata — every endpoint it needs to complete the flow:

{
  "issuer": "https://api.racoons.ai",
  "authorization_endpoint": "https://api.racoons.ai/oauth/authorize",
  "token_endpoint": "https://api.racoons.ai/oauth/token",
  "registration_endpoint": "https://api.racoons.ai/oauth/register",
  "revocation_endpoint": "https://api.racoons.ai/oauth/revoke",
  "grant_types_supported": ["authorization_code", "refresh_token"],
  "code_challenge_methods_supported": ["S256"],
  "scopes_supported": ["mcp:read", "offline_access"]
}

This is a dedicated Lambda that returns a static JSON response. It might seem like overkill for a static document, but it means we can update the metadata (add scopes, change endpoints) without redeploying the MCP server or any client-facing code. Each piece owns its own truth.

Dynamic client registration

Before an MCP client can start the OAuth flow, it needs a client_id. Rather than manually creating credentials for every client, we support RFC 7591 dynamic client registration at /oauth/register.

A client sends a POST with its redirect URIs and an optional name:

{
  "redirect_uris": ["http://localhost:6274/oauth/callback"],
  "client_name": "Claude Desktop"
}

The server inserts a new oauth_client row via Hasura and returns the registration response. All clients are public (no client_secret) since MCP clients are typically desktop or CLI apps where a secret can't be kept confidential:

{
  "client_id": "generated-uuid",
  "client_id_issued_at": 1740000000,
  "redirect_uris": ["http://localhost:6274/oauth/callback"],
  "grant_types": ["authorization_code", "refresh_token"],
  "token_endpoint_auth_method": "none",
  "scope": "mcp:read offline_access"
}

If no scope is requested, clients default to mcp:read offline_access — read access plus the ability to get refresh tokens.
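The registration handler boils down to a few validations plus those defaults. A minimal sketch of that logic (the `registerClient` name and the in-memory shape are illustrative; the real handler persists the row via Hasura):

```typescript
import { randomUUID } from "node:crypto";

// Illustrative request shape per RFC 7591.
interface RegistrationRequest {
  redirect_uris: string[];
  client_name?: string;
  scope?: string;
}

const DEFAULT_SCOPE = "mcp:read offline_access";

export function registerClient(req: RegistrationRequest) {
  // Every redirect URI must be present and a well-formed http(s) URL.
  if (!Array.isArray(req.redirect_uris) || req.redirect_uris.length === 0) {
    throw new Error("invalid_client_metadata: redirect_uris required");
  }
  for (const uri of req.redirect_uris) {
    const parsed = new URL(uri); // throws on malformed URIs
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      throw new Error("invalid_client_metadata: unsupported scheme");
    }
  }

  return {
    client_id: randomUUID(),
    client_id_issued_at: Math.floor(Date.now() / 1000),
    redirect_uris: req.redirect_uris,
    grant_types: ["authorization_code", "refresh_token"],
    token_endpoint_auth_method: "none", // public client, no secret
    scope: req.scope ?? DEFAULT_SCOPE,
  };
}
```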

The authorization code flow

Once a client has its client_id, the full flow looks like this:

1. Authorization request. The client redirects the user to /oauth/authorize with the standard params: client_id, redirect_uri, state, code_challenge, and scope. A redirect Lambda at the API Gateway forwards this to our Next.js app at racoons.ai/authorize, where the user sees a consent screen.

2. Auth code issuance. When the user approves, the auth Lambda validates everything:

  • Looks up the oauth_client in Hasura and verifies the redirect_uri is an exact match against the registered URIs
  • Enforces PKCE S256 only — no plain method allowed. Validates the challenge format with /^[A-Za-z0-9\-_]{43,128}$/
  • Computes the effective scope by intersecting what the client requested, what the client is allowed, and what the server supports
  • Checks the user's existing session via a JWT cookie (our existing auth system)
  • Mints a random auth code, hashes it via KMS, stores the hash with a 10-minute TTL, and redirects back with the code

3. Token exchange. The client exchanges the code at /oauth/token. The token Lambda hashes the presented code with KMS, looks it up in Hasura, and verifies the PKCE code verifier by comparing sha256(verifier) against the stored challenge. If everything checks out, it atomically consumes the auth code (preventing reuse) and mints tokens.

4. Token issuance. Access tokens live for 15 minutes. If the client's grant types include refresh_token and the effective scope includes offline_access, a refresh token is also issued with a 60-day expiry. Both tokens are random 32-byte base64url strings, stored in the database as KMS HMAC-SHA-256 hashes.
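Steps 3 and 4 hinge on two small primitives: comparing sha256(verifier) against the stored S256 challenge, and minting random base64url tokens. A sketch of both using Node's crypto (function names are ours for illustration):

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// PKCE S256: the stored challenge is base64url(sha256(verifier)).
export function verifyPkceS256(verifier: string, storedChallenge: string): boolean {
  const computed = createHash("sha256").update(verifier).digest("base64url");
  const a = Buffer.from(computed);
  const b = Buffer.from(storedChallenge);
  // Constant-time compare; mismatched lengths mean mismatch.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Access and refresh tokens: 32 random bytes, base64url-encoded (43 chars).
export function mintToken(): string {
  return randomBytes(32).toString("base64url");
}
```

The 43-128 character format check on the verifier happens earlier, at authorization time, via the regex mentioned above.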

Token security

We never store raw tokens in the database. Every access token, refresh token, and auth code gets HMAC-SHA-256 hashed through AWS KMS before storage:

import { KMSClient, GenerateMacCommand } from "@aws-sdk/client-kms";

const kms = new KMSClient({});

export async function kmsGenerateMac(
  raw: string,
  kmsKeyArn: string,
): Promise<string> {
  const out = await kms.send(
    new GenerateMacCommand({
      KeyId: kmsKeyArn,
      MacAlgorithm: "HMAC_SHA_256",
      Message: new TextEncoder().encode(raw),
    }),
  );
  return Buffer.from(out.Mac).toString("base64");
}

When a bearer token comes in, we hash it with KMS and look up the hash. This means even a full database leak doesn't expose usable tokens. The HMAC key never leaves KMS; we can't even read it ourselves.

Refresh token rotation and revocation

Each refresh token belongs to a "family" identified by a UUID. When you rotate a refresh token, we mint a new one in the same family, mark the old one as used, and issue a fresh access token — all in a single Hasura mutation (oauth_rotate_refresh_strict).

The critical part is replay detection. If someone presents an already-used refresh token, it means either the token was stolen and the attacker already used it, or the legitimate user is trying to use a stale token. Either way, the entire family gets revoked.

The revocation endpoint (/oauth/revoke) supports both token types. It uses token_type_hint to check the right table first for efficiency, falling back to checking both if the hint is wrong or missing. For refresh tokens, it revokes the entire family, not just the presented token. Per the OAuth spec, it always returns 200 even if the token doesn't exist, to avoid leaking information about which tokens are valid.
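The rotation and replay logic can be illustrated with an in-memory sketch (the real version is a single atomic Hasura mutation against hashed tokens; `TokenStore` and its methods here are hypothetical stand-ins):

```typescript
import { randomBytes } from "node:crypto";

interface RefreshRecord {
  familyId: string;
  used: boolean;
  revoked: boolean;
}

// In-memory stand-in for the refresh-token table, keyed by raw token
// (the real table stores KMS HMAC hashes instead).
export class TokenStore {
  private records = new Map<string, RefreshRecord>();

  issue(familyId: string): string {
    const token = randomBytes(32).toString("base64url");
    this.records.set(token, { familyId, used: false, revoked: false });
    return token;
  }

  // Returns a new token in the same family, or null if rotation is refused.
  rotate(presented: string): string | null {
    const rec = this.records.get(presented);
    if (!rec || rec.revoked) return null;
    if (rec.used) {
      // Replay detected: stolen token or stale client. Kill the family.
      this.revokeFamily(rec.familyId);
      return null;
    }
    rec.used = true;
    return this.issue(rec.familyId);
  }

  revokeFamily(familyId: string): void {
    for (const rec of this.records.values()) {
      if (rec.familyId === familyId) rec.revoked = true;
    }
  }

  isRevoked(token: string): boolean {
    return this.records.get(token)?.revoked ?? true;
  }
}
```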

Scope negotiation

Scope handling has more nuance than you'd expect. When a client requests scopes during authorization, we don't just accept them; we intersect three sets:

  1. Server-supported scopes - currently mcp:read and offline_access
  2. Client-allowed scopes - what was registered for this specific client
  3. Requested scopes - what the client is asking for in this specific authorization

The effective scope is the intersection of all three. Additionally, even if offline_access is in the effective scope, we strip it if the client's grant_types don't include refresh_token. No point issuing a scope that implies refresh tokens if the client can't use them.
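That three-way intersection plus the offline_access rule fits in a few lines. A sketch (the `effectiveScope` name is ours):

```typescript
export function effectiveScope(
  serverSupported: string[],
  clientAllowed: string[],
  requested: string[],
  clientGrantTypes: string[],
): string[] {
  const supported = new Set(serverSupported);
  const allowed = new Set(clientAllowed);
  let scopes = requested.filter((s) => supported.has(s) && allowed.has(s));
  // offline_access implies refresh tokens; drop it if the client can't use them.
  if (!clientGrantTypes.includes("refresh_token")) {
    scopes = scopes.filter((s) => s !== "offline_access");
  }
  return scopes;
}
```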

Getting scope validation right on the tool side took us four iterations across two months. The initial hasScope() checked a single string. We eventually landed on hasScopes() (plural), which parses the space-delimited scope string into a Set and checks multiple required scopes against it. The main gotcha was that the scope string can be null or undefined in various edge cases; you need to handle that at every layer.
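The final shape looks roughly like this (a sketch; the null handling is the part that bit us):

```typescript
// Checks that every required scope appears in a space-delimited scope string.
// Treats null/undefined/empty as "no scopes granted" instead of crashing.
export function hasScopes(
  scopeString: string | null | undefined,
  required: string[],
): boolean {
  const granted = new Set((scopeString ?? "").split(/\s+/).filter(Boolean));
  return required.every((s) => granted.has(s));
}
```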

Running the MCP server on Lambda

We chose to run the MCP server as a Docker image Lambda using the AWS Lambda Web Adapter. This lets us run a standard Express app that the MCP SDK expects, while still getting Lambda's scaling and pricing benefits.

The Dockerfile

The Dockerfile uses a multi-stage build. In a monorepo with 58 apps, you need to be deliberate about what goes into the image:

# ---- build ----
FROM node:22-slim AS build
WORKDIR /work
RUN corepack enable && corepack prepare yarn@4.3.1 --activate

# Copy monorepo scaffold + only needed workspace package.jsons
COPY package.json yarn.lock .yarnrc.yml turbo.json ./
COPY .yarn/ .yarn/
COPY apps/mcp-server-lambda/package.json apps/mcp-server-lambda/package.json
COPY packages/racoons-oauth/package.json packages/racoons-oauth/package.json
COPY packages/redis-rate-limit/package.json packages/redis-rate-limit/package.json
COPY packages/dashboard-urls/package.json packages/dashboard-urls/package.json

# Install only this app's deps
RUN yarn workspaces focus mcp-server-lambda --all

# Copy sources, build deps topologically, then build the app
COPY apps/mcp-server-lambda/ apps/mcp-server-lambda/
COPY packages/racoons-oauth/ packages/racoons-oauth/
COPY packages/redis-rate-limit/ packages/redis-rate-limit/
COPY packages/dashboard-urls/ packages/dashboard-urls/
RUN yarn workspaces foreach -R --topological-dev -v run build || true
RUN yarn workspace mcp-server-lambda build

# ---- runtime ----
FROM node:22-slim
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.1 \
  /lambda-adapter /opt/extensions/lambda-adapter

WORKDIR /var/task
COPY --from=build /work/node_modules ./node_modules
COPY --from=build /work/apps/mcp-server-lambda/dist ./dist
COPY --from=build /work/apps/mcp-server-lambda/package.json ./
# Copy only the dist of each internal package we need
COPY --from=build /work/packages/racoons-oauth/dist ./packages/racoons-oauth/dist
COPY --from=build /work/packages/redis-rate-limit/dist ./packages/redis-rate-limit/dist
COPY --from=build /work/packages/dashboard-urls/dist ./packages/dashboard-urls/dist

ENV PORT=8080 NODE_ENV=production
CMD ["node","dist/index.js"]

yarn workspaces focus installs only the dependencies needed by mcp-server-lambda and its internal packages — not all 58 apps.

Topological build order. Internal packages like @repo/racoons-oauth need to be built before the app that imports them. yarn workspaces foreach -R --topological-dev handles this.

Lambda Web Adapter as an extension. The COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.1 line bridges Lambda's invoke model to our Express app. It listens for Lambda invocations and proxies them to localhost:8080.

What to watch out for

This was not smooth. Our September 3rd commit says it all: "Try to fix docker runtime connection."

Health checks. The Lambda Web Adapter waits for a health check to pass before forwarding traffic. Our /healthz endpoint exists specifically for this. Without it, the adapter thinks the app isn't ready and the Lambda times out.

Buffered invoke mode. Lambda Web Adapter supports both streaming and buffered modes. We use buffered because our MCP responses are complete JSON payloads; we need the full response body before sending it back.

Internal package resolution. The runtime stage only has dist/ folders from internal packages. Getting Node's module resolution to find @repo/racoons-oauth inside the container required copying each package's package.json alongside its dist/. Without the package.json, Node can't resolve the package entry point and the import fails at runtime, something you won't catch until the Lambda actually runs.

How the tools work

Every tool follows the same pattern: authenticate, check scopes, rate limit, query Hasura, transform, return.

Passing auth context to tools

The MCP SDK's tool handlers don't receive request context. We use Node.js AsyncLocalStorage to thread the authenticated user's info through to each tool handler without modifying the SDK:

import { AsyncLocalStorage } from "node:async_hooks";

export const authContext = new AsyncLocalStorage<SuccessClientAuthResult>();

// In the main Express handler — wrap the entire MCP request
await authContext.run(clientInfo, async () => {
  await mcp.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

// In any tool handler — read the stored context
const ctx = authContext.getStore();
// ctx.userId, ctx.clientId, ctx.scope are all available

The main request handler authenticates the bearer token against the database (via the isClientAuthed function in @repo/racoons-oauth), then wraps the MCP transport call in authContext.run(clientInfo, ...). Every tool reads the context with authContext.getStore() and can access the user ID, client ID, and granted scopes.

Per-tool scope checks and rate limiting

Each tool starts by checking its required scopes and enforcing rate limits:

// 1. Get auth context
const ctx = authContext.getStore();
if (!ctx) return unauthorized();

// 2. Check required scopes
const scopeResult = checkScope(ctx.scope, ["mcp:read"]);
if (scopeResult) return scopeResult;

// 3. Rate limit per user, per tool
rateLimitClient.addLimit({
  key: `mcp-server-lambda-search:${ctx.userId}`,
  limit: 20,
  expirationInSeconds: 60,
});
const rateLimitCheck = await rateLimitClient.checkAndIncrement();
if (!rateLimitCheck) return rateLimited();

// 4. Create a user-scoped Hasura client and query
const userClient = createUserClient(ctx.userId);

The rate limit key includes both the tool name and user ID, so each tool has its own limit (12/min for projects, 20/min for search, 10/min for event_count). The createUserClient call creates a Hasura GraphQL client that runs with that specific user's row-level permissions, so even if a tool queries across tables, Hasura only returns data the user is authorized to see.
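createUserClient works by setting Hasura session variables on every request. A sketch of how that could look (assuming the standard Hasura admin-secret pattern; the names and the `user` role are illustrative, not our exact code):

```typescript
// Hasura evaluates row-level permissions against these session variables.
export function buildUserHeaders(userId: string, adminSecret: string) {
  return {
    "Content-Type": "application/json",
    "x-hasura-admin-secret": adminSecret,
    // The admin secret alone would bypass permissions; overriding the role
    // and user id makes Hasura apply that user's row-level rules instead.
    "x-hasura-role": "user",
    "x-hasura-user-id": userId,
  };
}

export function createUserClient(userId: string, endpoint: string, adminSecret: string) {
  return {
    async query<T>(query: string, variables: Record<string, unknown>): Promise<T> {
      const res = await fetch(endpoint, {
        method: "POST",
        headers: buildUserHeaders(userId, adminSecret),
        body: JSON.stringify({ query, variables }),
      });
      const json = (await res.json()) as { data: T; errors?: unknown[] };
      if (json.errors) throw new Error(JSON.stringify(json.errors));
      return json.data;
    },
  };
}
```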

Hybrid search

The search tool is the most interesting one technically. It combines OpenAI embeddings with PostgreSQL keyword search to find relevant analyses.

When a search query comes in:

  1. We generate a vector embedding using text-embedding-3-small via the Vercel AI SDK
  2. We pass both the raw search term and the embedding vector to a PostgreSQL function exposed through Hasura
  3. That function combines tsvector keyword matching with pgvector cosine similarity
  4. Results come back ranked by a blend of both signals, with highlighted search snippets

query SearchAnalyses(
  $searchTerm: String!
  $queryEmbedding: vector!
  $projectId: uuid!
  $limitCount: Int = 20
) {
  hybrid_search_analyses(
    args: {
      search_term: $searchTerm
      query_embedding: $queryEmbedding
      project_filter: $projectId
      limit_count: $limitCount
    }
  ) {
    id
    title
    timestamp
    analysis_type_name
    search_snippet(args: { search_term: $searchTerm })
    page_path
  }
}

This means a query like "conversion rate optimization" will find analyses that mention those exact words and analyses that are semantically related even if they use different terminology.
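The blend in step 4 can be pictured as a weighted sum of the two signals. A toy version of that ranking (the weights and normalization here are illustrative; the real blend lives inside the PostgreSQL function):

```typescript
interface Candidate {
  id: string;
  keywordRank: number;    // tsvector rank, roughly 0..1
  cosineDistance: number; // pgvector distance, 0 = identical
}

// Higher keyword rank and lower cosine distance both push a result up.
export function hybridScore(c: Candidate, keywordWeight = 0.4, semanticWeight = 0.6): number {
  const semanticSimilarity = 1 - c.cosineDistance; // distance -> similarity
  return keywordWeight * c.keywordRank + semanticWeight * semanticSimilarity;
}

export function rankResults(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => hybridScore(b) - hybridScore(a));
}
```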

Try it out

We built this so you can talk to your analytics data through AI assistants you already use. If you want to see it in action, check out the demo.


Have questions about our MCP implementation? Reach out at racoons.ai.