Architecture v3

Status

This document is a merged architecture that combines the conceptual model from Architecture v2 with the technology stack and platform alignment from Stella Catalog Spec v1.

  • It is intended for a greenfield implementation.
  • It assumes no production code exists yet.
  • It preserves Constellation shared platform alignment.
  • It adopts the three-layer architecture (Domain Core, Tool Layer, Agent Plane) from v2.
  • It keeps the TypeScript/NestJS stack from v1 for full-stack coherence and deployment simplicity.

1. Goals

The system should be:

  1. AI-first at runtime
    • Agents are first-class users of the platform.
    • The platform supports planning, tool use, memory, evaluation, and human approval flows.
    • MCP tools are business capabilities, not CRUD wrappers.
  2. AI-first in development
    • AI coding agents should be able to add modules and features with low context overhead.
    • Module boundaries, contracts, and tests must be explicit and machine-readable.
    • Every module follows the same template with predictable file names and clear responsibilities.
  3. Deterministic where it matters
    • Product data, pricing, permissions, tenant isolation, and audit remain domain-controlled.
    • Agents never directly own business truth.
    • The domain core is testable without an LLM.
  4. Modular without premature distribution
    • The system starts as a modular monolith.
    • Modules can be extracted later if scale or team topology requires it.
  5. Safe for enterprise and multi-tenant use
    • Strong tenant isolation, auditability, idempotency, policy checks, and approval gates are built in from day one.
  6. Platform-aligned
    • Shares technology foundations with Constellation via @constellation-platform/* packages.
    • One language (TypeScript) across API, admin, contracts, SDKs, and agent plane.

2. Core Position

This architecture rejects both extremes:

  • Not a classic centralized CRUD/API platform with thin MCP wrappers bolted on.
  • Not a swarm of autonomous agents directly mutating shared state.

Instead, the platform is split into three layers:

  1. Domain Core — Deterministic business logic and source-of-truth data.
  2. Tool Layer — High-level business capabilities exposed to agents and humans.
  3. Agent Plane — Planning, orchestration, delegation, memory, approvals, and workflow execution.

The domain core is authoritative. The agent plane is adaptive. The tool layer is the contract between them.

Why three layers matter

Without an explicit agent plane, agent behavior gets scattered across API controllers, service methods, and ad-hoc scripts. Without a tool layer distinct from CRUD endpoints, agents must understand implementation details instead of working at the intent level. The three-layer split ensures each concern has a clear owner.

3. Technology Stack

Primary stack

| Component | Technology |
| --- | --- |
| Backend runtime | Node.js 20+, TypeScript (strict) |
| API framework | NestJS |
| Admin UI | Next.js (App Router) |
| Validation & contracts | Zod v4 + JSON Schema + OpenAPI |
| ORM | Prisma 6 |
| Database | PostgreSQL 16+ |
| Extensions | pgvector, ltree, pg_trgm, pgcrypto, uuid-ossp |
| Job queue (default) | Postgres outbox + LISTEN/NOTIFY (via @constellation-platform/jobs) |
| Job queue (high-throughput) | BullMQ + Redis (opt-in per module) |
| Object storage | S3-compatible (MinIO for dev; Supabase Storage on cloud) |
| Auth | OAuth2/OIDC delegation; JWT with sub, tenant_id, roles |
| Observability | OpenTelemetry (Jaeger dev, NewRelic/Datadog prod) |
| Testing | Vitest + fast-check (property-based) + integration tests |
| Agent tooling | MCP SDK (TypeScript), custom orchestration layer |
| Packaging | Monorepo (Turborepo) |

Why TypeScript-first

TypeScript is the better default for this product because:

  • Constellation shares a common platform in TypeScript; switching languages breaks alignment.
  • One language across API, admin UI, contracts, SDKs, and agent plane reduces context-switching for both humans and AI coding agents.
  • The MCP SDK is TypeScript-native.
  • Prisma, Zod, and the NestJS ecosystem are mature and well-understood by AI coding tools.
  • Node.js provides a real long-running process model needed by agents, workers, and durable workflows.

Python remains available for:

  • Embedding model integrations where Python libraries have no TypeScript equivalent.
  • Data science and evaluation scripts.
  • Specialized agent evaluations using Python eval frameworks.

These are called as external processes or microservices, not as the primary runtime.

Constellation platform alignment

Shared packages from @constellation-platform/* (published from the external platform-packages repo via GitHub Packages private registry; consumed as versioned npm dependencies in Stella's package.json):

| Package | Purpose |
| --- | --- |
| auth-core | JWT validation, tenant context extraction, provider abstraction |
| db | Prisma client setup, RLS helpers, migration utilities |
| events | Domain event contracts, outbox pattern, LISTEN/NOTIFY |
| jobs | Postgres outbox + LISTEN/NOTIFY by default; BullMQ adapter for high-throughput |
| errors | Typed error classes, API error envelope |
| testing | Test fixtures, property-based test helpers, integration test utilities |

4. Repository Topology

apps/
├── api/ # NestJS backend: REST API + tool API + auth
├── agents/ # Agent runtime: orchestrator, specialists, memory, approvals
└── admin/ # Next.js admin UI

packages/
├── contracts/ # Zod schemas, OpenAPI specs, generated SDKs
├── agent-sdk/ # Thin primitives: tool registry types, memory interfaces, eval helpers
└── module-template/ # Scaffolding for new modules

# External: @constellation-platform/* packages (separate `platform-packages` repo)
# Published to GitHub Packages private registry, consumed as versioned npm dependencies.
# See Stella_Constellation_AI_Shared_Architecture_Plan_v1.md for package contracts.
# Packages: auth-core, auth-nest, db, events, jobs, errors, testing

modules/
├── products/
├── taxonomy/
├── search/
├── ingest/
├── pricing/
├── supplier-offers/
├── canonicals/
├── shares/
└── reference-catalogs/

Key differences from v1

  • apps/agents/ is a first-class application, not hidden inside the API.
  • packages/agent-sdk/ provides thin primitives (tool registry types, memory interfaces, eval helpers) — not an orchestration framework. Orchestration logic lives in apps/agents/.
  • packages/module-template/ provides scaffolding for AI coding agents.
  • Each module includes tools.ts, workflows.ts, policies.ts, and evals/.

Key differences from v2

  • TypeScript throughout, not Python.
  • Prisma instead of raw repositories.
  • Shared platform packages instead of standalone runtime libraries.
  • Supabase database and auth path preserved; Vercel used for UI hosting.

5. Architectural Layers

5.1 Domain Core

The domain core owns:

  • Product and catalog state
  • Taxonomy and classifications
  • Supplier offers and canonical products
  • Pricing and quote rules
  • Ingestion and conflict resolution
  • Tenant configuration
  • Permissions and policy checks
  • Audit records

The domain core must be:

  • Deterministic
  • Idempotent
  • Transaction-safe
  • Tenant-scoped (RLS enforced)
  • Independently testable without an LLM

Implementation lives in each module's service.ts, repository.ts, models.ts, and events.ts.

5.2 Tool Layer

Tools are not CRUD wrappers. Tools are business actions.

Good tool examples:

  • search_catalog — semantic search with natural language
  • compare_products — structured attribute comparison across products
  • build_quote — assemble a priced quote from product selections
  • find_substitutes — locate alternative products meeting criteria
  • validate_channel_readiness — check if products meet syndication requirements
  • enrich_from_reference_catalog — pull and apply reference data
  • explain_price_decision — trace how a price was calculated
  • review_match_suggestion — present a supplier-offer match for human review

Bad tool examples (avoid):

  • create_product — too low-level, no business context
  • update_rule — exposes implementation details
  • list_records — generic, no agent value

CRUD endpoints still exist for the human-facing REST API, but agent-facing tools must be higher-level, bounded, and policy-aware.

Each tool must define:

  • Clear purpose and description
  • Zod-validated inputs and outputs
  • Permission requirements
  • Tenant scoping rules
  • Failure modes and error contracts
  • Idempotency behavior
  • Usage examples
  • Eval cases

Implementation lives in each module's tools.ts.

Tool taxonomy

  1. Read tools — search, compare, explain, inspect. Safe to call freely.
  2. Write tools — bounded state changes with approval and policy checks. Rare and strongly governed.
  3. Composite tools — high-level actions coordinating multiple domain services. May trigger workflows.

5.3 Agent Plane

The agent plane owns:

  • Task planning and decomposition
  • Tool sequencing and selection
  • Multi-step workflow orchestration
  • Bounded delegation to specialist agents
  • Short-term working memory (session-scoped)
  • Structured working memory (facts gathered during a workflow)
  • Approval request creation and resolution
  • Retries and recovery
  • Evaluation and trace capture

The agent plane does not directly write business data. All mutations go through tools, which enforce policies and tenant scoping.

Implementation lives in apps/agents/ (orchestration, workflows, specialist agents) with thin shared types from packages/agent-sdk/ (tool registry interfaces, memory store contracts, eval runner helpers).

6. Runtime Model

6.1 API surfaces

The system exposes three distinct interfaces:

  1. Human API — REST/JSON for admin UI and external integrations. Served by apps/api/.
  2. Tool API — MCP protocol and internal tool registry exposing business capabilities. Tools registered from each module's tools.ts.
  3. Workflow API — Start, resume, cancel, and query long-running agent tasks. Served by apps/agents/.

6.2 Execution modes

  1. Request/response — Synchronous validation, search, comparison, quoting. Used by both human API and tools.
  2. Durable workflow — Ingest pipelines, enrichment, large quote builds, catalog validation batches. Managed by the agent plane with explicit typed state machines whose state is persisted to PostgreSQL.
  3. Background jobs — Embeddings, reindexing, document parsing, sync tasks. Managed by the job queue.

6.3 Durable execution model (canonical)

The system uses one default execution backbone. This resolves the ambiguity between BullMQ, Inngest, Trigger.dev, and custom FSMs.

| Concern | Default | When to upgrade |
| --- | --- | --- |
| Job dispatch | Postgres outbox + LISTEN/NOTIFY via @constellation-platform/jobs | Never — this is the canonical primitive |
| Workflow state | DB-serialized typed FSM — workflow state lives in a workflow_runs table with Zod-validated JSON state | Never — all workflows use this |
| High-throughput queues | BullMQ + Redis (opt-in) | Only when a module proves Postgres throughput is insufficient (e.g., bulk embeddings at 10k+ items) |
| Cron / scheduled | pg_cron or application-level scheduler | Default; no external dependency |

Why not Inngest/Trigger.dev? They couple to Vercel's execution model and add an external dependency. Starting with Postgres-native primitives keeps the system self-contained and portable. If operational evidence later shows Postgres LISTEN/NOTIFY cannot keep up, BullMQ is already available as an adapter — no architectural change needed.

Why not a custom FSM library? The FSM is not a library. It is a pattern: each workflow defines a Zod-validated state type, a transition function, and a workflow_runs table row. The pattern is codified in packages/agent-sdk/ as types and helpers, not as a runtime engine.
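The pattern can be sketched in a few lines. All names here (QuoteState, WorkflowRun, transition) are illustrative rather than the actual agent-sdk API; in practice the state payload would be Zod-validated and persisted to a workflow_runs row.

```typescript
// Minimal sketch of the workflow FSM pattern. Each workflow declares its
// states, its legal transitions, and a serializable state payload.
type QuoteState = "DRAFT" | "PRICING" | "AWAITING_APPROVAL" | "COMPLETE" | "FAILED";

const transitions: Record<QuoteState, QuoteState[]> = {
  DRAFT: ["PRICING", "FAILED"],
  PRICING: ["AWAITING_APPROVAL", "COMPLETE", "FAILED"],
  AWAITING_APPROVAL: ["COMPLETE", "FAILED"],
  COMPLETE: [],
  FAILED: [],
};

// Shape of a workflow_runs row; `state` would be Zod-validated JSON in practice.
interface WorkflowRun {
  id: string;
  tenantId: string;
  current: QuoteState;
  state: Record<string, unknown>;
  history: { from: QuoteState; to: QuoteState; at: string }[];
}

// The transition function is the only way state advances; illegal moves throw.
function transition(run: WorkflowRun, to: QuoteState): WorkflowRun {
  if (!transitions[run.current].includes(to)) {
    throw new Error(`illegal transition ${run.current} -> ${to}`);
  }
  return {
    ...run,
    current: to,
    history: [...run.history, { from: run.current, to, at: new Date().toISOString() }],
  };
}
```

Because the row is plain serializable data and the transition table is static, the run is inspectable, resumable after restart, and auditable by construction.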

6.4 Multi-agent strategy

Default to single orchestrator agent + specialist helpers, not full peer swarms.

Recommended initial specialist agents:

  • Search agent — handles catalog search, comparison, and similarity queries
  • Pricing/quote agent — builds quotes, explains pricing, applies rules
  • Enrichment agent — matches reference catalogs, applies enrichment data
  • Validation agent — checks channel readiness, data quality, schema compliance

Add more specialists only when evals show a clear gain. Avoid premature agent proliferation.

7. Data Architecture

7.1 Primary database

PostgreSQL remains the system of record.

Use it for:

  • Transactional data (Prisma-managed)
  • JSONB flexible attributes
  • ltree taxonomy paths
  • pgvector embeddings
  • Outbox/events (LISTEN/NOTIFY + polling)
  • Workflow state metadata
  • Audit logs

7.2 Tenancy

Single tenant key: tenant_id.

Rules:

  • Every tenant-scoped table has tenant_id
  • PostgreSQL RLS is mandatory on all tenant-scoped tables
  • Services still scope by tenant_id as defense-in-depth
  • Tools inherit tenant context from auth/session
  • Cross-tenant reads and writes fail by design

Avoid dual-scope organization_id + tenant_id unless a concrete business case demands both.
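The defense-in-depth rule can be illustrated with a repository whose every method takes tenant context. This is a sketch only: in-memory rows stand in for a Prisma-backed table, and RLS at the database layer remains the primary enforcement.

```typescript
// Illustrative defense-in-depth sketch: even with RLS enforced in Postgres,
// the repository scopes every query by tenant_id. In-memory rows stand in
// for a Prisma-backed table here.
interface ProductRow { id: string; tenantId: string; name: string }

const rows: ProductRow[] = [
  { id: "p1", tenantId: "t1", name: "Widget" },
  { id: "p2", tenantId: "t2", name: "Gadget" },
];

// Every repository method takes tenant context; there is no unscoped variant.
function findProduct(tenantId: string, id: string): ProductRow | undefined {
  return rows.find((r) => r.tenantId === tenantId && r.id === id);
}
```

A cross-tenant lookup simply finds nothing, which matches the "cross-tenant reads and writes fail by design" rule above.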

7.3 Eventing

Use domain events and an outbox table.

Initial design:

  • Postgres outbox for durability
  • Typed event schemas (Zod)
  • Idempotent consumers
  • Event versioning
  • LISTEN/NOTIFY for low-latency in-process delivery

Do not introduce Kafka or distributed buses at the start.
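A minimal in-memory sketch of the outbox plus idempotent-consumer pattern follows. Event names and shapes are illustrative; in the real system the outbox is a Postgres table written in the same transaction as the domain change, and consumers are woken by LISTEN/NOTIFY.

```typescript
// Illustrative in-memory sketch of the outbox + idempotent-consumer pattern.
interface OutboxEvent {
  id: string;                      // unique event id, used for idempotency
  type: string;                    // e.g. "product.updated" (illustrative name)
  version: number;                 // event schema version
  payload: Record<string, unknown>;
}

const outbox: OutboxEvent[] = [];

// Producer: the domain write and the outbox append commit atomically.
function recordProductUpdate(productId: string): void {
  // ... domain write happens here, in the same transaction ...
  outbox.push({
    id: `evt-${outbox.length + 1}`,
    type: "product.updated",
    version: 1,
    payload: { productId },
  });
}

// Consumer: tracks processed event ids so redelivery is a no-op.
const processed = new Set<string>();
let handled = 0;

function consume(event: OutboxEvent): void {
  if (processed.has(event.id)) return; // idempotent: duplicates are ignored
  processed.add(event.id);
  handled += 1; // ... real side effect (reindex, notify, etc.) goes here ...
}
```

The idempotency check is what makes at-least-once delivery safe: a redelivered event changes nothing.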

8. Module Standard

Every module must follow the same structure.

modules/<module-name>/
├── README.md # Module purpose, domain concepts, API surface
├── AGENTS.md # Instructions for AI coding agents working on this module
├── schemas.ts # Zod input/output models
├── models.ts # Prisma-facing entities and typed domain models
├── service.ts # Deterministic business logic
├── tools.ts # Agent-facing tool definitions
├── workflows.ts # Explicit state machine workflows
├── policies.ts # Permission and approval logic
├── repository.ts # DB access via Prisma
├── events.ts # Emitted/consumed domain events
├── tests/ # Unit and integration tests
└── evals/ # Agent/tool evaluations and regression suites

File responsibilities

| File | Owns | Does not own |
| --- | --- | --- |
| schemas.ts | Zod input/output schemas for API and tools | Prisma types, DB concerns |
| models.ts | Prisma model types, domain value objects | Business logic |
| service.ts | Deterministic business logic, validation | DB access, HTTP concerns |
| tools.ts | Agent-facing tool definitions, MCP registration | Business logic (delegates to service) |
| workflows.ts | Multi-step orchestrated procedures, state machines | Direct DB access |
| policies.ts | Permission checks, approval gate logic | Authentication (handled by platform) |
| repository.ts | Prisma queries, tenant-scoped data access | Business logic |
| events.ts | Domain event definitions, emission, consumption | Side effects outside the module |
| tests/ | Unit tests, integration tests, property-based tests | Agent evals |
| evals/ | Tool selection tests, workflow completion tests, regression suites | Deterministic unit tests |

Module rules

  • No module imports another module's repository directly.
  • Cross-module access goes through services or tool contracts.
  • No hidden magic registration — all wiring is explicit.
  • No large base classes or deep inheritance.
  • No decorator-heavy abstractions that obscure control flow.
  • File count per module should remain small enough for AI tools to reason over (aim for < 15 files).
  • Every module has a short README.md and AGENTS.md.

9. AI-Friendly Development Rules

The system must be intentionally easy for AI coding agents to extend.

Required practices

  • Explicit Zod schemas everywhere — no implicit types or any.
  • One clear module template — every module looks the same.
  • Predictable file names — an AI agent can find tools.ts in any module.
  • Minimal framework magic — avoid custom decorators, interceptor chains, and DI tricks.
  • Generated SDKs from one contract source (OpenAPI from Zod schemas).
  • Usage examples for every tool.
  • Evals for every non-trivial workflow.
  • Local fixtures for every module.
  • AGENTS.md in every module describing domain concepts, boundaries, and gotchas.

NestJS guardrails

NestJS is the API framework for apps/api/ because of Constellation platform alignment and broad AI training-data coverage. However, NestJS's decorator-heavy DI system is a known friction point for AI coding agents. The following rules constrain NestJS usage to the predictable subset:

Banned patterns:

  • ❌ Custom decorators — all cross-cutting concerns use standard NestJS decorators or middleware.
  • ❌ Request-scoped providers — all services are singletons. Tenant context uses AsyncLocalStorage (Appendix A.5), not request-scoped injection.
  • ❌ forwardRef() — indicates circular dependencies; refactor instead.
  • ❌ Custom interceptor chains — use at most one global interceptor (for correlation ID / logging).
  • ❌ Dynamic modules with runtime configuration — keep module registration static and declarative.
  • ❌ Deep NestJS DI for platform primitives — AppContext is a plain object, not a NestJS provider tree (Appendix A.4).

Required patterns:

  • ✅ Each module registers exactly one NestJS module with one controller and one service provider.
  • ✅ Controllers are thin: validate input (Zod pipe) → delegate to AppContext-wired service → return result.
  • ✅ All business logic lives in modules/*/service.ts, never in NestJS controllers or providers.
  • ✅ Module services receive AppContext through the composition root, not through @Inject() tokens.
  • ✅ The NestJS module.ts file is boilerplate — AI agents copy it from the module template without modification.

Escape hatch: If AI coding agents consistently fail on NestJS DI wiring during Phase 0 implementation, the composition root architecture allows apps/api/ to be replaced with Fastify + tRPC without changing any module code. Module contracts are framework-independent by design.
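One way to picture the composition root is the sketch below. The AppContext shape, service names, and controller are all illustrative assumptions, not the actual module contracts; the point is that wiring is a plain function call, not a DI token graph.

```typescript
// Sketch of the composition-root idea: AppContext is a plain object assembled
// once at startup, not a NestJS provider tree. All names are illustrative.
interface AppContext {
  products: { rename: (id: string, name: string) => string };
  audit: { log: (entry: string) => void };
}

// The composition root wires concrete implementations explicitly, with no DI tokens.
function buildAppContext(): AppContext {
  const entries: string[] = [];
  const audit = { log: (entry: string) => { entries.push(entry); } };
  const products = {
    rename: (id: string, name: string) => {
      audit.log(`rename ${id}`); // cross-cutting concern wired by hand
      return `${id}:${name}`;
    },
  };
  return { products, audit };
}

// A controller stays thin: validate input, delegate to the AppContext-wired service.
function renameController(ctx: AppContext, id: string, name: string): string {
  if (!id || !name) throw new Error("invalid input");
  return ctx.products.rename(id, name);
}
```

Because nothing here depends on NestJS, swapping the HTTP framework (the escape hatch above) only changes the controller shell.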

Eval scaffolding

AI coding agents struggle to write agentic evals from scratch because the eval harness (mock LLM responses, tool call assertions, trace validation) requires substantial boilerplate.

The CLI must provide:

stella eval:scaffold <tool-name>

This command generates:

  • Mock LLM response fixtures for the target tool
  • Expected tool call assertion templates
  • Eval harness boilerplate with correct imports and AppContext test setup
  • Example passing and failing test cases
  • Annotation mapping to the correctness property the eval validates

This scaffolding is a Phase 0 requirement. No AI coding agent should be asked to build downstream module evals until the scaffold command produces a working baseline.

Required CI gates

  • Type checking (tsc strict)
  • Linting (ESLint)
  • Unit tests (Vitest)
  • Integration tests (against real Postgres)
  • RLS tests (verify tenant isolation)
  • Contract compatibility tests (Zod schema backward compat)
  • Tool schema validation (all tools have valid Zod I/O)
  • Eval regression suite (no tool selection regressions)
  • Module boundary enforcement (no cross-module repository imports)

File and context rules

  • Keep files focused and small (< 300 lines preferred, < 500 max).
  • Prefer pure functions in services where possible.
  • Avoid deep inheritance — prefer composition.
  • Keep side effects explicit and at module boundaries.
  • Each module's full source should fit in an AI agent's context window.

10. Tool Design Standard

Tools are a first-class product surface — the primary way agents interact with the system.

Tool definition contract

Every tool must specify:

interface ToolDefinition {
  name: string;                  // e.g. "search_catalog"
  description: string;           // clear purpose for agent tool selection
  inputSchema: ZodSchema;        // validated inputs
  outputSchema: ZodSchema;       // validated outputs
  permissions: string[];         // required roles/permissions
  tenantScoping: 'required' | 'system' | 'none';
  idempotent: boolean;
  failureModes: FailureMode[];   // documented error cases
  examples: ToolExample[];       // input/output pairs for agent context
  evalCases: EvalCase[];         // regression test cases
}

Tool taxonomy

  1. Read tools — search, compare, explain, inspect. No side effects. Safe for agents to call freely.
  2. Write tools — bounded state changes. Must check policies. May require approval. Should be rare.
  3. Composite tools — high-level actions coordinating multiple services. May trigger durable workflows.

Write tools should be rare and strongly governed. When in doubt, make it a read tool that returns a proposed action, and let approval flow handle the mutation.
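The proposed-action pattern might look like the following sketch. The action kind, field names, and tool are hypothetical; the essential property is that the tool returns a typed proposal and mutates nothing.

```typescript
// Sketch of a "read tool that returns a proposed action". The proposal is
// handed to the approval flow, which owns the actual mutation.
interface ProposedAction {
  kind: "update_price_rule";          // illustrative action kind
  targetId: string;
  change: Record<string, unknown>;
  requiresApproval: true;             // the proposal itself never mutates anything
  rationale: string;                  // human- and agent-readable justification
}

function proposePriceRuleChange(
  targetId: string,
  change: Record<string, unknown>,
  rationale: string,
): ProposedAction {
  return { kind: "update_price_rule", targetId, change, requiresApproval: true, rationale };
}
```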

11. Workflow Design Standard

Workflows are explicit state machines, not hidden prompt behavior.

Each workflow defines:

  • Start conditions and triggers
  • Input schema (Zod)
  • Named states and transitions
  • Tool calls at each state
  • Retry policy per step
  • Timeout policy per step and overall
  • Approval checkpoints (which steps need human sign-off)
  • Completion criteria
  • Failure and compensation logic
  • Evaluation criteria

Example workflow classes

  • Quote generation workflow
  • Catalog enrichment workflow
  • Supplier match review workflow
  • Syndication readiness workflow
  • Legacy PIM ingest workflow
  • Bulk import workflow

Implementation approach

Start with a simple typed finite state machine. Do not introduce Temporal, XState, or a heavy workflow engine unless operational evidence justifies it. The state machine should be:

  • Serializable to/from the database (for durability)
  • Inspectable (current state, history of transitions)
  • Resumable after process restart
  • Traceable (every transition logged)

FSM and agent execution model

The FSM owns the skeleton — states, valid transitions, timeouts, and checkpoints. The agent operates within the FSM, not outside it.

There are two kinds of workflow states:

  1. Deterministic states — execute a fixed operation (service call, API request, data transformation). No LLM involved. The FSM advances automatically on success or failure.
  2. Agent-delegated states — invoke the agent with a scoped prompt and bounded tool set. The agent reasons, calls tools, and returns a transition choice. The FSM validates that the chosen transition is legal for the current state. If it is not, the transition is rejected and the agent is re-prompted.

This means:

  • The agent never invents states or transitions — only chooses among the ones the FSM declares.
  • The FSM guarantees that every execution path is auditable and bounded.
  • Agent autonomy is scoped: the agent decides which valid transition to take, not what transitions exist.

Example:

State: RESOLVE_DISCREPANCY (agent-delegated)
→ Agent invokes compare_products tool
→ Agent reviews result and picks a transition:
→ ACCEPT_MATCH (if confidence ≥ threshold)
→ REJECT_MATCH (if confidence < threshold)
→ ESCALATE (if comparison is ambiguous)
→ FSM validates chosen transition is in the declared set
→ FSM advances to next state

This pattern prevents unbounded agent loops while preserving the agent's ability to reason about non-deterministic decisions.
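The agent-delegated state loop can be sketched as follows. The Agent type, prompts, and the retry bound are assumptions for illustration; the invariant shown is the one described above, that only declared transitions are accepted and the loop is bounded.

```typescript
// Sketch of an agent-delegated state: the FSM declares the legal transitions
// and the agent only picks among them. Illustrative names throughout.
type Choice = "ACCEPT_MATCH" | "REJECT_MATCH" | "ESCALATE";

const declared: Choice[] = ["ACCEPT_MATCH", "REJECT_MATCH", "ESCALATE"];

// The agent is a black box that returns a transition choice as a string.
type Agent = (prompt: string) => string;

// Re-prompt until the agent returns a declared transition, up to a retry bound.
function resolveDiscrepancy(agent: Agent, maxAttempts = 3): Choice {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const picked = agent(
      attempt === 1
        ? "Pick one transition: ACCEPT_MATCH, REJECT_MATCH, ESCALATE"
        : "Invalid choice. Pick one of: ACCEPT_MATCH, REJECT_MATCH, ESCALATE",
    );
    if ((declared as string[]).includes(picked)) return picked as Choice;
  }
  return "ESCALATE"; // bounded: fall back to human escalation, never loop forever
}
```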

12. Memory and Context

Use layered memory rather than one large chat transcript.

Memory types

| Type | Scope | Storage | Purpose |
| --- | --- | --- | --- |
| Session memory | Single agent conversation | In-memory / Redis | Short-lived task context |
| Working memory | Single workflow execution | Database (JSONB) | Structured facts gathered during a workflow |
| Domain memory | Permanent | PostgreSQL (Prisma) | Product, tenant, and catalog data |
| Retrieval memory | Per-query | Transient | Search results, embeddings, documents |
| Evaluation memory | Permanent | Database | Historical traces and outcomes for regression |

Rules

  • Business truth belongs in domain data, not chat memory.
  • Memory writes must be intentional and typed (Zod schemas).
  • Stale context must be discardable — memory has TTL or explicit invalidation.
  • Agent prompts should remain small and structured — inject only relevant context.
  • No "memory" should circumvent tenant isolation.
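A TTL-bounded store that satisfies the staleness rule might look like this sketch. The WorkingMemory class is illustrative, not the agent-sdk memory interface; an injectable clock keeps it testable.

```typescript
// Sketch of a TTL-bounded working-memory store. Writes are timestamped;
// reads discard entries older than the TTL, so stale context cannot leak
// back into prompts.
interface MemoryEntry<T> { value: T; writtenAt: number }

class WorkingMemory<T> {
  private entries = new Map<string, MemoryEntry<T>>();
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  set(key: string, value: T): void {
    this.entries.set(key, { value, writtenAt: this.now() });
  }

  // Expired entries read as absent and are evicted on access.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.writtenAt > this.ttlMs) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```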

13. Security and Governance

AI-first does not weaken safety. It requires stronger controls because agents can act faster and at scale.

Required controls

  • External auth provider with JWT/OIDC (Supabase Auth, Keycloak, or equivalent)
  • tenant_id in auth context on every request
  • RLS in PostgreSQL on every tenant-scoped table
  • Role and feature checks at service and tool level
  • Approval gates for sensitive writes (tools declare when approval is needed)
  • Immutable audit logs (append-only, tenant-scoped)
  • Prompt and tool trace logging (what the agent asked, what tools it called, what it received)
  • Secrets isolation (no secrets in agent context or tool outputs)
  • Rate limiting for external, tool, and agent interfaces

High-risk actions requiring approval

  • Price rule changes
  • Cross-system syndication
  • Large bulk imports (above configurable threshold)
  • Destructive merges or splits of canonical products
  • Reference catalog license changes
  • Data export
  • Any write tool the module's policies.ts flags as approval-required
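A policies.ts approval gate might look like the following sketch. The tool names, role string, and threshold constant are illustrative; in practice the threshold would be tenant-configurable and the decision would feed the approval flow.

```typescript
// Sketch of a policies.ts approval gate. Write tools ask the policy layer
// whether a mutation may proceed, must go to human review, or is denied.
type PolicyDecision = "allow" | "require_approval" | "deny";

interface WriteRequest {
  tool: string;       // e.g. "bulk_import" (illustrative)
  itemCount: number;  // size of the requested change
  roles: string[];    // roles from the auth context
}

const BULK_IMPORT_THRESHOLD = 1_000; // configurable per tenant in practice

function checkWritePolicy(req: WriteRequest): PolicyDecision {
  if (!req.roles.includes("catalog:write")) return "deny";
  if (req.tool === "bulk_import" && req.itemCount > BULK_IMPORT_THRESHOLD) {
    return "require_approval"; // large bulk imports need human sign-off
  }
  if (req.tool === "update_price_rule") return "require_approval";
  return "allow";
}
```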

14. Testing and Evals

Testing splits into two categories that are equally important.

Deterministic tests

  • Unit tests for services (pure business logic)
  • Integration tests for DB behavior (real Postgres)
  • RLS and tenancy tests (verify isolation)
  • Contract tests for APIs and tools (Zod schema compat)
  • Property-based tests with fast-check (minimum 100 iterations per property)

Agent evals

  • Tool selection correctness — given a task description, does the agent pick the right tool?
  • Workflow completion rate — does the agent complete multi-step tasks?
  • Approval compliance — does the agent stop and request approval when required?
  • Hallucination resistance — does the agent avoid inventing data not present in tool outputs?
  • Retry and recovery behavior — does the agent handle tool failures gracefully?
  • Cost and latency budgets — does the agent stay within token and time limits?
  • Tenant isolation — does the agent ever access cross-tenant data?

Eval infrastructure (Phase 0 deliverable)

The eval harness must be built before the first agent is deployed. It must support:

  • Reproducible test cases with fixed inputs and expected tool sequences
  • Scoring rubrics for partial credit (not just pass/fail)
  • Regression detection (alert when a previously passing eval fails)
  • Cost tracking per eval run
  • Trace capture for debugging failed evals

No module is complete without both deterministic tests and agent evals.
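Partial-credit scoring can be as simple as an in-order match over the tool-call trace. This is one possible rubric, not a prescribed one: the score is the fraction of expected tool calls that appear, in order, in the actual trace.

```typescript
// Sketch of partial-credit scoring for a tool-sequence eval (illustrative).
// Returns a score in [0, 1] instead of a bare pass/fail.
function scoreToolSequence(expected: string[], actual: string[]): number {
  let matched = 0;
  let cursor = 0;
  for (const tool of expected) {
    const idx = actual.indexOf(tool, cursor);
    if (idx !== -1) {
      matched += 1;
      cursor = idx + 1; // enforce ordering: later calls must come after earlier ones
    }
  }
  return expected.length === 0 ? 1 : matched / expected.length;
}
```

A regression detector then only needs to compare scores across runs and alert when a previously passing eval drops.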

15. Deployment Strategy

Architectural principle

The agent runtime and worker/orchestrator are standalone long-running Node.js processes. This is not negotiable — serverless cold starts, execution time limits, and lack of persistent connections make Vercel Functions unsuitable as the agent runtime host.

Vercel is used for what it excels at: hosting the Next.js admin UI and, optionally, light REST API endpoints (health checks, webhooks, lightweight reads). The heavy lifting — agent orchestration, durable workflows, background jobs, MCP tool serving — runs in standalone processes.

Standalone (primary path)

| Concern | Solution |
| --- | --- |
| API | NestJS standalone process (Docker / any container host) |
| Agent runtime | Standalone Node.js process (apps/agents/) — long-running, persistent connections |
| Worker / orchestrator | Same process as agent runtime (single binary) or separate process at scale |
| Admin UI | Next.js on Vercel or any static/SSR host |
| Database | PostgreSQL 16+ (Supabase, RDS, self-hosted) with pgvector, ltree |
| Auth | OAuth2/OIDC delegation via @constellation-platform/auth-core (Supabase Auth, Keycloak, Auth0) |
| Storage | S3-compatible (Supabase Storage, MinIO, AWS S3) |
| Jobs | Postgres outbox (default); BullMQ + Redis (opt-in for high-throughput modules) |
| MCP server | Runs inside the agent runtime process, shares the same tool registry |

Vercel + Supabase (UI hosting path)

| Concern | Solution |
| --- | --- |
| Admin UI | Next.js on Vercel (SSR + static) |
| Light API routes | Vercel Functions for webhooks, health, lightweight reads (optional) |
| Database | Supabase PostgreSQL (pgvector, ltree enabled) |
| Auth | Supabase Auth (JWT with tenant_id in app_metadata) |
| Storage | Supabase Storage (tenant-namespaced) |

The API server, agent runtime, and worker processes still run as standalone containers even when Supabase hosts the database and Vercel hosts the UI.

Why not "Vercel for everything"?

An agent-first product needs:

  • Long-running processes for multi-step workflows (minutes, not seconds)
  • Persistent WebSocket/SSE connections for real-time agent status
  • In-process tool registry without cold-start latency
  • Reliable job processing without execution time limits

Vercel serverless cannot provide these. Keeping the UI on Vercel preserves developer experience and CDN benefits without constraining the core runtime.

Both paths use the same codebase. Auth is provider-agnostic via @constellation-platform/auth-core. The job queue interface abstracts Postgres vs BullMQ.

16. Delivery Phases

Phase 0: Foundation (weeks 1-3)

  • Create repo topology and build system (Turborepo)
  • Implement module template with all file slots
  • Set up shared platform packages (auth, db, events, jobs, errors, testing)
  • Build eval harness and trace capture
  • Define tool design standard with Zod contracts
  • Scaffold CLI (stella-cli)
  • CI pipeline with all required gates

Phase 1: Core catalog modules (weeks 4-8)

  • Products module (CRUD + tools + evals)
  • Taxonomy module (ltree + tools)
  • Search module (hybrid search: fulltext + pgvector + tools)
  • Ingest module (webhook + conflict resolution + tools)
  • Pricing module (rules engine + tools)

Phase 2: Agent plane (weeks 6-10, overlaps Phase 1)

  • Orchestrator agent with tool registry
  • Specialist agents (search, pricing, validation)
  • Workflow runtime (typed state machines)
  • Approval system (create/resolve approval requests)
  • Memory stores (session, working, evaluation)
  • Trace capture and eval runner

Phase 3: Advanced modules (weeks 9-14)

  • Supplier offers module
  • Canonicals module (matching, merge/split)
  • Shares module (catalog sharing)
  • Reference catalogs module (enrichment, licensing)
  • Enrichment workflows
  • Channel syndication validation

Phase 4: Hardening (weeks 13-16)

  • Performance optimization against targets
  • Security audit and penetration testing
  • Eval regression suite fully populated
  • Documentation and SDK generation
  • Deployment automation (Docker Compose for standalone; Vercel for UI; Supabase for managed DB)

17. Migration From Existing Documents

From Stella Catalog Spec v1 — keep

  • All 31 requirements and acceptance criteria
  • Multi-tenant model (tenant_id + RLS)
  • PostgreSQL + pgvector + ltree
  • Hybrid search pipeline
  • Ingest/webhook conflict resolution
  • Supplier offer and canonical product concepts
  • Catalog sharing model
  • Reference catalog enrichment
  • CPQ and pricing rules
  • Performance targets
  • CLI commands
  • Supabase database and auth deployment path
  • Constellation platform alignment

From Architecture v2 — adopt

  • Three-layer architecture (Domain Core / Tool Layer / Agent Plane)
  • Tool-first design (business actions, not CRUD wrappers)
  • Explicit module template with tools.ts, workflows.ts, policies.ts, evals/
  • Agent eval requirements alongside deterministic tests
  • Memory model (session, working, domain, retrieval, evaluation)
  • Workflow design standard (explicit state machines)
  • AI-friendly development rules (small files, no magic, predictable names)
  • Module rules (no cross-module repo imports, no deep inheritance)

From Architecture v2 — do not adopt

  • Python/FastAPI backend (breaks Constellation alignment, splits the stack)
  • PydanticAI (use MCP SDK + custom orchestration in TypeScript)
  • Temporal (premature; start with simple typed FSM)
  • Raw SQL repositories (use Prisma for consistency with Constellation)

From Stella Catalog Spec v1 — replace

  • "MCP tools as thin adapters" becomes "MCP tools as first-class business capabilities"
  • Add apps/agents/ as a first-class application
  • Add evals/ at module and system level
  • Add policies.ts and workflows.ts to module template

18. Risks and Mitigations

| Risk | Impact | Mitigation |
| --- | --- | --- |
| NestJS decorator magic obscures control flow for AI agents | Medium | Keep modules thin; avoid custom decorators; lint for complexity |
| TypeScript agent ecosystem less mature than Python | Medium | Use MCP SDK (TS-native); build thin orchestration layer; call Python for embeddings if needed |
| Tool layer drifts back to CRUD wrappers | High | CI gate: every tool must have evals; review tool names for business-action language |
| Constellation shared platform becomes a drag | Medium | Keep shared packages small and stable; don't block product delivery on platform work |
| Eval infrastructure gets deprioritized | High | Make evals a Phase 0 deliverable; no module ships without evals |
| Workflow complexity escalates | Medium | Start with simple typed FSM; introduce Temporal only with operational evidence |
| Agent costs spiral | Medium | Token budgets per task; model selection per specialist; cost tracking in eval harness |
| Postgres job queue hits throughput ceiling | Low | BullMQ adapter already exists in @constellation-platform/jobs; swap per-module without architecture change |

19. Open Questions

Resolved

  1. Agent hosting model — Resolved: the agent runtime is a standalone long-running Node.js process (apps/agents/). It does not run on Vercel serverless. See Section 15.
  2. Workflow persistence — Resolved: DB-serialized typed FSM. Workflow state lives in a workflow_runs table with Zod-validated JSON state. See Section 6.3.
  3. MCP server hosting — Resolved: the MCP tool server runs inside the agent runtime process, sharing the same tool registry and in-process access to domain services.

Still open — resolve before implementation begins:

  1. Eval tooling: Build custom eval harness or adopt an existing framework (e.g., Braintrust, Promptfoo)?

Resolved (post-v3):

  1. Constellation package publishing — Resolved: GitHub Packages private registry. @constellation-platform/* packages live in a separate platform-packages repo, versioned with SemVer + Changesets. See Stella_Constellation_AI_Shared_Architecture_Plan_v1.md.

20. Next Documents

If this architecture is accepted, the next documents to create are:

  1. docs/adr/ADR-001-merged-architecture-v3.md — records the decision to merge v1 and v2
  2. docs/module_template_v1.md — detailed module template with code examples
  3. docs/tool_design_standard_v1.md — tool definition contract, taxonomy, and examples
  4. docs/workflow_design_standard_v1.md — state machine patterns, approval checkpoints
  5. docs/eval_standard_v1.md — eval harness design, scoring rubrics, regression detection
  6. docs/agent_plane_design_v1.md — orchestrator, specialists, memory, trace capture
  7. Updated .kiro/specs/stella-catalog/design.md — aligned with v3 architecture
  8. Updated .kiro/specs/stella-catalog/tasks.md — re-sequenced for v3 phases

21. Summary

Architecture v3 takes the best from both predecessors:

  • From v2: The three-layer model (Domain Core, Tool Layer, Agent Plane), tool-first design, explicit module template with tools/workflows/policies/evals, eval-first mindset, structured memory model, and AI-friendly development rules.
  • From v1: TypeScript/NestJS stack, Prisma ORM, Zod validation, Constellation shared platform alignment, Supabase database/auth path, all 31 domain requirements, and the existing task breakdown.

Key decisions tightened after expert review:

  • One durable execution model: Postgres outbox + DB-backed workflow state as the canonical default. BullMQ optional for throughput-heavy modules. No Inngest/Trigger.dev dependency.
  • Agent runtime is a standalone process: Long-running Node.js process with persistent connections, in-process tool registry, and MCP server. Not serverless.
  • Vercel scoped to UI hosting: Next.js admin on Vercel for CDN and developer experience. API, agents, and workers run as standalone containers.
  • packages/agent-sdk/ stays thin: Types, interfaces, and helpers — not an orchestration framework. All orchestration logic lives in apps/agents/.

The result is a system that is genuinely AI-native at runtime (agents are first-class users with proper orchestration, memory, and evaluation) and AI-native in development (every module is predictable, explicit, and fits in a context window) — without sacrificing platform alignment, operational control, or full-stack coherence.


Appendix A: Runtime Primitives

This appendix defines the runtime contracts that Section 6.3 refers to. These are implementation-ready specifications — not guidelines. Platform packages (@constellation-platform/jobs, @constellation-platform/db) must conform to them.

A.1 Job Claiming and Retry Semantics

All background work flows through a single job_queue table in the application's PostgreSQL database.

Table schema

CREATE TABLE job_queue (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id uuid NOT NULL,
  actor_id text,               -- user/system that enqueued the job; nullable for system-initiated jobs
  correlation_id text,         -- propagated from request context; nullable for cron/sweep-initiated jobs
  queue text NOT NULL,         -- e.g. 'embeddings', 'ingest', 'enrichment'
  payload jsonb NOT NULL,
  status text NOT NULL DEFAULT 'pending'
    CHECK (status IN ('pending','claimed','completed','failed','dead')),
  run_at timestamptz NOT NULL DEFAULT now(),
  claimed_at timestamptz,
  claimed_by text,             -- worker instance id
  completed_at timestamptz,
  attempt int NOT NULL DEFAULT 0,
  max_attempts int NOT NULL DEFAULT 5,
  last_error text,
  idempotency_key text UNIQUE, -- optional; callers may set for dedup
  created_at timestamptz NOT NULL DEFAULT now(),
  updated_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX idx_job_queue_poll ON job_queue (queue, status, run_at)
  WHERE status = 'pending';

Claim protocol

Workers claim jobs with a single atomic statement. No advisory locks, no two-phase claim.

UPDATE job_queue
SET status = 'claimed',
    claimed_at = now(),
    claimed_by = $1,          -- worker instance id
    attempt = attempt + 1
WHERE id = (
  SELECT id FROM job_queue
  WHERE queue = $2
    AND status = 'pending'
    AND run_at <= now()
  ORDER BY run_at
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
RETURNING *;

FOR UPDATE SKIP LOCKED ensures multiple workers never claim the same row. This is the only job-claiming mechanism in the system.

Retry semantics

| Behavior | Rule |
| --- | --- |
| Backoff | Exponential: run_at = now() + (2^attempt * base_interval). Default base_interval = 5 seconds. |
| Max attempts | Per-job max_attempts, default 5. |
| Dead-letter | After max_attempts is exhausted, status moves to dead. Dead jobs are never auto-retried. |
| Stale claim recovery | A periodic sweep (every 60s) resets jobs stuck in claimed for longer than claim_timeout (default 5 minutes) back to pending. |
| Idempotency | If idempotency_key is set, a second INSERT with the same key is a no-op (ON CONFLICT DO NOTHING). |
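The retry arithmetic above is small enough to sketch directly. A minimal TypeScript sketch (helper names are illustrative, not part of the platform API):

```typescript
// Illustrative helpers for the retry rules above (names are not platform API).
const BASE_INTERVAL_MS = 5_000; // default base_interval = 5 seconds

/** Next run_at for a failed job: now + 2^attempt * base_interval. */
export function nextRunAt(attempt: number, now: Date = new Date()): Date {
  return new Date(now.getTime() + 2 ** attempt * BASE_INTERVAL_MS);
}

/** After max_attempts is exhausted, the job is dead-lettered, never auto-retried. */
export function isDeadLettered(attempt: number, maxAttempts = 5): boolean {
  return attempt >= maxAttempts;
}
```

A job failing on its first claimed attempt (attempt = 1) is retried after 10 seconds; by attempt 4 the delay has grown to 80 seconds.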

Notification

After inserting a job, the writer issues:

NOTIFY job_queue, '<queue_name>';

Workers LISTEN job_queue and wake immediately. The poll loop (interval: 1s) is the fallback if a notification is missed.
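The wake/claim/drain cycle can be sketched independently of the database. In this hedged sketch, claim stands in for the atomic UPDATE ... FOR UPDATE SKIP LOCKED statement; the same drain function would be invoked on a NOTIFY wake and on each 1-second poll tick:

```typescript
// Sketch of the worker drain loop; `claim` stands in for the atomic
// UPDATE ... FOR UPDATE SKIP LOCKED statement (resolves to null when the queue is empty).
export interface ClaimedJob { id: string; queue: string; payload: unknown }

export async function drainQueue(
  queue: string,
  claim: (queue: string) => Promise<ClaimedJob | null>,
  handle: (job: ClaimedJob) => Promise<void>,
): Promise<number> {
  let processed = 0;
  // Keep claiming until the claim statement returns no row.
  for (let job = await claim(queue); job !== null; job = await claim(queue)) {
    await handle(job);
    processed += 1;
  }
  return processed;
}
```

Because claiming is atomic at the database, running this loop concurrently in many workers is safe by construction.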

BullMQ upgrade path

When a module opts into BullMQ (approved per-module, documented in the module's README.md), the @constellation-platform/jobs adapter routes that queue to Redis instead of Postgres. The JobQueue interface is identical — callers do not change. The job_queue table is not used for BullMQ-backed queues.

A.2 Workflow Row Schema and State Transitions

All durable workflows (agent tasks, ingest pipelines, enrichment runs, bulk operations) share a single workflow_runs table. Each workflow type defines its own Zod-validated state shape.

Table schema

CREATE TABLE workflow_runs (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id uuid NOT NULL,
  actor_id text,               -- user/agent that started the workflow; nullable for system-triggered workflows
  correlation_id text,         -- propagated from originating request; nullable for scheduled workflows
  workflow_type text NOT NULL, -- e.g. 'ingest_pipeline', 'enrichment', 'agent_task'
  status text NOT NULL DEFAULT 'pending'
    CHECK (status IN ('pending','running','waiting_approval','completed','failed','cancelled')),
  state jsonb NOT NULL DEFAULT '{}', -- Zod-validated per workflow_type
  input jsonb NOT NULL,        -- immutable; the original request
  output jsonb,                -- set on completion
  error text,                  -- set on failure
  started_at timestamptz,
  completed_at timestamptz,
  updated_at timestamptz NOT NULL DEFAULT now(),
  created_at timestamptz NOT NULL DEFAULT now(),
  parent_id uuid REFERENCES workflow_runs(id), -- for sub-workflows
  trace_id text                -- OpenTelemetry trace correlation
);

CREATE INDEX idx_workflow_runs_active ON workflow_runs (tenant_id, status)
  WHERE status IN ('pending','running','waiting_approval');

State machine contract

Each workflow type must provide:

interface WorkflowDefinition<
  TState extends z.ZodType,
  TInput extends z.ZodType,
  TOutput extends z.ZodType,
> {
  type: string;        // matches workflow_type column
  stateSchema: TState; // Zod schema for the state column
  inputSchema: TInput;
  outputSchema: TOutput;
  initialState: (input: z.infer<TInput>) => z.infer<TState>;
  transitions: WorkflowTransition<TState>[]; // ordered list of named steps
}

interface WorkflowTransition<TState extends z.ZodType> {
  name: string;
  from: string[]; // allowed status values to enter this transition
  execute: (state: z.infer<TState>, ctx: WorkflowContext) => Promise<TransitionResult<TState>>;
}

type TransitionResult<TState extends z.ZodType> =
  | { action: 'continue'; state: z.infer<TState> }
  | { action: 'wait_approval'; state: z.infer<TState>; approvalRequest: ApprovalRequest }
  | { action: 'complete'; output: unknown }
  | { action: 'fail'; error: string };

Status lifecycle

pending → running → completed
                  → failed
                  → waiting_approval → running    (after approval granted)
                                     → cancelled  (after approval denied or timeout)
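The lifecycle can be encoded as a small pure guard. A sketch whose transition map mirrors the diagram above (terminal statuses have no outgoing edges):

```typescript
// Allowed status transitions, mirroring the lifecycle diagram above.
// completed / failed / cancelled are terminal: no outgoing edges.
const ALLOWED_TRANSITIONS: Record<string, readonly string[]> = {
  pending: ['running'],
  running: ['completed', 'failed', 'waiting_approval'],
  waiting_approval: ['running', 'cancelled'],
  completed: [],
  failed: [],
  cancelled: [],
};

export function canTransition(from: string, to: string): boolean {
  return ALLOWED_TRANSITIONS[from]?.includes(to) ?? false;
}
```

The workflow runner would consult this guard before attempting the optimistic UPDATE, so illegal transitions fail fast in-process rather than at the database.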

Transitions are always single-row UPDATEs with an optimistic concurrency check:

UPDATE workflow_runs
SET status = $1,
    state = $2,
    updated_at = now()
WHERE id = $3
  AND status = $4 -- expected current status
RETURNING *;

If the UPDATE returns zero rows, the transition is rejected (concurrent modification). The caller retries from a fresh read.
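The retry-from-fresh-read rule can be sketched with the UPDATE injected as a function, so the concurrency logic is visible without a database (function names are illustrative):

```typescript
// Sketch of "retry from a fresh read" on optimistic-concurrency conflict.
// `tryUpdate` stands in for the single-row UPDATE ... WHERE status = $expected
// statement and resolves to the number of rows affected (0 or 1).
export async function transitionWithRetry(
  readStatus: () => Promise<string>,
  tryUpdate: (expectedStatus: string) => Promise<number>,
  maxRetries = 3,
): Promise<boolean> {
  for (let i = 0; i < maxRetries; i++) {
    const expected = await readStatus(); // fresh read on every attempt
    if ((await tryUpdate(expected)) === 1) return true; // transition applied
    // 0 rows affected: concurrent modification; loop back and re-read
  }
  return false; // persistent contention; caller surfaces an error
}
```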

A.3 Raw SQL Policy

Prisma is the default data access layer. Raw SQL is allowed only for platform primitives where Prisma cannot express the operation correctly or efficiently.

Permitted raw SQL

| Use case | Reason | Owner |
| --- | --- | --- |
| Session-level RLS context (SET LOCAL app.tenant_id = $1) | Prisma has no session-variable API; must be raw parameterized SQL inside a transaction | @constellation-platform/db |
| Job claim (FOR UPDATE SKIP LOCKED) | Prisma does not support SKIP LOCKED | @constellation-platform/jobs |
| Workflow transition (optimistic UPDATE ... WHERE status = $expected) | Must be a single atomic statement, not read-then-write | @constellation-platform/jobs |
| RLS policy setup (ALTER TABLE ... ENABLE ROW LEVEL SECURITY, CREATE POLICY) | DDL, not data access — migration-time only (see rule 4) | @constellation-platform/db |
| Extension setup (CREATE EXTENSION, pg_cron schedule management) | Extension DDL — migration-time only (see rule 4) | @constellation-platform/db, @constellation-platform/jobs |
| LISTEN / NOTIFY | Prisma does not expose Postgres channels | @constellation-platform/events |
| Recursive ltree queries (@>, <@, lquery) | Prisma does not support ltree operators natively | modules/taxonomy/repository.ts |
| Hybrid search ranking (ts_rank + pgvector distance in one query) | Prisma cannot compose full-text and vector scoring | modules/search/repository.ts |

Rules

  1. Raw SQL lives in the repository or platform infrastructure layer only (modules/*/repository.ts or @constellation-platform/*) — never in services, tools, or workflows.
  2. Every raw SQL call must be wrapped in a typed function with Zod-validated inputs and outputs.
  3. Raw SQL must be annotated with a comment referencing this appendix section: // Raw SQL: see Architecture v3, Appendix A.3.
  4. Migration files use prisma migrate for schema changes. Exception: RLS policy DDL (ENABLE ROW LEVEL SECURITY, CREATE POLICY), extension DDL (CREATE EXTENSION), and pg_cron schedules cannot be expressed through Prisma's schema — these are the only raw SQL permitted in migration files, and each must reference this appendix.
  5. If Prisma adds support for a currently-raw operation, the raw SQL must be replaced in the next cleanup cycle.
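A sketch of rule 2 for the ltree case. Hand-rolled row validation stands in for Zod so the sketch has no dependencies, and execRaw is a stand-in for the $queryRaw tagged-template call that would live in modules/taxonomy/repository.ts:

```typescript
// Raw SQL: see Architecture v3, Appendix A.3 (recursive ltree descendant query).
// Sketch only: `execRaw` stands in for Prisma's $queryRaw tagged-template call.
export interface TaxonomyNode { id: string; path: string }

export async function findDescendants(
  execRaw: (parentPath: string) => Promise<unknown[]>,
  parentPath: string,
): Promise<TaxonomyNode[]> {
  const rows = await execRaw(parentPath);
  // Validate the untyped raw result before it leaves the repository layer
  // (the real implementation would use a Zod schema here, per rule 2).
  return rows.map((raw) => {
    const row = raw as Record<string, unknown>;
    if (typeof row.id !== 'string' || typeof row.path !== 'string') {
      throw new Error('findDescendants: unexpected row shape from raw SQL');
    }
    return { id: row.id, path: row.path };
  });
}
```

The point of the wrapper is the boundary: callers see a typed, validated function; the raw statement never escapes the repository.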

A.4 Composition Root

Both apps/api/ and apps/agents/ share a single composition root that wires all dependencies. This avoids duplicate initialization, inconsistent config, and drift between the two processes.

Structure

// packages/platform/runtime/composition-root.ts

export interface AppContext {
  // Config
  config: AppConfig;                    // validated with Zod at startup

  // Database
  prisma: PrismaClient;                 // single instance, RLS-aware

  // Platform services
  jobQueue: JobQueue;                   // Postgres-backed (or BullMQ per-queue override)
  eventBus: EventBus;                   // domain events → outbox → LISTEN/NOTIFY

  // Module registries
  toolRegistry: ToolRegistry;           // all module tools, keyed by name
  workflowRegistry: WorkflowRegistry;   // all workflow definitions, keyed by type

  // Cross-cutting
  tenantContext: TenantContextProvider; // extracts tenant_id from JWT / request
  logger: Logger;                       // structured, OpenTelemetry-correlated
  tracer: Tracer;                       // OpenTelemetry tracer
}

export function createAppContext(overrides?: Partial<AppContext>): Promise<AppContext>;

Wiring rules

  1. createAppContext() is called exactly once per process — at the top of apps/api/main.ts and apps/agents/main.ts.
  2. Module registration is declarative. Each module exports a register(ctx: AppContext) function that registers its tools, workflows, and event handlers. No module reaches into another module's internals.
  3. apps/api/ calls createAppContext() then boots the NestJS HTTP server. It registers all module tools (for the REST-facing tool endpoints) but does not start the workflow runner or job workers.
  4. apps/agents/ calls createAppContext() then starts the workflow runner, job workers, and MCP server. It registers the same tools (for agent use) and additionally starts the orchestrator and specialist agents.
  5. Overrides for testing. createAppContext({ prisma: testPrismaClient, jobQueue: inMemoryQueue }) replaces real dependencies with test doubles. Every integration test uses this — no mocking of internal imports.
  6. No NestJS module injection for platform primitives. AppContext is a plain object, not a NestJS provider tree. NestJS controllers receive AppContext via a single provider binding. This keeps platform code framework-independent and usable by apps/agents/ (which is not a NestJS app).
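A sketch of rule 2, with the registry shapes reduced to plain Maps (the real ToolRegistry and WorkflowRegistry types live in packages/platform/runtime/; the tool name and handler here are illustrative):

```typescript
// Minimal stand-ins for the real registries (illustrative shapes only).
export interface MiniContext {
  toolRegistry: Map<string, (input: unknown) => Promise<unknown>>;
  workflowRegistry: Map<string, { type: string }>;
}

// Each module exports exactly one declarative register() function.
export function registerCatalogModule(ctx: MiniContext): void {
  ctx.toolRegistry.set('catalog.publish_product', async (input) => {
    // ...delegates to this module's own service layer, never another module's internals
    return { accepted: true, input };
  });
  ctx.workflowRegistry.set('enrichment', { type: 'enrichment' });
}
```

Both apps/api/ and apps/agents/ would call the same register() functions against the shared AppContext, which is what keeps the two processes from drifting.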

A.5 Tenant Context Propagation

tenant_id must survive across all async boundaries — HTTP requests, job execution, workflow transitions, and event handlers — without requiring callers to pass it manually through every function signature.

Mechanism

The platform uses Node.js AsyncLocalStorage as the canonical tenant context carrier.

// packages/platform/runtime/tenant-context.ts

import { AsyncLocalStorage } from 'node:async_hooks';

interface TenantContext {
  tenantId: string;
  actorId: string;
  correlationId: string;
}

export const tenantStore = new AsyncLocalStorage<TenantContext>();

export function getCurrentTenant(): TenantContext {
  const ctx = tenantStore.getStore();
  if (!ctx) {
    throw new Error('Tenant context not set — are you outside a request/job/workflow scope?');
  }
  return ctx;
}

Context restoration rules

When restoring context from a persisted row, actorId and correlationId may be null (e.g. for cron-triggered jobs or system-initiated workflows). The restoration code must handle this:

  • tenantId — always present; read from the row's tenant_id column. Required.
  • actorId — read from the row's actor_id column if present; falls back to 'system' if null.
  • correlationId — read from the row's correlation_id column if present; a new ID is generated if null.

| Entry point | How tenant context is set |
| --- | --- |
| HTTP request (apps/api) | NestJS middleware extracts tenant_id, sub (actor), and the correlation header from the JWT/request and enters tenantStore.run() before the controller executes. All three fields are available. |
| Job claim (apps/agents) | After FOR UPDATE SKIP LOCKED returns a job row, the worker reads tenant_id, actor_id, and correlation_id from the row and enters tenantStore.run(). actor_id and correlation_id may be null (see fallback rules above). |
| Workflow transition (apps/agents) | After loading the workflow_runs row, the runner reads tenant_id, actor_id, and correlation_id from the row and enters tenantStore.run(). Same fallback rules apply. |
| Event handler | The outbox consumer reads tenant_id, actor_id, and correlation_id from the event payload and enters tenantStore.run(). |
| MCP tool invocation | The MCP server reads tenant_id from the session/request metadata and enters tenantStore.run(). Actor is the agent identity; correlation ID comes from the session trace. |
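The restoration rules reduce to one small function. A sketch (randomUUID stands in for whatever correlation-ID generator the platform standardizes on):

```typescript
import { randomUUID } from 'node:crypto';

// Persisted rows (job_queue, workflow_runs, outbox events) carry context as columns.
export interface PersistedContextRow {
  tenant_id: string;
  actor_id: string | null;
  correlation_id: string | null;
}

export interface RestoredContext { tenantId: string; actorId: string; correlationId: string }

export function restoreContext(row: PersistedContextRow): RestoredContext {
  return {
    tenantId: row.tenant_id,                           // required; never null
    actorId: row.actor_id ?? 'system',                 // fallback for system-initiated work
    correlationId: row.correlation_id ?? randomUUID(), // generate a fresh ID when absent
  };
}
```

Every non-HTTP entry point in the table above would call this before entering tenantStore.run(), so the fallback rules live in exactly one place.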

Prisma integration

The Prisma client auto-attaches tenant_id to the database session for RLS enforcement. The implementation must satisfy two invariants:

  1. Parameterized SQL only — never interpolate tenantId into a string. Use $executeRaw with tagged template literals (see A.3 whitelist).
  2. Same session guarantee — the SET LOCAL and the subsequent query must execute within the same database transaction/session, otherwise the RLS variable is not visible to the query.

Note: The snippet below is conceptual pseudocode illustrating the required architecture constraints. The exact Prisma $extends / client-extension API may differ at implementation time. What matters is that the two invariants above are satisfied. The concrete implementation belongs in packages/platform/db/ and must be validated against the Prisma version in use.

// CONCEPTUAL — packages/platform/db/prisma-tenant.ts
// Validates against: Prisma client extensions API (verify exact signatures at implementation time)

export function createTenantAwarePrisma(basePrisma: PrismaClient): PrismaClient {
  return basePrisma.$extends({
    query: {
      async $allOperations({ args, query, model, operation }) {
        const { tenantId } = getCurrentTenant();
        // Invariant 2: $transaction ensures SET LOCAL and query share the same PG session.
        // SET LOCAL scopes the variable to the current transaction only —
        // it is automatically reset when the transaction commits or rolls back.
        return basePrisma.$transaction(async (tx) => {
          // Invariant 1: tagged template — parameterized, not interpolated.
          // Raw SQL: see Architecture v3, Appendix A.3 (session-level RLS context)
          await tx.$executeRaw`SET LOCAL app.tenant_id = ${tenantId}::text`;
          // Route the original query through tx, NOT through basePrisma,
          // to guarantee it sees the SET LOCAL variable.
          // (Exact dispatch mechanism depends on Prisma extension API version.)
          return /* dispatch original operation through tx */;
        });
      },
    },
  });
}

Key constraints the implementation must honour (regardless of exact Prisma API shape):

  • $executeRaw tagged template, not $executeRawUnsafe: Prisma's tagged template $executeRaw parameterizes values automatically. $executeRawUnsafe accepts a raw string and is subject to SQL injection if callers ever interpolate user input. The implementation must use the tagged-template form.
  • Query routed through transactional client: The query following SET LOCAL must execute on the same transactional connection (tx), not through the base Prisma client. If Prisma's $extends callback provides a query() function that dispatches through the base client, it must not be used — find the equivalent that routes through tx.

Invariant: runtime context vs serialized state

AsyncLocalStorage is the canonical in-process runtime carrier for tenant context. However, tenant_id also appears as persisted data in rows that cross process boundaries. These are distinct concerns:

Runtime context (in-process):

Within a running request, job handler, workflow transition, or tool execution, services read tenant context from AsyncLocalStorage — never from function parameters. This prevents:

  • accidental tenant ID mismatch between caller and callee
  • RLS bypass when a worker forgets to set the session variable
  • proliferation of tenantId through every function signature

Serialized state (persisted):

Rows in job_queue, workflow_runs, outbox_events, and domain event envelopes must carry tenant_id (and optionally actor_id, correlation_id) as explicit columns. This is data, not runtime context — it is how context is reconstructed after serialization, process restart, or delayed execution. Boundary DTOs that cross process or network boundaries (e.g. webhook payloads, MCP session metadata) also carry tenant_id as data.

The rule: No in-process business logic (service method, tool handler, policy check, workflow transition function) should accept tenantId as a function parameter. These read from AsyncLocalStorage. Persistence layers, serialization boundaries, and entry-point bootstrapping code are the only places that read and write tenant_id as explicit data.
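The split can be seen end to end in a few lines. In this sketch, buildJobRow is in-process business logic (no tenantId parameter; it reads AsyncLocalStorage), while the row it returns is the serialized form with explicit columns; the entry-point values are illustrative:

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface TenantContext { tenantId: string; actorId: string; correlationId: string }
const tenantStore = new AsyncLocalStorage<TenantContext>();

// In-process business logic: reads context from AsyncLocalStorage, never a parameter.
function buildJobRow(queue: string, payload: object) {
  const ctx = tenantStore.getStore();
  if (!ctx) throw new Error('Tenant context not set');
  // Serialization boundary: context is written out as explicit columns.
  return {
    queue,
    payload,
    tenant_id: ctx.tenantId,
    actor_id: ctx.actorId,
    correlation_id: ctx.correlationId,
  };
}

// Entry-point bootstrapping is the only place that establishes context.
export function enqueueWithinRequest() {
  return tenantStore.run(
    { tenantId: 't-1', actorId: 'user-9', correlationId: 'corr-42' },
    () => buildJobRow('embeddings', { productId: 'p-1' }),
  );
}
```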