Architecture v3
Status
This document is a merged architecture that combines the conceptual model from Architecture v2 with the technology stack and platform alignment from Stella Catalog Spec v1.
- It is intended for a greenfield implementation.
- It assumes no production code exists yet.
- It preserves Constellation shared platform alignment.
- It adopts the three-layer architecture (Domain Core, Tool Layer, Agent Plane) from v2.
- It keeps the TypeScript/NestJS stack from v1 for full-stack coherence and deployment simplicity.
1. Goals
The system should be:
- AI-first at runtime
  - Agents are first-class users of the platform.
  - The platform supports planning, tool use, memory, evaluation, and human approval flows.
  - MCP tools are business capabilities, not CRUD wrappers.
- AI-first in development
  - AI coding agents should be able to add modules and features with low context overhead.
  - Module boundaries, contracts, and tests must be explicit and machine-readable.
  - Every module follows the same template with predictable file names and clear responsibilities.
- Deterministic where it matters
  - Product data, pricing, permissions, tenant isolation, and audit remain domain-controlled.
  - Agents never directly own business truth.
  - The domain core is testable without an LLM.
- Modular without premature distribution
  - The system starts as a modular monolith.
  - Modules can be extracted later if scale or team topology requires it.
- Safe for enterprise and multi-tenant use
  - Strong tenant isolation, auditability, idempotency, policy checks, and approval gates are built in from day one.
- Platform-aligned
  - Shares technology foundations with Constellation via `@constellation-platform/*` packages.
  - One language (TypeScript) across API, admin, contracts, SDKs, and agent plane.
2. Core Position
This architecture rejects both extremes:
- Not a classic centralized CRUD/API platform with thin MCP wrappers bolted on.
- Not a swarm of autonomous agents directly mutating shared state.
Instead, the platform is split into three layers:
- Domain Core — Deterministic business logic and source-of-truth data.
- Tool Layer — High-level business capabilities exposed to agents and humans.
- Agent Plane — Planning, orchestration, delegation, memory, approvals, and workflow execution.
The domain core is authoritative. The agent plane is adaptive. The tool layer is the contract between them.
Why three layers matter
Without an explicit agent plane, agent behavior gets scattered across API controllers, service methods, and ad-hoc scripts. Without a tool layer distinct from CRUD endpoints, agents must understand implementation details instead of working at the intent level. The three-layer split ensures each concern has a clear owner.
3. Technology Stack
Primary stack
| Component | Technology |
|---|---|
| Backend runtime | Node.js 20+, TypeScript (strict) |
| API framework | NestJS |
| Admin UI | Next.js (App Router) |
| Validation & contracts | Zod v4 + JSON Schema + OpenAPI |
| ORM | Prisma 6 |
| Database | PostgreSQL 16+ |
| Extensions | pgvector, ltree, pg_trgm, pgcrypto, uuid-ossp |
| Job queue (default) | Postgres outbox + LISTEN/NOTIFY (via @constellation-platform/jobs) |
| Job queue (high-throughput) | BullMQ + Redis (opt-in per module) |
| Object storage | S3-compatible (MinIO for dev; Supabase Storage on cloud) |
| Auth | OAuth2/OIDC delegation; JWT with sub, tenant_id, roles |
| Observability | OpenTelemetry (Jaeger dev, NewRelic/Datadog prod) |
| Testing | Vitest + fast-check (property-based) + integration tests |
| Agent tooling | MCP SDK (TypeScript), custom orchestration layer |
| Packaging | Monorepo (Turborepo) |
Why TypeScript-first
TypeScript is the better default for this product because:
- Constellation shares a common platform in TypeScript; switching languages breaks alignment.
- One language across API, admin UI, contracts, SDKs, and agent plane reduces context-switching for both humans and AI coding agents.
- The MCP SDK is TypeScript-native.
- Prisma, Zod, and the NestJS ecosystem are mature and well-understood by AI coding tools.
- Node.js provides a real long-running process model needed by agents, workers, and durable workflows.
Python remains available for:
- Embedding model integrations where Python libraries have no TypeScript equivalent.
- Data science and evaluation scripts.
- Specialized agent evaluations using Python eval frameworks.
These are called as external processes or microservices, not as the primary runtime.
Constellation platform alignment
Shared packages from @constellation-platform/* (published from the external platform-packages repo via GitHub Packages private registry; consumed as versioned npm dependencies in Stella's package.json):
| Package | Purpose |
|---|---|
| `auth-core` | JWT validation, tenant context extraction, provider abstraction |
| `db` | Prisma client setup, RLS helpers, migration utilities |
| `events` | Domain event contracts, outbox pattern, LISTEN/NOTIFY |
| `jobs` | Postgres outbox + LISTEN/NOTIFY by default; BullMQ adapter for high-throughput |
| `errors` | Typed error classes, API error envelope |
| `testing` | Test fixtures, property-based test helpers, integration test utilities |
4. Repository Topology
```
apps/
├── api/                 # NestJS backend: REST API + tool API + auth
├── agents/              # Agent runtime: orchestrator, specialists, memory, approvals
└── admin/               # Next.js admin UI

packages/
├── contracts/           # Zod schemas, OpenAPI specs, generated SDKs
├── agent-sdk/           # Thin primitives: tool registry types, memory interfaces, eval helpers
└── module-template/     # Scaffolding for new modules

# External: @constellation-platform/* packages (separate `platform-packages` repo)
# Published to GitHub Packages private registry, consumed as versioned npm dependencies.
# See Stella_Constellation_AI_Shared_Architecture_Plan_v1.md for package contracts.
# Packages: auth-core, auth-nest, db, events, jobs, errors, testing

modules/
├── products/
├── taxonomy/
├── search/
├── ingest/
├── pricing/
├── supplier-offers/
├── canonicals/
├── shares/
└── reference-catalogs/
```
Key differences from v1
- `apps/agents/` is a first-class application, not hidden inside the API.
- `packages/agent-sdk/` provides thin primitives (tool registry types, memory interfaces, eval helpers) — not an orchestration framework. Orchestration logic lives in `apps/agents/`.
- `packages/module-template/` provides scaffolding for AI coding agents.
- Each module includes `tools.ts`, `workflows.ts`, `policies.ts`, and `evals/`.
Key differences from v2
- TypeScript throughout, not Python.
- Prisma instead of raw repositories.
- Shared platform packages instead of standalone runtime libraries.
- Supabase database and auth path preserved; Vercel used for UI hosting.
5. Architectural Layers
5.1 Domain Core
The domain core owns:
- Product and catalog state
- Taxonomy and classifications
- Supplier offers and canonical products
- Pricing and quote rules
- Ingestion and conflict resolution
- Tenant configuration
- Permissions and policy checks
- Audit records
The domain core must be:
- Deterministic
- Idempotent
- Transaction-safe
- Tenant-scoped (RLS enforced)
- Independently testable without an LLM
Implementation lives in each module's service.ts, repository.ts, models.ts, and events.ts.
5.2 Tool Layer
Tools are not CRUD wrappers. Tools are business actions.
Good tool examples:
- `search_catalog` — semantic search with natural language
- `compare_products` — structured attribute comparison across products
- `build_quote` — assemble a priced quote from product selections
- `find_substitutes` — locate alternative products meeting criteria
- `validate_channel_readiness` — check if products meet syndication requirements
- `enrich_from_reference_catalog` — pull and apply reference data
- `explain_price_decision` — trace how a price was calculated
- `review_match_suggestion` — present a supplier-offer match for human review
Bad tool examples (avoid):
- `create_product` — too low-level, no business context
- `update_rule` — exposes implementation details
- `list_records` — generic, no agent value
CRUD endpoints still exist for the human-facing REST API, but agent-facing tools must be higher-level, bounded, and policy-aware.
Each tool must define:
- Clear purpose and description
- Zod-validated inputs and outputs
- Permission requirements
- Tenant scoping rules
- Failure modes and error contracts
- Idempotency behavior
- Usage examples
- Eval cases
Implementation lives in each module's tools.ts.
Tool taxonomy
- Read tools — search, compare, explain, inspect. Safe to call freely.
- Write tools — bounded state changes with approval and policy checks. Rare and strongly governed.
- Composite tools — high-level actions coordinating multiple domain services. May trigger workflows.
5.3 Agent Plane
The agent plane owns:
- Task planning and decomposition
- Tool sequencing and selection
- Multi-step workflow orchestration
- Bounded delegation to specialist agents
- Short-term working memory (session-scoped)
- Structured working memory (facts gathered during a workflow)
- Approval request creation and resolution
- Retries and recovery
- Evaluation and trace capture
The agent plane does not directly write business data. All mutations go through tools, which enforce policies and tenant scoping.
Implementation lives in apps/agents/ (orchestration, workflows, specialist agents) with thin shared types from packages/agent-sdk/ (tool registry interfaces, memory store contracts, eval runner helpers).
6. Runtime Model
6.1 API surfaces
The system exposes three distinct interfaces:
- Human API — REST/JSON for admin UI and external integrations. Served by `apps/api/`.
- Tool API — MCP protocol and internal tool registry exposing business capabilities. Tools registered from each module's `tools.ts`.
- Workflow API — Start, resume, cancel, and query long-running agent tasks. Served by `apps/agents/`.
6.2 Execution modes
- Request/response — Synchronous validation, search, comparison, quoting. Used by both human API and tools.
- Durable workflow — Ingest pipelines, enrichment, large quote builds, catalog validation batches. Managed by the agent plane with explicit typed state machines whose state is persisted to PostgreSQL.
- Background jobs — Embeddings, reindexing, document parsing, sync tasks. Managed by the job queue.
6.3 Durable execution model (canonical)
The system uses one default execution backbone. This resolves the ambiguity between BullMQ, Inngest, Trigger.dev, and custom FSMs.
| Concern | Default | When to upgrade |
|---|---|---|
| Job dispatch | Postgres outbox + LISTEN/NOTIFY via @constellation-platform/jobs | Never — this is the canonical primitive |
| Workflow state | DB-serialized typed FSM — workflow state lives in a workflow_runs table with Zod-validated JSON state | Never — all workflows use this |
| High-throughput queues | BullMQ + Redis (opt-in) | Only when a module proves Postgres throughput is insufficient (e.g., bulk embeddings at 10k+ items) |
| Cron / scheduled | pg_cron or application-level scheduler | Default; no external dependency |
Why not Inngest/Trigger.dev? They couple to Vercel's execution model and add an external dependency. Starting with Postgres-native primitives keeps the system self-contained and portable. If operational evidence later shows Postgres LISTEN/NOTIFY cannot keep up, BullMQ is already available as an adapter — no architectural change needed.
Why not a custom FSM library? The FSM is not a library. It is a pattern: each workflow defines a Zod-validated state type, a transition function, and a workflow_runs table row. The pattern is codified in packages/agent-sdk/ as types and helpers, not as a runtime engine.
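To make the pattern concrete, here is a minimal sketch of a workflow state type and its pure transition function. A hand-rolled discriminated union stands in for the Zod schema to keep the example dependency-free, and the state names and `workflow_runs` shape are illustrative assumptions, not the actual contract:

```typescript
// Illustrative workflow state: a discriminated union (a Zod schema in the
// real system) that serializes cleanly to a JSON column on workflow_runs.
type EnrichState =
  | { step: "pending"; productIds: string[] }
  | { step: "matching"; productIds: string[]; matched: number }
  | { step: "done"; matched: number }
  | { step: "failed"; reason: string };

type WorkflowEvent = { type: "START" } | { type: "MATCHED" } | { type: "COMPLETE" };

// Pure transition function: deterministic and unit-testable without an LLM.
function transition(state: EnrichState, event: WorkflowEvent): EnrichState {
  switch (state.step) {
    case "pending":
      if (event.type === "START")
        return { step: "matching", productIds: state.productIds, matched: 0 };
      break;
    case "matching":
      if (event.type === "MATCHED")
        return { ...state, matched: state.matched + 1 };
      if (event.type === "COMPLETE")
        return { step: "done", matched: state.matched };
      break;
  }
  // Any undeclared transition is an explicit failure, never silent drift.
  return { step: "failed", reason: `illegal event ${event.type} in ${state.step}` };
}

// A hypothetical workflow_runs row holds the serialized state between steps.
const row = { id: "wf_1", state: JSON.stringify({ step: "pending", productIds: ["p1"] }) };
const resumed: EnrichState = transition(JSON.parse(row.state), { type: "START" });
```

The helpers in `packages/agent-sdk/` would codify exactly this shape (a state schema, a transition function, persistence glue), not a runtime engine.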
6.4 Multi-agent strategy
Default to single orchestrator agent + specialist helpers, not full peer swarms.
Recommended initial specialist agents:
- Search agent — handles catalog search, comparison, and similarity queries
- Pricing/quote agent — builds quotes, explains pricing, applies rules
- Enrichment agent — matches reference catalogs, applies enrichment data
- Validation agent — checks channel readiness, data quality, schema compliance
Add more specialists only when evals show a clear gain. Avoid premature agent proliferation.
7. Data Architecture
7.1 Primary database
PostgreSQL remains the system of record.
Use it for:
- Transactional data (Prisma-managed)
- JSONB flexible attributes
- `ltree` taxonomy paths
- `pgvector` embeddings
- Outbox/events (LISTEN/NOTIFY + polling)
- Workflow state metadata
- Audit logs
7.2 Tenancy
Single tenant key: tenant_id.
Rules:
- Every tenant-scoped table has `tenant_id`
- PostgreSQL RLS is mandatory on all tenant-scoped tables
- Services still scope by `tenant_id` as defense-in-depth
- Tools inherit tenant context from auth/session
- Cross-tenant reads and writes fail by design
Avoid dual-scope organization_id + tenant_id unless a concrete business case demands both.
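A minimal sketch of the tenant-context propagation these rules imply, using Node's `AsyncLocalStorage` (the pattern the NestJS guardrails in Section 9 also rely on). The store and helper names here are assumptions for illustration, not the actual `@constellation-platform/auth-core` API:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface TenantContext { tenantId: string; roles: string[] }

// One store per process; singleton services read it instead of using DI.
const tenantStore = new AsyncLocalStorage<TenantContext>();

// Entry point (e.g. per-request middleware) establishes the context once.
function withTenant<T>(ctx: TenantContext, fn: () => T): T {
  return tenantStore.run(ctx, fn);
}

// Any service or repository deep in the call stack reads the same context.
function currentTenant(): TenantContext {
  const ctx = tenantStore.getStore();
  if (!ctx) throw new Error("no tenant context; request not authenticated");
  return ctx;
}

// A repository would pair this with RLS, e.g. by setting a session variable
// such as `SET LOCAL app.tenant_id` before queries, so the database enforces
// isolation even if a query forgets its WHERE clause (defense-in-depth).
const demoTenant = withTenant({ tenantId: "t_acme", roles: ["admin"] }, () => currentTenant().tenantId);
// demoTenant === "t_acme"
```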
7.3 Eventing
Use domain events and an outbox table.
Initial design:
- Postgres outbox for durability
- Typed event schemas (Zod)
- Idempotent consumers
- Event versioning
- LISTEN/NOTIFY for low-latency in-process delivery
Do not introduce Kafka or distributed buses at the start.
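The consumer side of this design can be sketched as follows. The event and row shapes are illustrative (real schemas would be Zod-validated), and a production consumer would record processed event IDs in a table rather than in memory:

```typescript
// Illustrative outbox row; the id doubles as the idempotency key.
interface OutboxRow {
  id: string;       // unique event id
  type: string;     // e.g. "product.updated"
  version: number;  // event schema version
  payload: unknown;
}

// Idempotent consumer: processing the same event twice is a no-op,
// which makes at-least-once delivery from the outbox safe.
class ProductIndexer {
  private seen = new Set<string>(); // a processed_events table in production
  public indexed = 0;

  handle(event: OutboxRow): void {
    if (this.seen.has(event.id)) return; // duplicate delivery, ignore
    this.seen.add(event.id);
    if (event.type === "product.updated") this.indexed += 1;
  }
}

const indexer = new ProductIndexer();
const evt: OutboxRow = { id: "evt_1", type: "product.updated", version: 1, payload: {} };
indexer.handle(evt);
indexer.handle(evt); // redelivery after a crash or NOTIFY race: ignored
// indexer.indexed === 1
```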
8. Module Standard
Every module must follow the same structure.
```
modules/<module-name>/
├── README.md        # Module purpose, domain concepts, API surface
├── AGENTS.md        # Instructions for AI coding agents working on this module
├── schemas.ts       # Zod input/output models
├── models.ts        # Prisma-facing entities and typed domain models
├── service.ts       # Deterministic business logic
├── tools.ts         # Agent-facing tool definitions
├── workflows.ts     # Explicit state machine workflows
├── policies.ts      # Permission and approval logic
├── repository.ts    # DB access via Prisma
├── events.ts        # Emitted/consumed domain events
├── tests/           # Unit and integration tests
└── evals/           # Agent/tool evaluations and regression suites
```
File responsibilities
| File | Owns | Does not own |
|---|---|---|
| `schemas.ts` | Zod input/output schemas for API and tools | Prisma types, DB concerns |
| `models.ts` | Prisma model types, domain value objects | Business logic |
| `service.ts` | Deterministic business logic, validation | DB access, HTTP concerns |
| `tools.ts` | Agent-facing tool definitions, MCP registration | Business logic (delegates to service) |
| `workflows.ts` | Multi-step orchestrated procedures, state machines | Direct DB access |
| `policies.ts` | Permission checks, approval gate logic | Authentication (handled by platform) |
| `repository.ts` | Prisma queries, tenant-scoped data access | Business logic |
| `events.ts` | Domain event definitions, emission, consumption | Side effects outside the module |
| `tests/` | Unit tests, integration tests, property-based tests | Agent evals |
| `evals/` | Tool selection tests, workflow completion tests, regression suites | Deterministic unit tests |
Module rules
- No module imports another module's repository directly.
- Cross-module access goes through services or tool contracts.
- No hidden magic registration — all wiring is explicit.
- No large base classes or deep inheritance.
- No decorator-heavy abstractions that obscure control flow.
- File count per module should remain small enough for AI tools to reason over (aim for < 15 files).
- Every module has a short `README.md` and `AGENTS.md`.
9. AI-Friendly Development Rules
The system must be intentionally easy for AI coding agents to extend.
Required practices
- Explicit Zod schemas everywhere — no implicit types or `any`.
- One clear module template — every module looks the same.
- Predictable file names — an AI agent can find `tools.ts` in any module.
- Minimal framework magic — avoid custom decorators, interceptor chains, and DI tricks.
- Generated SDKs from one contract source (OpenAPI from Zod schemas).
- Usage examples for every tool.
- Evals for every non-trivial workflow.
- Local fixtures for every module.
- `AGENTS.md` in every module describing domain concepts, boundaries, and gotchas.
NestJS guardrails
NestJS is the API framework for apps/api/ because of Constellation platform alignment and broad AI training-data coverage. However, NestJS's decorator-heavy DI system is a known friction point for AI coding agents. The following rules constrain NestJS usage to the predictable subset:
Banned patterns:
- ❌ Custom decorators — all cross-cutting concerns use standard NestJS decorators or middleware.
- ❌ Request-scoped providers — all services are singletons. Tenant context uses `AsyncLocalStorage` (Appendix A.5), not request-scoped injection.
- ❌ `forwardRef()` — indicates circular dependencies; refactor instead.
- ❌ Custom interceptor chains — use at most one global interceptor (for correlation ID / logging).
- ❌ Dynamic modules with runtime configuration — keep module registration static and declarative.
- ❌ Deep NestJS DI for platform primitives — `AppContext` is a plain object, not a NestJS provider tree (Appendix A.4).
Required patterns:
- ✅ Each module registers exactly one NestJS module with one controller and one service provider.
- ✅ Controllers are thin: validate input (Zod pipe) → delegate to `AppContext`-wired service → return result.
- ✅ All business logic lives in `modules/*/service.ts`, never in NestJS controllers or providers.
- ✅ Module services receive `AppContext` through the composition root, not through `@Inject()` tokens.
- ✅ The NestJS `module.ts` file is boilerplate — AI agents copy it from the module template without modification.
Escape hatch: If AI coding agents consistently fail on NestJS DI wiring during Phase 0 implementation, the composition root architecture allows apps/api/ to be replaced with Fastify + tRPC without changing any module code. Module contracts are framework-independent by design.
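The composition-root style these rules describe can be sketched without any NestJS machinery at all. The `AppContext` fields and the service factory below are illustrative assumptions, not the real platform types:

```typescript
// Illustrative AppContext: a plain object, not a provider tree.
interface AppContext {
  repo: { listProductIds(): string[] };
  now(): Date;
}

// A module service takes the context directly; no @Inject() tokens, and a
// stubbed context makes it trivially testable without a DI container.
function makeProductService(ctx: AppContext) {
  return {
    count(): number {
      return ctx.repo.listProductIds().length;
    },
  };
}

// The composition root (e.g. somewhere in apps/api/) wires implementations
// once at startup; here a stub stands in for the real repository.
const stubCtx: AppContext = {
  repo: { listProductIds: () => ["p1", "p2"] },
  now: () => new Date(0),
};
const productService = makeProductService(stubCtx);
productService.count(); // 2
```

Because the service factory depends only on `AppContext`, swapping the HTTP layer (the Fastify + tRPC escape hatch above) leaves module code untouched.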
Eval scaffolding
AI coding agents struggle to write agentic evals from scratch because the eval harness (mock LLM responses, tool call assertions, trace validation) requires substantial boilerplate.
The CLI must provide:
```
stella eval:scaffold <tool-name>
```
This command generates:
- Mock LLM response fixtures for the target tool
- Expected tool call assertion templates
- Eval harness boilerplate with correct imports and `AppContext` test setup
- Example passing and failing test cases
- Annotation mapping to the correctness property the eval validates
This scaffolding is a Phase 0 requirement. No AI coding agent should be asked to build downstream module evals until the scaffold command produces a working baseline.
Required CI gates
- Type checking (tsc strict)
- Linting (ESLint)
- Unit tests (Vitest)
- Integration tests (against real Postgres)
- RLS tests (verify tenant isolation)
- Contract compatibility tests (Zod schema backward compat)
- Tool schema validation (all tools have valid Zod I/O)
- Eval regression suite (no tool selection regressions)
- Module boundary enforcement (no cross-module repository imports)
File and context rules
- Keep files focused and small (< 300 lines preferred, < 500 max).
- Prefer pure functions in services where possible.
- Avoid deep inheritance — prefer composition.
- Keep side effects explicit and at module boundaries.
- Each module's full source should fit in an AI agent's context window.
10. Tool Design Standard
Tools are a first-class product surface — the primary way agents interact with the system.
Tool definition contract
Every tool must specify:
```typescript
interface ToolDefinition {
  name: string;                 // e.g. "search_catalog"
  description: string;          // clear purpose for agent tool selection
  inputSchema: ZodSchema;       // validated inputs
  outputSchema: ZodSchema;      // validated outputs
  permissions: string[];        // required roles/permissions
  tenantScoping: 'required' | 'system' | 'none';
  idempotent: boolean;
  failureModes: FailureMode[];  // documented error cases
  examples: ToolExample[];      // input/output pairs for agent context
  evalCases: EvalCase[];        // regression test cases
}
```
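For illustration, here is a hypothetical `find_substitutes` tool filled in against that contract. The placeholder `ZodSchema` and supporting interfaces below are stand-ins so the example is self-contained; the real definitions would come from Zod and `packages/contracts/`:

```typescript
// Minimal placeholders so the sketch is dependency-free.
interface ZodSchema { parse(input: unknown): unknown }
interface FailureMode { code: string; description: string }
interface ToolExample { input: unknown; output: unknown }
interface EvalCase { task: string; expectedTool: string }

// Pass-through schema standing in for a real Zod object schema.
const identitySchema: ZodSchema = { parse: (input: unknown) => input };

const failureModes: FailureMode[] = [
  { code: "PRODUCT_NOT_FOUND", description: "Source product does not exist" },
];
const examples: ToolExample[] = [
  { input: { productId: "p1", maxResults: 5 }, output: { substitutes: [] } },
];
const evalCases: EvalCase[] = [
  { task: "find a cheaper alternative to product p1", expectedTool: "find_substitutes" },
];

// Hypothetical read tool: tenant-scoped, idempotent, no side effects.
const findSubstitutes = {
  name: "find_substitutes",
  description: "Locate alternative products that meet the given criteria.",
  inputSchema: identitySchema,
  outputSchema: identitySchema,
  permissions: ["catalog:read"],       // illustrative permission string
  tenantScoping: "required" as const,  // tool only runs with tenant context
  idempotent: true,                    // read tool: safe for agents to retry
  failureModes,
  examples,
  evalCases,
};
```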
Tool taxonomy
- Read tools — search, compare, explain, inspect. No side effects. Safe for agents to call freely.
- Write tools — bounded state changes. Must check policies. May require approval. Should be rare.
- Composite tools — high-level actions coordinating multiple services. May trigger durable workflows.
Write tools should be rare and strongly governed. When in doubt, make it a read tool that returns a proposed action, and let approval flow handle the mutation.
11. Workflow Design Standard
Workflows are explicit state machines, not hidden prompt behavior.
Each workflow defines:
- Start conditions and triggers
- Input schema (Zod)
- Named states and transitions
- Tool calls at each state
- Retry policy per step
- Timeout policy per step and overall
- Approval checkpoints (which steps need human sign-off)
- Completion criteria
- Failure and compensation logic
- Evaluation criteria
Example workflow classes
- Quote generation workflow
- Catalog enrichment workflow
- Supplier match review workflow
- Syndication readiness workflow
- Legacy PIM ingest workflow
- Bulk import workflow
Implementation approach
Start with a simple typed finite state machine. Do not introduce Temporal, XState, or a heavy workflow engine unless operational evidence justifies it. The state machine should be:
- Serializable to/from the database (for durability)
- Inspectable (current state, history of transitions)
- Resumable after process restart
- Traceable (every transition logged)
FSM and agent execution model
The FSM owns the skeleton — states, valid transitions, timeouts, and checkpoints. The agent operates within the FSM, not outside it.
There are two kinds of workflow states:
- Deterministic states — execute a fixed operation (service call, API request, data transformation). No LLM involved. The FSM advances automatically on success or failure.
- Agent-delegated states — invoke the agent with a scoped prompt and bounded tool set. The agent reasons, calls tools, and returns a transition choice. The FSM validates that the chosen transition is legal for the current state. If it is not, the transition is rejected and the agent is re-prompted.
This means:
- The agent never invents states or transitions — only chooses among the ones the FSM declares.
- The FSM guarantees that every execution path is auditable and bounded.
- Agent autonomy is scoped: the agent decides which valid transition to take, not what transitions exist.
Example:
```
State: RESOLVE_DISCREPANCY (agent-delegated)
  → Agent invokes compare_products tool
  → Agent reviews result and picks a transition:
      → ACCEPT_MATCH (if confidence ≥ threshold)
      → REJECT_MATCH (if confidence < threshold)
      → ESCALATE (if comparison is ambiguous)
  → FSM validates chosen transition is in the declared set
  → FSM advances to next state
```
This pattern prevents unbounded agent loops while preserving the agent's ability to reason about non-deterministic decisions.
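The validation guard itself is small. The sketch below mirrors the `RESOLVE_DISCREPANCY` example above; the declared transition set and function names are illustrative:

```typescript
// Transitions the FSM declares for each agent-delegated state.
const declaredTransitions: Record<string, readonly string[]> = {
  RESOLVE_DISCREPANCY: ["ACCEPT_MATCH", "REJECT_MATCH", "ESCALATE"],
};

interface TransitionResult { ok: boolean; next?: string }

// The agent returns `chosen`; the FSM accepts it only if it was declared.
function applyAgentChoice(currentState: string, chosen: string): TransitionResult {
  const legal = declaredTransitions[currentState] ?? [];
  if (!legal.includes(chosen)) {
    // Rejected: the agent is re-prompted and the FSM does not advance.
    return { ok: false };
  }
  return { ok: true, next: chosen };
}

applyAgentChoice("RESOLVE_DISCREPANCY", "ACCEPT_MATCH");   // { ok: true, next: "ACCEPT_MATCH" }
applyAgentChoice("RESOLVE_DISCREPANCY", "DELETE_PRODUCT"); // { ok: false }
```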
12. Memory and Context
Use layered memory rather than one large chat transcript.
Memory types
| Type | Scope | Storage | Purpose |
|---|---|---|---|
| Session memory | Single agent conversation | In-memory / Redis | Short-lived task context |
| Working memory | Single workflow execution | Database (JSONB) | Structured facts gathered during a workflow |
| Domain memory | Permanent | PostgreSQL (Prisma) | Product, tenant, and catalog data |
| Retrieval memory | Per-query | Transient | Search results, embeddings, documents |
| Evaluation memory | Permanent | Database | Historical traces and outcomes for regression |
Rules
- Business truth belongs in domain data, not chat memory.
- Memory writes must be intentional and typed (Zod schemas).
- Stale context must be discardable — memory has TTL or explicit invalidation.
- Agent prompts should remain small and structured — inject only relevant context.
- No "memory" should circumvent tenant isolation.
13. Security and Governance
AI-first does not weaken safety. It requires stronger controls because agents can act faster and at scale.
Required controls
- External auth provider with JWT/OIDC (Supabase Auth, Keycloak, or equivalent)
- `tenant_id` in auth context on every request
- RLS in PostgreSQL on every tenant-scoped table
- Role and feature checks at service and tool level
- Approval gates for sensitive writes (tools declare when approval is needed)
- Immutable audit logs (append-only, tenant-scoped)
- Prompt and tool trace logging (what the agent asked, what tools it called, what it received)
- Secrets isolation (no secrets in agent context or tool outputs)
- Rate limiting for external, tool, and agent interfaces
High-risk actions requiring approval
- Price rule changes
- Cross-system syndication
- Large bulk imports (above configurable threshold)
- Destructive merges or splits of canonical products
- Reference catalog license changes
- Data export
- Any write tool the module's `policies.ts` flags as approval-required
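As a hedged sketch of how a module's `policies.ts` might flag such an action, here is the bulk-import case. The threshold value and names are hypothetical; in practice the threshold would be tenant-configurable:

```typescript
// What a policy check returns: deny outright, allow, or allow behind a gate.
interface PolicyDecision { allowed: boolean; approvalRequired: boolean }

// Hypothetical default; "above configurable threshold" per the list above.
const BULK_IMPORT_APPROVAL_THRESHOLD = 1_000;

function checkBulkImport(itemCount: number, roles: string[]): PolicyDecision {
  if (!roles.includes("catalog:write")) {
    return { allowed: false, approvalRequired: false }; // hard deny, no gate
  }
  // Allowed, but above the threshold a human must sign off first.
  return { allowed: true, approvalRequired: itemCount > BULK_IMPORT_APPROVAL_THRESHOLD };
}

checkBulkImport(50, ["catalog:write"]);    // { allowed: true, approvalRequired: false }
checkBulkImport(5_000, ["catalog:write"]); // { allowed: true, approvalRequired: true }
checkBulkImport(50, ["catalog:read"]);     // { allowed: false, approvalRequired: false }
```

The agent plane consumes the `approvalRequired` flag to create an approval request instead of executing the write directly.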
14. Testing and Evals
Testing splits into two categories that are equally important.
Deterministic tests
- Unit tests for services (pure business logic)
- Integration tests for DB behavior (real Postgres)
- RLS and tenancy tests (verify isolation)
- Contract tests for APIs and tools (Zod schema compat)
- Property-based tests with fast-check (minimum 100 iterations per property)
Agent evals
- Tool selection correctness — given a task description, does the agent pick the right tool?
- Workflow completion rate — does the agent complete multi-step tasks?
- Approval compliance — does the agent stop and request approval when required?
- Hallucination resistance — does the agent avoid inventing data not present in tool outputs?
- Retry and recovery behavior — does the agent handle tool failures gracefully?
- Cost and latency budgets — does the agent stay within token and time limits?
- Tenant isolation — does the agent ever access cross-tenant data?
Eval infrastructure (Phase 0 deliverable)
The eval harness must be built before the first agent is deployed. It must support:
- Reproducible test cases with fixed inputs and expected tool sequences
- Scoring rubrics for partial credit (not just pass/fail)
- Regression detection (alert when a previously passing eval fails)
- Cost tracking per eval run
- Trace capture for debugging failed evals
No module is complete without both deterministic tests and agent evals.
15. Deployment Strategy
Architectural principle
The agent runtime and worker/orchestrator are standalone long-running Node.js processes. This is not negotiable — serverless cold starts, execution time limits, and lack of persistent connections make Vercel Functions unsuitable as the agent runtime host.
Vercel is used for what it excels at: hosting the Next.js admin UI and, optionally, light REST API endpoints (health checks, webhooks, lightweight reads). The heavy lifting — agent orchestration, durable workflows, background jobs, MCP tool serving — runs in standalone processes.
Standalone (primary path)
| Concern | Solution |
|---|---|
| API | NestJS standalone process (Docker / any container host) |
| Agent runtime | Standalone Node.js process (apps/agents/) — long-running, persistent connections |
| Worker / orchestrator | Same process as agent runtime (single binary) or separate process at scale |
| Admin UI | Next.js on Vercel or any static/SSR host |
| Database | PostgreSQL 16+ (Supabase, RDS, self-hosted) with pgvector, ltree |
| Auth | OAuth2/OIDC delegation via @constellation-platform/auth-core (Supabase Auth, Keycloak, Auth0) |
| Storage | S3-compatible (Supabase Storage, MinIO, AWS S3) |
| Jobs | Postgres outbox (default); BullMQ + Redis (opt-in for high-throughput modules) |
| MCP server | Runs inside the agent runtime process, shares the same tool registry |
Vercel + Supabase (UI hosting path)
| Concern | Solution |
|---|---|
| Admin UI | Next.js on Vercel (SSR + static) |
| Light API routes | Vercel Functions for webhooks, health, lightweight reads (optional) |
| Database | Supabase PostgreSQL (pgvector, ltree enabled) |
| Auth | Supabase Auth (JWT with tenant_id in app_metadata) |
| Storage | Supabase Storage (tenant-namespaced) |
The API server, agent runtime, and worker processes still run as standalone containers even when Supabase hosts the database and Vercel hosts the UI.
Why not "Vercel for everything"?
An agent-first product needs:
- Long-running processes for multi-step workflows (minutes, not seconds)
- Persistent WebSocket/SSE connections for real-time agent status
- In-process tool registry without cold-start latency
- Reliable job processing without execution time limits
Vercel serverless cannot provide these. Keeping the UI on Vercel preserves developer experience and CDN benefits without constraining the core runtime.
Both paths use the same codebase. Auth is provider-agnostic via @constellation-platform/auth-core. The job queue interface abstracts Postgres vs BullMQ.
16. Delivery Phases
Phase 0: Foundation (weeks 1-3)
- Create repo topology and build system (Turborepo)
- Implement module template with all file slots
- Set up shared platform packages (auth, db, events, jobs, errors, testing)
- Build eval harness and trace capture
- Define tool design standard with Zod contracts
- Scaffold CLI (`stella-cli`)
- CI pipeline with all required gates
Phase 1: Core catalog modules (weeks 4-8)
- Products module (CRUD + tools + evals)
- Taxonomy module (ltree + tools)
- Search module (hybrid search: fulltext + pgvector + tools)
- Ingest module (webhook + conflict resolution + tools)
- Pricing module (rules engine + tools)
Phase 2: Agent plane (weeks 6-10, overlaps Phase 1)
- Orchestrator agent with tool registry
- Specialist agents (search, pricing, validation)
- Workflow runtime (typed state machines)
- Approval system (create/resolve approval requests)
- Memory stores (session, working, evaluation)
- Trace capture and eval runner
Phase 3: Advanced modules (weeks 9-14)
- Supplier offers module
- Canonicals module (matching, merge/split)
- Shares module (catalog sharing)
- Reference catalogs module (enrichment, licensing)
- Enrichment workflows
- Channel syndication validation
Phase 4: Hardening (weeks 13-16)
- Performance optimization against targets
- Security audit and penetration testing
- Eval regression suite fully populated
- Documentation and SDK generation
- Deployment automation (Docker Compose for standalone; Vercel for UI; Supabase for managed DB)
17. Migration From Existing Documents
From Stella Catalog Spec v1 — keep
- All 31 requirements and acceptance criteria
- Multi-tenant model (tenant_id + RLS)
- PostgreSQL + pgvector + ltree
- Hybrid search pipeline
- Ingest/webhook conflict resolution
- Supplier offer and canonical product concepts
- Catalog sharing model
- Reference catalog enrichment
- CPQ and pricing rules
- Performance targets
- CLI commands
- Supabase database and auth deployment path
- Constellation platform alignment
From Architecture v2 — adopt
- Three-layer architecture (Domain Core / Tool Layer / Agent Plane)
- Tool-first design (business actions, not CRUD wrappers)
- Explicit module template with tools.ts, workflows.ts, policies.ts, evals/
- Agent eval requirements alongside deterministic tests
- Memory model (session, working, domain, retrieval, evaluation)
- Workflow design standard (explicit state machines)
- AI-friendly development rules (small files, no magic, predictable names)
- Module rules (no cross-module repo imports, no deep inheritance)
From Architecture v2 — do not adopt
- Python/FastAPI backend (breaks Constellation alignment, splits the stack)
- PydanticAI (use MCP SDK + custom orchestration in TypeScript)
- Temporal (premature; start with simple typed FSM)
- Raw SQL repositories (use Prisma for consistency with Constellation)
From Stella Catalog Spec v1 — replace
- "MCP tools as thin adapters" becomes "MCP tools as first-class business capabilities"
- Add `apps/agents/` as a first-class application
- Add `evals/` at module and system level
- Add `policies.ts` and `workflows.ts` to module template
18. Risks and Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| NestJS decorator magic obscures control flow for AI agents | Medium | Keep modules thin; avoid custom decorators; lint for complexity |
| TypeScript agent ecosystem less mature than Python | Medium | Use MCP SDK (TS-native); build thin orchestration layer; call Python for embeddings if needed |
| Tool layer drifts back to CRUD wrappers | High | CI gate: every tool must have evals; review tool names for business-action language |
| Constellation shared platform becomes a drag | Medium | Keep shared packages small and stable; don't block product delivery on platform work |
| Eval infrastructure gets deprioritized | High | Make evals a Phase 0 deliverable; no module ships without evals |
| Workflow complexity escalates | Medium | Start with simple typed FSM; introduce Temporal only with operational evidence |
| Agent costs spiral | Medium | Token budgets per task; model selection per specialist; cost tracking in eval harness |
| Postgres job queue hits throughput ceiling | Low | BullMQ adapter already exists in @constellation-platform/jobs; swap per-module without architecture change |
19. Open Questions
Resolved
- Agent hosting model — Resolved: the agent runtime is a standalone long-running Node.js process (`apps/agents/`). It does not run on Vercel serverless. See Section 15.
- Workflow persistence — Resolved: DB-serialized typed FSM. Workflow state lives in a `workflow_runs` table with Zod-validated JSON state. See Section 6.3.
- MCP server hosting — Resolved: the MCP tool server runs inside the agent runtime process, sharing the same tool registry and in-process access to domain services.
Still open — resolve before implementation begins:
- Eval tooling: Build custom eval harness or adopt an existing framework (e.g., Braintrust, Promptfoo)?
Resolved (post-v3):
- Constellation package publishing — Resolved: GitHub Packages private registry. `@constellation-platform/*` packages live in a separate `platform-packages` repo, versioned with SemVer + Changesets. See `Stella_Constellation_AI_Shared_Architecture_Plan_v1.md`.
20. Recommended Next Documents
If this architecture is accepted, the next documents to create are:
- `docs/adr/ADR-001-merged-architecture-v3.md` — records the decision to merge v1 and v2
- `docs/module_template_v1.md` — detailed module template with code examples
- `docs/tool_design_standard_v1.md` — tool definition contract, taxonomy, and examples
- `docs/workflow_design_standard_v1.md` — state machine patterns, approval checkpoints
- `docs/eval_standard_v1.md` — eval harness design, scoring rubrics, regression detection
- `docs/agent_plane_design_v1.md` — orchestrator, specialists, memory, trace capture
- Updated `.kiro/specs/stella-catalog/design.md` — aligned with v3 architecture
- Updated `.kiro/specs/stella-catalog/tasks.md` — re-sequenced for v3 phases
21. Summary
Architecture v3 takes the best from both predecessors:
- From v2: The three-layer model (Domain Core, Tool Layer, Agent Plane), tool-first design, explicit module template with tools/workflows/policies/evals, eval-first mindset, structured memory model, and AI-friendly development rules.
- From v1: TypeScript/NestJS stack, Prisma ORM, Zod validation, Constellation shared platform alignment, Supabase database/auth path, all 31 domain requirements, and the existing task breakdown.
Key decisions tightened after expert review:
- One durable execution model: Postgres outbox + DB-backed workflow state as the canonical default. BullMQ optional for throughput-heavy modules. No Inngest/Trigger.dev dependency.
- Agent runtime is a standalone process: Long-running Node.js process with persistent connections, in-process tool registry, and MCP server. Not serverless.
- Vercel scoped to UI hosting: Next.js admin on Vercel for CDN and developer experience. API, agents, and workers run as standalone containers.
- `packages/agent-sdk/` stays thin: Types, interfaces, and helpers — not an orchestration framework. All orchestration logic lives in `apps/agents/`.
The result is a system that is genuinely AI-native at runtime (agents are first-class users with proper orchestration, memory, and evaluation) and AI-native in development (every module is predictable, explicit, and fits in a context window) — without sacrificing platform alignment, operational control, or full-stack coherence.
Appendix A: Runtime Primitives
This appendix defines the runtime contracts that Section 6.3 refers to. These are implementation-ready specifications — not guidelines. Platform packages (@constellation-platform/jobs, @constellation-platform/db) must conform to them.
A.1 Job Claiming and Retry Semantics
All background work flows through a single job_queue table in the application's PostgreSQL database.
Table schema
CREATE TABLE job_queue (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id uuid NOT NULL,
actor_id text, -- user/system that enqueued the job; nullable for system-initiated jobs
correlation_id text, -- propagated from request context; nullable for cron/sweep-initiated jobs
queue text NOT NULL, -- e.g. 'embeddings', 'ingest', 'enrichment'
payload jsonb NOT NULL,
status text NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending','claimed','completed','failed','dead')),
run_at timestamptz NOT NULL DEFAULT now(),
claimed_at timestamptz,
claimed_by text, -- worker instance id
completed_at timestamptz,
attempt int NOT NULL DEFAULT 0,
max_attempts int NOT NULL DEFAULT 5,
last_error text,
idempotency_key text UNIQUE, -- optional; callers may set for dedup
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX idx_job_queue_poll ON job_queue (queue, status, run_at)
WHERE status = 'pending';
Claim protocol
Workers claim jobs with a single atomic statement. No advisory locks, no two-phase claim.
UPDATE job_queue
SET status = 'claimed',
claimed_at = now(),
claimed_by = $1, -- worker instance id
attempt = attempt + 1
WHERE id = (
SELECT id FROM job_queue
WHERE queue = $2
AND status = 'pending'
AND run_at <= now()
ORDER BY run_at
FOR UPDATE SKIP LOCKED
LIMIT 1
)
RETURNING *;
FOR UPDATE SKIP LOCKED ensures multiple workers never claim the same row. This is the only job-claiming mechanism in the system.
Retry semantics
| Behavior | Rule |
|---|---|
| Backoff | Exponential: run_at = now() + (2^attempt * base_interval). Default base_interval = 5 seconds. |
| Max attempts | Per-job max_attempts, default 5. |
| Dead-letter | After max_attempts exhausted, status moves to dead. Dead jobs are never auto-retried. |
| Stale claim recovery | A periodic sweep (every 60s) resets jobs stuck in claimed for longer than claim_timeout (default 5 minutes) back to pending. |
| Idempotency | If idempotency_key is set, a second INSERT with the same key is a no-op (ON CONFLICT DO NOTHING). |
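The backoff and dead-letter rules in the table above can be sketched as pure functions. This is illustrative only; `backoffMs` and `nextAction` are hypothetical names, not platform API.

```typescript
const BASE_INTERVAL_MS = 5_000; // default base_interval from the table

// Exponential backoff: run_at = now() + (2^attempt * base_interval).
function backoffMs(attempt: number, baseMs: number = BASE_INTERVAL_MS): number {
  return Math.pow(2, attempt) * baseMs;
}

// After a failed attempt: retry until max_attempts is exhausted, then dead-letter.
// Dead jobs are never auto-retried.
function nextAction(attempt: number, maxAttempts: number): 'retry' | 'dead' {
  return attempt >= maxAttempts ? 'dead' : 'retry';
}
```

Because `attempt` is incremented at claim time, a job that fails its first attempt is rescheduled roughly 10 seconds out, then 20, 40, and so on.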
Notification
After inserting a job, the writer issues:
NOTIFY job_queue, '<queue_name>';
Workers LISTEN job_queue and wake immediately. The poll loop (interval: 1s) is the fallback if a notification is missed.
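The wake-and-drain behavior can be sketched as a loop that claims until the queue is empty, then goes back to waiting on LISTEN or the 1s poll timer. The names here are hypothetical, and the claim function is assumed to wrap the `FOR UPDATE SKIP LOCKED` statement above.

```typescript
type Job = { id: string; payload: unknown };
type ClaimFn = () => Promise<Job | null>;   // wraps the atomic UPDATE ... SKIP LOCKED
type HandleFn = (job: Job) => Promise<void>;

// Drain the queue until a claim returns no row; returns the number processed.
// Called on every LISTEN notification and on every poll tick.
async function drainQueue(claim: ClaimFn, handle: HandleFn): Promise<number> {
  let processed = 0;
  for (let job = await claim(); job !== null; job = await claim()) {
    await handle(job);
    processed += 1;
  }
  return processed;
}
```

Draining to empty on each wake-up means a missed notification costs at most one poll interval of latency, never a lost job.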
BullMQ upgrade path
When a module opts into BullMQ (approved per-module, documented in the module's README.md), the @constellation-platform/jobs adapter routes that queue to Redis instead of Postgres. The JobQueue interface is identical — callers do not change. The job_queue table is not used for BullMQ-backed queues.
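The seam that makes this swap possible can be sketched as follows. The `JobQueue` shape is assumed from the text (the real interface lives in `@constellation-platform/jobs`); the in-memory class exists only to show that callers are unaffected by a backend change.

```typescript
interface EnqueueOpts { idempotencyKey?: string; runAt?: Date }

// Assumed adapter interface: Postgres-backed and BullMQ-backed implementations
// both satisfy it, so which backend serves a queue is pure configuration.
interface JobQueue {
  enqueue(queue: string, payload: unknown, opts?: EnqueueOpts): Promise<void>;
}

// Illustrative stand-in backend.
class InMemoryJobQueue implements JobQueue {
  jobs: { queue: string; payload: unknown }[] = [];
  private seen = new Set<string>();

  async enqueue(queue: string, payload: unknown, opts?: EnqueueOpts): Promise<void> {
    // Mirror the Postgres ON CONFLICT (idempotency_key) DO NOTHING dedup.
    if (opts?.idempotencyKey) {
      if (this.seen.has(opts.idempotencyKey)) return;
      this.seen.add(opts.idempotencyKey);
    }
    this.jobs.push({ queue, payload });
  }
}
```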
A.2 Workflow Row Schema and State Transitions
All durable workflows (agent tasks, ingest pipelines, enrichment runs, bulk operations) share a single workflow_runs table. Each workflow type defines its own Zod-validated state shape.
Table schema
CREATE TABLE workflow_runs (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id uuid NOT NULL,
actor_id text, -- user/agent that started the workflow; nullable for system-triggered workflows
correlation_id text, -- propagated from originating request; nullable for scheduled workflows
workflow_type text NOT NULL, -- e.g. 'ingest_pipeline', 'enrichment', 'agent_task'
status text NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending','running','waiting_approval','completed','failed','cancelled')),
state jsonb NOT NULL DEFAULT '{}', -- Zod-validated per workflow_type
input jsonb NOT NULL, -- immutable; the original request
output jsonb, -- set on completion
error text, -- set on failure
started_at timestamptz,
completed_at timestamptz,
updated_at timestamptz NOT NULL DEFAULT now(),
created_at timestamptz NOT NULL DEFAULT now(),
parent_id uuid REFERENCES workflow_runs(id), -- for sub-workflows
trace_id text -- OpenTelemetry trace correlation
);
CREATE INDEX idx_workflow_runs_active ON workflow_runs (tenant_id, status)
WHERE status IN ('pending','running','waiting_approval');
State machine contract
Each workflow type must provide:
interface WorkflowDefinition<
TState extends z.ZodType,
TInput extends z.ZodType,
TOutput extends z.ZodType,
> {
type: string; // matches workflow_type column
stateSchema: TState; // Zod schema for the state column
inputSchema: TInput;
outputSchema: TOutput;
initialState: (input: z.infer<TInput>) => z.infer<TState>;
transitions: WorkflowTransition<TState>[]; // ordered list of named steps
}
interface WorkflowTransition<TState extends z.ZodType> {
name: string;
from: string[]; // allowed status values to enter this transition
execute: (state: z.infer<TState>, ctx: WorkflowContext) => Promise<TransitionResult<TState>>;
}
type TransitionResult<TState extends z.ZodType> =
| { action: 'continue'; state: z.infer<TState> }
| { action: 'wait_approval'; state: z.infer<TState>; approvalRequest: ApprovalRequest }
| { action: 'complete'; output: unknown }
| { action: 'fail'; error: string };
Status lifecycle
pending → running → completed
                  → failed
                  → waiting_approval → running   (after approval granted)
                                     → cancelled (after approval denied or timeout)
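The lifecycle can be encoded as a simple guard table. This sketch covers only the edges shown in the diagram; names are illustrative, not platform API.

```typescript
// Allowed status transitions, exactly as drawn in the lifecycle diagram.
const ALLOWED: Record<string, string[]> = {
  pending: ['running'],
  running: ['completed', 'failed', 'waiting_approval'],
  waiting_approval: ['running', 'cancelled'],
  completed: [],   // terminal
  failed: [],      // terminal
  cancelled: [],   // terminal
};

function canTransition(from: string, to: string): boolean {
  return (ALLOWED[from] ?? []).includes(to);
}
```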
Transitions are always single-row UPDATEs with an optimistic concurrency check:
UPDATE workflow_runs
SET status = $1,
state = $2,
updated_at = now()
WHERE id = $3
AND status = $4 -- expected current status
RETURNING *;
If the UPDATE returns zero rows, the transition is rejected (concurrent modification). The caller retries from a fresh read.
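A conceptual retry wrapper around that pattern, with the UPDATE and the fresh read injected as functions so the sketch stays database-free. All names are hypothetical.

```typescript
type ApplyFn = () => Promise<number>; // runs UPDATE ... WHERE status = $expected; returns rows affected
type ReloadFn = () => Promise<void>;  // re-reads the workflow_runs row before retrying

// Returns true once the optimistic UPDATE commits; false if retries are exhausted.
async function transitionWithRetry(
  apply: ApplyFn,
  reload: ReloadFn,
  maxRetries = 3,
): Promise<boolean> {
  for (let i = 0; i <= maxRetries; i++) {
    if ((await apply()) > 0) return true; // transition committed
    await reload(); // zero rows: a concurrent writer won; retry from a fresh read
  }
  return false;
}
```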
A.3 Raw SQL Policy
Prisma is the default data access layer. Raw SQL is allowed only for platform primitives where Prisma cannot express the operation correctly or efficiently.
Permitted raw SQL
| Use case | Reason | Owner |
|---|---|---|
| Session-level RLS context (`SET LOCAL app.tenant_id = $1`) | Prisma has no session-variable API; must be raw parameterized SQL inside a transaction | `@constellation-platform/db` |
| Job claim (`FOR UPDATE SKIP LOCKED`) | Prisma does not support `SKIP LOCKED` | `@constellation-platform/jobs` |
| Workflow transition (optimistic `UPDATE ... WHERE status = $expected`) | Must be a single atomic statement, not read-then-write | `@constellation-platform/jobs` |
| RLS policy setup (`ALTER TABLE ... ENABLE ROW LEVEL SECURITY`, `CREATE POLICY`) | DDL, not data access — migration-time only (see rule 4) | `@constellation-platform/db` |
| Extension setup (`CREATE EXTENSION`, `pg_cron` schedule management) | Extension DDL — migration-time only (see rule 4) | `@constellation-platform/db`, `@constellation-platform/jobs` |
| `LISTEN` / `NOTIFY` | Prisma does not expose Postgres channels | `@constellation-platform/events` |
| Recursive ltree queries (`@>`, `<@`, `lquery`) | Prisma does not support ltree operators natively | `modules/taxonomy/repository.ts` |
| Hybrid search ranking (`ts_rank` + pgvector distance in one query) | Prisma cannot compose full-text and vector scoring | `modules/search/repository.ts` |
Rules
1. Raw SQL lives in the repository or platform infrastructure layer only (`modules/*/repository.ts` or `@constellation-platform/*`) — never in services, tools, or workflows.
2. Every raw SQL call must be wrapped in a typed function with Zod-validated inputs and outputs.
3. Raw SQL must be annotated with a comment referencing this appendix section: `// Raw SQL: see Architecture v3, Appendix A.3.`
4. Migration files use `prisma migrate` for schema changes. Exception: RLS policy DDL (`ENABLE ROW LEVEL SECURITY`, `CREATE POLICY`), extension DDL (`CREATE EXTENSION`), and `pg_cron` schedules cannot be expressed through Prisma's schema — these are the only raw SQL permitted in migration files, and each must reference this appendix.
5. If Prisma adds support for a currently-raw operation, the raw SQL must be replaced in the next cleanup cycle.
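The typed-wrapper and annotation rules can be illustrated with one of the whitelisted ltree queries. In real code the validation would use Zod; this dependency-free sketch uses a hand-rolled check, and every name here is illustrative rather than actual repository code.

```typescript
// Hypothetical shape of modules/taxonomy/repository.ts
interface CategoryRow { id: string; path: string }
type RawQuery = (sql: string, params: unknown[]) => Promise<unknown[]>;

// Validate each row before it leaves the repository (Zod in real code).
function parseCategoryRow(row: unknown): CategoryRow {
  const r = row as Record<string, unknown>;
  if (typeof r.id !== 'string' || typeof r.path !== 'string') {
    throw new Error('invalid category row');
  }
  return { id: r.id, path: r.path };
}

// Raw SQL: see Architecture v3, Appendix A.3 (recursive ltree queries).
async function findDescendants(query: RawQuery, path: string): Promise<CategoryRow[]> {
  const rows = await query(
    'SELECT id, path::text FROM categories WHERE path <@ $1::ltree',
    [path], // parameterized, never interpolated
  );
  return rows.map(parseCategoryRow);
}
```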
A.4 Composition Root
Both apps/api/ and apps/agents/ share a single composition root that wires all dependencies. This avoids duplicate initialization, inconsistent config, and drift between the two processes.
Structure
// packages/platform/runtime/composition-root.ts
export interface AppContext {
// Config
config: AppConfig; // validated with Zod at startup
// Database
prisma: PrismaClient; // single instance, RLS-aware
// Platform services
jobQueue: JobQueue; // Postgres-backed (or BullMQ per-queue override)
eventBus: EventBus; // domain events → outbox → LISTEN/NOTIFY
// Module registries
toolRegistry: ToolRegistry; // all module tools, keyed by name
workflowRegistry: WorkflowRegistry; // all workflow definitions, keyed by type
// Cross-cutting
tenantContext: TenantContextProvider; // extracts tenant_id from JWT / request
logger: Logger; // structured, OpenTelemetry-correlated
tracer: Tracer; // OpenTelemetry tracer
}
export function createAppContext(overrides?: Partial<AppContext>): Promise<AppContext>;
Wiring rules
- `createAppContext()` is called exactly once per process — at the top of `apps/api/main.ts` and `apps/agents/main.ts`.
- Module registration is declarative. Each module exports a `register(ctx: AppContext)` function that registers its tools, workflows, and event handlers. No module reaches into another module's internals.
- `apps/api/` calls `createAppContext()` then boots the NestJS HTTP server. It registers all module tools (for the REST-facing tool endpoints) but does not start the workflow runner or job workers.
- `apps/agents/` calls `createAppContext()` then starts the workflow runner, job workers, and MCP server. It registers the same tools (for agent use) and additionally starts the orchestrator and specialist agents.
- Overrides for testing. `createAppContext({ prisma: testPrismaClient, jobQueue: inMemoryQueue })` replaces real dependencies with test doubles. Every integration test uses this — no mocking of internal imports.
- No NestJS module injection for platform primitives. `AppContext` is a plain object, not a NestJS provider tree. NestJS controllers receive `AppContext` via a single provider binding. This keeps platform code framework-independent and usable by `apps/agents/` (which is not a NestJS app).
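The declarative registration rule can be sketched against a stripped-down context. The module, the tool name, and the `bootstrap` helper below are all hypothetical; the real `AppContext` and registries are richer.

```typescript
interface Tool { name: string }
interface MiniContext { toolRegistry: Map<string, Tool> } // stand-in for AppContext
type RegisterFn = (ctx: MiniContext) => void;

// A module registers its capabilities; it never reaches into other modules.
const catalogModule: RegisterFn = (ctx) => {
  ctx.toolRegistry.set('catalog.publish_product', { name: 'catalog.publish_product' });
};

// Composition root: build the context once, then let each module register itself.
function bootstrap(modules: RegisterFn[]): MiniContext {
  const ctx: MiniContext = { toolRegistry: new Map() };
  for (const register of modules) register(ctx);
  return ctx;
}
```

Both `apps/api/` and `apps/agents/` would pass the same module list to the same bootstrap, which is what keeps the two processes from drifting.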
A.5 Tenant Context Propagation
tenant_id must survive across all async boundaries — HTTP requests, job execution, workflow transitions, and event handlers — without requiring callers to pass it manually through every function signature.
Mechanism
The platform uses Node.js AsyncLocalStorage as the canonical tenant context carrier.
// packages/platform/runtime/tenant-context.ts
import { AsyncLocalStorage } from 'node:async_hooks';
interface TenantContext {
tenantId: string;
actorId: string;
correlationId: string;
}
export const tenantStore = new AsyncLocalStorage<TenantContext>();
export function getCurrentTenant(): TenantContext {
const ctx = tenantStore.getStore();
if (!ctx)
throw new Error('Tenant context not set — are you outside a request/job/workflow scope?');
return ctx;
}
Context restoration rules
When restoring context from a persisted row, actorId and correlationId may be null (e.g. for cron-triggered jobs or system-initiated workflows). The restoration code must handle this:
- `tenantId` — always present; read from the row's `tenant_id` column. Required.
- `actorId` — read from the row's `actor_id` column if present; falls back to `'system'` if null.
- `correlationId` — read from the row's `correlation_id` column if present; a new ID is generated if null.
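These fallback rules can be written down directly. The function name is hypothetical, and the ID generator is injected so the fallback path stays visible and testable.

```typescript
interface TenantContext { tenantId: string; actorId: string; correlationId: string }

interface PersistedRow {
  tenant_id: string;            // always present; required
  actor_id: string | null;      // null for system-initiated work
  correlation_id: string | null; // null for cron/sweep-initiated work
}

// Rebuild runtime tenant context from a persisted row, applying the fallbacks.
function restoreContext(row: PersistedRow, newId: () => string): TenantContext {
  return {
    tenantId: row.tenant_id,              // required, no fallback
    actorId: row.actor_id ?? 'system',    // fallback for system-initiated jobs
    correlationId: row.correlation_id ?? newId(), // generate when absent
  };
}
```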
| Entry point | How tenant context is set |
|---|---|
| HTTP request (`apps/api`) | NestJS middleware extracts `tenant_id`, `sub` (actor), and correlation header from the JWT/request and enters `tenantStore.run()` before the controller executes. All three fields are available. |
| Job claim (`apps/agents`) | After `FOR UPDATE SKIP LOCKED` returns a job row, the worker reads `tenant_id`, `actor_id`, and `correlation_id` from the row and enters `tenantStore.run()`. `actor_id` and `correlation_id` may be null (see fallback rules above). |
| Workflow transition (`apps/agents`) | After loading the `workflow_runs` row, the runner reads `tenant_id`, `actor_id`, and `correlation_id` from the row and enters `tenantStore.run()`. Same fallback rules apply. |
| Event handler | The outbox consumer reads tenant_id, actor_id, and correlation_id from the event payload and enters tenantStore.run(). |
| MCP tool invocation | The MCP server reads tenant_id from the session/request metadata and enters tenantStore.run(). Actor is the agent identity; correlation ID comes from the session trace. |
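Each entry point follows the same shape: enter `tenantStore.run()` first, then let business logic read the store. A minimal self-contained sketch (the wrapper names are hypothetical; the real `tenantStore` lives in `packages/platform/runtime/tenant-context.ts`):

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface TenantContext { tenantId: string; actorId: string; correlationId: string }
const tenantStore = new AsyncLocalStorage<TenantContext>();

// Entry-point pattern: enter the scope before any business logic executes.
function withTenantScope<T>(ctx: TenantContext, fn: () => T): T {
  return tenantStore.run(ctx, fn);
}

// Business logic reads context from the store, never from parameters.
function currentTenantId(): string {
  const ctx = tenantStore.getStore();
  if (!ctx) throw new Error('Tenant context not set');
  return ctx.tenantId;
}
```

Anything called inside `fn`, including awaited async work, observes the same context, which is why no function signature needs a `tenantId` parameter.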
Prisma integration
The Prisma client auto-attaches tenant_id to the database session for RLS enforcement. The implementation must satisfy two invariants:
- Parameterized SQL only — never interpolate
tenantIdinto a string. Use$executeRawwith tagged template literals (see A.3 whitelist). - Same session guarantee — the
SET LOCALand the subsequent query must execute within the same database transaction/session, otherwise the RLS variable is not visible to the query.
Note: The snippet below is conceptual pseudocode illustrating the required architectural constraints. The exact Prisma `$extends` / client-extension API may differ at implementation time. What matters is that the two invariants above are satisfied. The concrete implementation belongs in `packages/platform/db/` and must be validated against the Prisma version in use.
// CONCEPTUAL — packages/platform/db/prisma-tenant.ts
// Validates against: Prisma client extensions API (verify exact signatures at implementation time)
export function createTenantAwarePrisma(basePrisma: PrismaClient): PrismaClient {
return basePrisma.$extends({
query: {
async $allOperations({ args, query, model, operation }) {
const { tenantId } = getCurrentTenant();
// Invariant 2: $transaction ensures SET LOCAL and query share the same PG session.
// SET LOCAL scopes the variable to the current transaction only —
// it is automatically reset when the transaction commits or rolls back.
return basePrisma.$transaction(async (tx) => {
// Invariant 1: tagged template — parameterized, not interpolated.
// Raw SQL: see Architecture v3, Appendix A.3 (session-level RLS context)
await tx.$executeRaw`SET LOCAL app.tenant_id = ${tenantId}::text`;
// Route the original query through tx, NOT through basePrisma,
// to guarantee it sees the SET LOCAL variable.
// (Exact dispatch mechanism depends on Prisma extension API version.)
return /* dispatch original operation through tx */;
});
},
},
});
}
Key constraints the implementation must honour (regardless of exact Prisma API shape):
- `$executeRaw` tagged template, not `$executeRawUnsafe`: Prisma's tagged-template `$executeRaw` parameterizes values automatically. `$executeRawUnsafe` accepts a raw string and is subject to SQL injection if callers ever interpolate user input. The implementation must use the tagged-template form.
- Query routed through the transactional client: the query following `SET LOCAL` must execute on the same transactional connection (`tx`), not through the base Prisma client. If Prisma's `$extends` callback provides a `query()` function that dispatches through the base client, it must not be used — find the equivalent that routes through `tx`.
Invariant: runtime context vs serialized state
AsyncLocalStorage is the canonical in-process runtime carrier for tenant context. However, tenant_id also appears as persisted data in rows that cross process boundaries. These are distinct concerns:
Runtime context (in-process):
Within a running request, job handler, workflow transition, or tool execution, services read tenant context from AsyncLocalStorage — never from function parameters. This prevents:
- accidental tenant ID mismatch between caller and callee
- RLS bypass when a worker forgets to set the session variable
- proliferation of `tenantId` through every function signature
Serialized state (persisted):
Rows in job_queue, workflow_runs, outbox_events, and domain event envelopes must carry tenant_id (and optionally actor_id, correlation_id) as explicit columns. This is data, not runtime context — it is how context is reconstructed after serialization, process restart, or delayed execution. Boundary DTOs that cross process or network boundaries (e.g. webhook payloads, MCP session metadata) also carry tenant_id as data.
The rule: No in-process business logic (service method, tool handler, policy check, workflow transition function) should accept tenantId as a function parameter. These read from AsyncLocalStorage. Persistence layers, serialization boundaries, and entry-point bootstrapping code are the only places that read and write tenant_id as explicit data.