⚠️ BIG CALL OUT — READ FIRST
Multi-tenancy is foundational architecture, not a cosmetic feature.
It determines how all data, identity, and security boundaries behave in the platform. Getting it right early saves massive pain later.
That said, implementing full multi-tenancy for an IPaaS + MCP stack is a major software endeavor. It involves DB design, secrets management, observability, runtime isolation, and governance.
👉 This should be planned in the roadmap, not rushed into an MVP.
Early versions can run on simpler single-tenant or light-scoping patterns until scale or enterprise customers demand more.
The goal here is to make sure we know where this fits in the journey and can avoid costly rewrites.
Multi-Tenancy Guide for IPaaS + MCP
This is the architectural playbook for standing up a multi-tenant IPaaS that exposes MCP tools/agents to each tenant. It’s aimed at a larger design discussion—tradeoffs, boundaries, and how we keep tenants safe without blocking velocity.
1) Tenancy Model (Who owns what?)
Tenants: companies/workspaces that connect systems (Stripe, HubSpot, Google Drive, etc.) and run automations and MCP agents.
Scopes we must isolate:
- Config: connectors, flows, agent/tool catalogs, environment variables
- Runtime: jobs, queues, schedules, executions, logs
- Data: secrets, credentials, cached artifacts, transcripts
- UI/Admin: users, roles, audit trails, billing
Decision: Default to logical multi-tenancy (shared control plane & data plane) with the option for siloed tenants (separate DB/queues/VPC) for enterprise.
2) Identity & Access
- Tenant boundary: Required on every request (host, header, or org subdomain).
- Users & roles: Owner, Admin, Integrator, Analyst, Viewer, ServiceAccount.
- Principals for automations: Service accounts with scoped tokens; avoid user tokens for long-running jobs.
- MCP agent trust: Agents/tools run with the tenant’s identity; permissions map to connectors and data classes.
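As a minimal sketch of the tenant-boundary rule above, the following Python resolves a tenant from an explicit header first, then from an org subdomain, and rejects requests that carry neither. The header name, the base domain `example.com`, and the function and exception names are illustrative assumptions, not decided design.

```python
from typing import Mapping

BASE_DOMAIN = "example.com"      # assumed platform domain
TENANT_HEADER = "x-tenant-id"    # assumed header name


class MissingTenantError(Exception):
    """Raised when a request carries no tenant boundary."""


def resolve_tenant(host: str, headers: Mapping[str, str]) -> str:
    """Return the tenant id for a request, or raise if none is present."""
    # Header names are case-insensitive in HTTP, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    explicit = lowered.get(TENANT_HEADER, "").strip()
    if explicit:
        return explicit

    # Fall back to an org subdomain such as acme.example.com.
    hostname = host.split(":", 1)[0].lower()
    suffix = "." + BASE_DOMAIN
    if hostname.endswith(suffix):
        sub = hostname[: -len(suffix)]
        if sub and "." not in sub:
            return sub

    raise MissingTenantError(f"no tenant boundary on request to {host!r}")
```

A production version would likely also verify that the resolved tenant matches the caller's token and reject requests where header and subdomain disagree.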
3) Secrets & Connectors
- Secrets store: KMS-backed vault; per-tenant encryption keys or per-tenant data-encryption-keys (DEKs) wrapped by a master key.
- Connector instances: connector_instance(id, tenantId, type, configRef, secretRefs, region), with an immutable history of edits.
- Rotation: versioned secrets; roll forward without downtime; last-known-good tracked per instance.
- BYOK (Bring Your Own Keys): supported; we store references, not plaintext; decrypt only in connector runtime.
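The rotation bullet can be sketched as a small in-memory model: a secret reference holds version numbers only, and the connector runtime tries the newest version first with last-known-good as the fallback. The class name, the vault path format, and the method names are hypothetical; no vault API is implied.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SecretRef:
    """Pointer to a versioned secret in the vault; never holds plaintext."""
    path: str                                    # hypothetical vault path
    versions: List[int] = field(default_factory=list)
    last_known_good: Optional[int] = None

    def rotate(self) -> int:
        """Register a new version; the previous one stays usable."""
        new_version = (self.versions[-1] + 1) if self.versions else 1
        self.versions.append(new_version)
        return new_version

    def mark_good(self, version: int) -> None:
        """Record that a version authenticated successfully."""
        if version not in self.versions:
            raise ValueError(f"unknown version {version}")
        self.last_known_good = version

    def candidates(self) -> List[int]:
        """Versions to try: newest first, then last-known-good as fallback."""
        if not self.versions:
            return []
        order = [self.versions[-1]]
        lkg = self.last_known_good
        if lkg is not None and lkg != order[0]:
            order.append(lkg)
        return order
```

Keeping the old version valid until the new one is marked good is what allows rolling forward without downtime.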
4) Configuration Layers (who can override?)
- Platform defaults (safe timeouts, retries, token lifetimes)
- Tenant policy (allowed connectors, data residency, PII redaction level)
- Workspace/project (flow timeouts, concurrency)
- Flow/run (one-off overrides, if permitted)
Rule of thumb: Lower levels may only narrow (not broaden) the upper-level policy unless a privileged role signs the change.
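One way to make the narrowing rule concrete is a merge function that refuses any override wider than its parent. This is a sketch under assumptions: the `Policy` fields are examples picked from the layers above, and the privileged sign-off path is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Policy:
    timeout_s: int                       # upper bound on run time
    max_concurrency: int                 # upper bound on parallel runs
    allowed_connectors: FrozenSet[str]   # connector types permitted


def narrow(parent: Policy,
           timeout_s: Optional[int] = None,
           max_concurrency: Optional[int] = None,
           allowed_connectors: Optional[FrozenSet[str]] = None) -> Policy:
    """Apply a lower-level override; it may only tighten the parent."""
    t = parent.timeout_s if timeout_s is None else timeout_s
    c = parent.max_concurrency if max_concurrency is None else max_concurrency
    a = parent.allowed_connectors if allowed_connectors is None else allowed_connectors
    if t > parent.timeout_s or c > parent.max_concurrency:
        raise PermissionError("override would broaden a numeric limit")
    if not a <= parent.allowed_connectors:
        raise PermissionError("override would add connectors the parent forbids")
    return Policy(t, c, a)
```

Chaining `narrow` from platform to tenant to workspace to flow yields the effective policy for a run, and a broadening attempt fails at the layer that made it.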
5) Data Isolation Options
- Shared DB + RLS: single cluster; Row-Level Security enforces a tenant_id predicate on all tables.
- Schema per tenant: middle ground; helpful for export/restore.
- DB per tenant (silo): for large/regulated customers; paired with dedicated queues and optional VPC peering.
Artifacts to isolate: job payloads, MCP transcripts, tool outputs, temporary files, vector indexes, cached tokens.
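For the shared DB + RLS option, the sketch below generates the PostgreSQL statements that would enforce the tenant predicate on one table. It assumes PostgreSQL, a text-typed tenant_id column, and a session setting named `app.tenant_id` that the application sets per connection; the policy name and helper name are illustrative.

```python
import re
from typing import List

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")
_SETTING = re.compile(r"^[a-z_][a-z0-9_]*\.[a-z_][a-z0-9_]*$")


def rls_statements(table: str, column: str = "tenant_id",
                   setting: str = "app.tenant_id") -> List[str]:
    """Build PostgreSQL statements that restrict a table to one tenant."""
    # Identifiers are interpolated into SQL, so accept only plain names.
    for ident in (table, column):
        if not _IDENT.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    if not _SETTING.match(setting):
        raise ValueError(f"unsafe setting name: {setting!r}")
    predicate = f"{column} = current_setting('{setting}')"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # FORCE makes the policy apply to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({predicate}) WITH CHECK ({predicate});",
    ]
```

USING filters rows on read and WITH CHECK blocks writes that would land in another tenant's rows, so the same predicate covers both directions.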
6) Runtime & Execution
- Queues: partition by tenantId and optionally by priority.
- Concurrency: per-tenant limits + per-connector back-pressure (respect SaaS provider rate limits).
- Sandbox: job containers or workers with ephemeral filesystem and time/cpu/memory caps.
- MCP tools: run out-of-process; enforce capabilities manifest; deny unlisted operations by default.
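To illustrate queue partitioning and per-tenant concurrency together, here is an in-memory sketch of a scheduler that rotates across tenants and skips any tenant already at its in-flight cap. The class and method names are hypothetical, and a real system would sit on a durable queue rather than Python deques.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class TenantQueues:
    """Round-robin across tenants with a per-tenant in-flight cap."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.pending: Dict[str, Deque[str]] = {}
        self.in_flight: Dict[str, int] = {}
        self.order: Deque[str] = deque()    # rotation of known tenants

    def enqueue(self, tenant_id: str, job_id: str) -> None:
        if tenant_id not in self.pending:
            self.pending[tenant_id] = deque()
            self.in_flight[tenant_id] = 0
            self.order.append(tenant_id)
        self.pending[tenant_id].append(job_id)

    def next_job(self) -> Optional[Tuple[str, str]]:
        """Pick the next runnable job, skipping tenants at their cap."""
        for _ in range(len(self.order)):
            tenant = self.order[0]
            self.order.rotate(-1)           # move this tenant to the back
            has_work = bool(self.pending[tenant])
            if has_work and self.in_flight[tenant] < self.max_in_flight:
                self.in_flight[tenant] += 1
                return tenant, self.pending[tenant].popleft()
        return None

    def done(self, tenant_id: str) -> None:
        """Release one in-flight slot when a job finishes."""
        self.in_flight[tenant_id] -= 1
```

With a cap of 1, a tenant that enqueues many jobs still yields to a quieter tenant on the next pick, which is the noisy-neighbor behavior the limits are meant to guarantee.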
7) Observability & Audit
- Logs/metrics/traces tagged with tenantId, connectorId, flowId, runId.
- Audit trail for: secret access, connector auth handshakes, policy changes, MCP tool use, data export.
- Retention tiers: configurable per tenant (e.g., 30/90/365 days).
- Tenant-visible traces: safe redactions, downloadable execution reports.
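A small sketch of the tagging rule: a log helper that refuses to emit a record unless all four identifiers are present. The helper name and the JSON line format are assumptions; the tag names come from the list above.

```python
import json
from typing import Dict

REQUIRED_TAGS = ("tenantId", "connectorId", "flowId", "runId")


def tagged_record(message: str, tags: Dict[str, str]) -> str:
    """Serialize a log line, refusing records that miss a required tag."""
    missing = [t for t in REQUIRED_TAGS if not tags.get(t)]
    if missing:
        raise ValueError(f"log record missing tags: {missing}")
    record = {"message": message}
    record.update({t: tags[t] for t in REQUIRED_TAGS})
    return json.dumps(record, sort_keys=True)
```

Failing at write time keeps untagged records out of the shared log store, where they could not be filtered by tenant later.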
8) Safety & Governance
- Data classification: mark fields as PII, PHI, Financial, Public.
- Redaction at boundaries: strip/obfuscate sensitive fields in logs and LLM/MCP contexts based on classification + tenant policy.
- Egress controls: allowlist outbound domains per connector; deny arbitrary HTTP from agents.
- Human-in-the-loop (HITL): required approvals for destructive actions (delete records, send emails at scale).
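Redaction at boundaries can be sketched as a function that consults a field classification map plus the tenant's set of classes to redact. The field names in the map are made up for illustration, and treating unclassified fields as sensitive is an assumed default rather than something this guide has decided.

```python
from typing import Any, Dict, Set

# Hypothetical mapping; a real one would come from schema metadata.
FIELD_CLASSES: Dict[str, str] = {
    "email": "PII",
    "diagnosis": "PHI",
    "card_last4": "Financial",
    "plan": "Public",
}


def redact(record: Dict[str, Any], redact_classes: Set[str]) -> Dict[str, Any]:
    """Return a copy that is safe to log or place in an LLM/MCP context."""
    out: Dict[str, Any] = {}
    for key, value in record.items():
        cls = FIELD_CLASSES.get(key)
        # Unclassified fields are redacted too (fail closed).
        if cls is None or cls in redact_classes:
            out[key] = "[REDACTED]"
        else:
            out[key] = value
    return out
```

The tenant policy supplies `redact_classes`, so a stricter tenant can add Financial without any code change.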
9) MCP-Specific Considerations
- Tool registry per tenant: which tools are enabled; version pinning; deprecation policy.
- Context windows: limit what the agent can “see”—explicit grants to datasets or connector scopes.
- Determinism knobs: replayable prompts with input hashes; store minimal context necessary to reproduce.
- Prompt safety rails: inject system prompts that reflect tenant policies (PII handling, retention, citation rules).
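The registry and determinism bullets can be combined in a short sketch: a deny-by-default authorization check against a per-tenant registry with pinned versions, and a stable input hash for replay lookups. The registry shape, tool names, and function names are hypothetical and not part of the MCP specification.

```python
import hashlib
import json
from typing import Any, Dict, Set, Tuple

# Hypothetical registry: tool name -> (pinned version, allowed operations).
Registry = Dict[str, Tuple[str, Set[str]]]


def authorize(registry: Registry, tool: str, version: str, operation: str) -> bool:
    """Deny by default: unknown tools, unpinned versions, unlisted operations."""
    entry = registry.get(tool)
    if entry is None:
        return False
    pinned_version, operations = entry
    return version == pinned_version and operation in operations


def input_hash(tool: str, version: str, arguments: Dict[str, Any]) -> str:
    """Stable SHA-256 over a canonical JSON form, usable as a replay key."""
    canonical = json.dumps(
        {"tool": tool, "version": version, "arguments": arguments},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys before hashing means two calls with the same arguments in a different order map to the same stored run.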
10) Scheduling & Triggers
- Time-zone aware cron per tenant; daylight-saving safe.
- Event triggers: webhooks, file drops, CRM updates—tenant-scoped endpoints (unique secrets).
- Debounce & dedupe: idempotency keys per trigger to avoid storm repeats.
- Maintenance windows: tenant can pause schedules during blackout periods.
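Debounce and dedupe might look like the following in-memory sketch, which drops repeat deliveries of the same idempotency key within a time window. The class name and window semantics are assumptions; a multi-worker deployment would need a shared store instead of a local dict.

```python
import time
from typing import Callable, Dict, Tuple


class TriggerDeduper:
    """Drop repeat trigger deliveries seen within a time window."""

    def __init__(self, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = window_s
        self.clock = clock                 # injectable for tests
        self.seen: Dict[Tuple[str, str], float] = {}

    def should_run(self, tenant_id: str, idempotency_key: str) -> bool:
        now = self.clock()
        # Forget keys older than the window so memory stays bounded.
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t < self.window_s}
        key = (tenant_id, idempotency_key)  # keys are scoped per tenant
        if key in self.seen:
            return False
        self.seen[key] = now
        return True
```

Scoping the key by tenant means two tenants can reuse the same event id without suppressing each other's triggers.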
11) Performance & Cost Controls
- Rate limits: global + per-tenant + per-connector.
- Quotas: monthly run minutes, storage, data egress, MCP token budgets.
- Burst buckets: short-term spikes permitted within policy, then smoothed back to the sustained rate.
- Cost visibility: tenant dashboards for usage; anomaly detection + alerts.
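Rate limits and burst buckets are commonly implemented with a token bucket; the sketch below is one such implementation, not a statement of what the platform will use. One bucket per tenant and one per connector would give the layered limits listed above. The class name and parameters are illustrative.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock                 # injectable for tests
        self.tokens = capacity             # start full so bursts work at once
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The capacity sets how large a spike is tolerated, while the rate sets the sustained throughput a tenant settles to afterwards.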
12) Regionality & Residency
- Data residency flag on tenant (US/EU/etc.).
- Connector routing honors residency (e.g., EU sheets ↔ EU workers ↔ EU storage).
- Cross-region copying requires explicit policy allow and audit trail.
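The residency rules can be sketched as a routing function that keeps work in the tenant's region and refuses cross-region movement unless the policy explicitly allows that pair. The pool names and the shape of the allow set are placeholders; audit logging of allowed copies is omitted here.

```python
from typing import Dict, Set, Tuple

# Hypothetical region map; names are placeholders, not real endpoints.
WORKER_POOLS: Dict[str, str] = {"US": "workers-us", "EU": "workers-eu"}


def route_job(tenant_residency: str, data_region: str,
              cross_region_allow: Set[Tuple[str, str]]) -> str:
    """Choose a worker pool, refusing moves the tenant policy does not allow."""
    if data_region != tenant_residency:
        # Pairs are (source region, destination region).
        if (data_region, tenant_residency) not in cross_region_allow:
            raise PermissionError(
                f"cross-region copy {data_region}->{tenant_residency} not allowed")
    return WORKER_POOLS[tenant_residency]
```

Making the allow set directional means permitting US to EU copies does not silently permit EU to US.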
13) Lifecycle: Onboarding → Offboarding
- Onboarding: invite users, bind domain(s), set policies, connect systems, create service accounts.
- Backup/export: tenant-scoped export of configs, flows, and secret references (never plaintext secrets).
- Offboarding: disable schedules, revoke webhooks/OAuth, delete artifacts per retention policy, cryptographic erasure of DEKs.
14) Billing & Plans (even if simple at first)
- Plan sets limits (connectors, runs/month, storage, retention).
- Per-tenant overage rules (soft cap vs. hard stop).
- Enterprise toggles: siloed DB, dedicated queue, private networking, custom retention.
15) Change Management & Versioning
- Version flows & tools; support phased rollouts and rollbacks per tenant.
- Breaking changes require migration notes and tenant-level confirm gates.
- Config diffs and approvals for risky changes (e.g., widening data access).
16) Security Posture
- Principle of least privilege across everything.
- Zero trust between services; mTLS + short-lived service tokens.
- Secret access audited; decrypt only in memory, never log plaintext.
- Pen tests per release train; targeted tests for MCP tool escapes.
17) Enterprise “Silo” Checklist
When a tenant qualifies for isolation:
- Dedicated DB/schema and read replica for analytics
- Dedicated queues & worker pool with reserved capacity
- Optional VPC peering / private link to tenant systems
- Custom keys (per-tenant KMS CMK) and bespoke retention
- Contractual SLAs & incident runbooks specific to tenant
18) What “Good” Looks Like
- New tenant can self-serve to first automation in under 30 minutes.
- Every execution is traceable, replayable, and scoped to the right tenant.
- A noisy or failing tenant cannot degrade others (isolation works).
- Secrets are rotated, audited, and never leak in logs.
- MCP agents are useful but boxed in by explicit capabilities and policies.
Open Questions for the Room
- Which tenants need silo mode at launch vs. later?
- How strict do we make the default egress allowlist for MCP tools?
- What’s our minimum viable observability (logs/traces) that tenants can see without risking PII exposure?
- Do we start with row-level security or schema-per-tenant for the shared tier?
This guide is intentionally policy-first. When we’re aligned on the boundaries, we can layer in concrete implementations (DB choice, queue tech, MCP runtime, vault provider) with confidence.