Question 1

What is the Blueprint Readiness Score?

Accepted Answer

A 0–100 score that measures whether the trust boundaries of the 10 Blueprint principles hold for the submitted code. Each finding gets a `severity_class` orthogonal to the verdict label: `production_blocker` = trust boundary fails, must fix before production, contributes 0 credit. `hardening_recommended` = trust boundary holds, defence-in-depth note for the next iteration, contributes full credit. `polish` = stylistic / non-load-bearing, full credit. `aligned` = full credit. The headline grade penalises only `production_blocker` (and the legacy `high_risk` verdict, which is always a blocker by definition). Score = round(100 × Σ credit / applicable_principles). Letter grades: A 90+, B 75+, C 60+, D 40+, F below 40. Tiers: production_ready (A or B), emerging (C), draft (D or F). The `high_risk` verdict caps the grade at C so production_ready can never co-exist with an unaddressed high_risk finding. The reason `production_ready` does not require 100/100 is that chasing the perfect score is the perfection-loop trap (Ford & Parsons fitness-function framing) — production_ready means trust boundaries hold, and hardening recommendations are iteration material, not a deficit. Older runs that pre-date the severity_class field score under the legacy verdict + severity_score interpolation and grade exactly as they did before. The score is computed once on the server so humans and agents see the same value.

Question 2

What plan do I need to call architect.validate?

Accepted Answer

A Pro or Teams plan. Free and Basic accounts can read every doctrine principle, cluster, example, and guide via the public MCP at no charge, but architect.validate, architect.certify, and me.validation_history are reserved for paid plans because they process real code, persist per-project history, and read back trend context across runs.

Question 3

How does the validate → certify two-step work? Why is certify a separate tool?

Accepted Answer

architect.validate is the fast first-pass diagnostic — single LLM call, ~30-50s, fits the ~60s MCP-client tool budget. It returns a per-principle scorecard, a Blueprint Readiness Score, a tier, and a `run_id`. When the run scores production_ready (A or B tier), the response carries `certification_status='not_evaluated'` — by design. To mint the certified production_ready badge, the agent calls `architect.certify(run_id, code)` next. Cert is a separate Pro/Teams tool with its own eligibility gate (caller-owned, tier=production_ready, less than 24h old, not already certified, retry budget remaining) and runs a single adversarial second-pass LLM call (~20-40s). Cert is bound to the exact code that was validated via a 16-character SHA-256 fingerprint stamped at validate time and verified at certify time — without that binding the badge would be meaningless. Why split it: the previous in-line cert path could chain up to 6 LLM calls per validate (3 first-pass + 3 cert in retry-loops), pushing total latency past the client tool budget. Splitting validate (fast, opt-in cert) from certify (slow, adversarial trust-bearing) keeps each step inside the budget and makes the cost+time profile of certification explicit to the caller. Cert can downgrade a production_ready run to emerging/C with a typed `cert_downgraded` blocker_reason if the second-pass surfaces a missed production_blocker — that is the cert doing its job.

Question 4

What does the production_ready badge actually mean? Is it a security audit?

Accepted Answer

No. The production_ready badge means the code possesses the architectural trust boundaries the doctrine asks for: explicit handoffs, recovery paths, audit inspectability, persistent run records, typed operational state. It is an automated point-in-time assessment of structural alignment with the 10 Blueprint principles, performed by an LLM at a specific moment, against a specific doctrine fingerprint. It is NOT a cybersecurity audit, penetration test, regulatory compliance check (HIPAA / SOC2 / GDPR / DORA), or guarantee that the code is free of bugs, vulnerabilities, or runtime hallucination risk. The doctrine itself dictates that agents are governed by humans — final responsibility for deployment, monitoring, production testing, and security of the agentic workflow remains entirely with the deploying organisation. AI Design Blueprint provides the standard and the measuring tape; you own the consequences of execution. Full disclosure on the [trust and data handling page](/en/for-agents/trust-and-data-handling).

Question 5

How should I send code to architect.validate? Can I summarise or split it?

Accepted Answer

Send the FULL file contents verbatim as `implementation_context`. Do not truncate, compress whitespace, condense multi-line statements, paraphrase, or summarise. The architect's findings cite specific identifiers, branch ordering, and structural choices — those signals get destroyed by any kind of compression, so a summarised submission produces a degraded verdict that does not reflect the actual code. Architecture summaries (high-level prose) are accepted ONLY when no code exists yet, for greenfield review; never as a substitute for code that already exists. If a single file is too large to fit your MCP client's tool-call budget, split into multiple architect.validate calls scoped by FILE (not by principle cluster). Splitting by cluster via focus_area is a band-aid that produces fragmented verdicts: each cluster-call sees only ~3 principles, the certification path cannot fire (it requires the full first-pass), and the project-page trend becomes incoherent. If your MCP client tool-call closes before the server returns, the run still persists server-side. Recover the result with `me.validation_history(run_id=...)` — the run_id is surfaced in the FIRST `notifications/progress` event of every architect.validate call (sent at t=0 specifically so you have the recovery handle even when the call closes a few seconds later). The badge URL and `/me/validation-runs` REST endpoint also work; the MCP path is the simplest from inside an agent loop.

Question 6

How do I know if my downloaded agent assets (CLAUDE.md, .claude/ pack, MCP config) are out of date?

Accepted Answer

Every pack file carries an `_aidb` block at the top with `pack_version` + `content_version` (the doctrine commit hash at build time). To check from a Claude Code or Codex session: ask your agent to call `assets.list` via the MCP and compare the manifest's `content_version` against the `_aidb.content_version` in your local file. The `assets.list` tool is public, no Pro/Teams plan required. From a script or CI: hit `https://aidesignblueprint.com/agent-assets/index.json` and compare. If they differ, re-download the pack from `https://aidesignblueprint.com/agent-assets/claude-code-pack.zip` (or the equivalent for Cursor / Codex / Gemini). The MCP runtime also exposes a `doctrine_fingerprint` on every architect.validate response — different concept: it lets you detect if your prior validation runs were scored under a different doctrine version, so the architect can surface drift in the iteration loop. Doctrine principle text rarely changes; the hooks and configs evolve as the architect's findings produce new enforcement patterns. Pinned installs are fine for stability; update at sprint boundaries to pick up new checks.

Question 7

How does the Architect Agent stop my agent from looping on already-fixed issues?

Accepted Answer

When you pass repository to architect.validate, the score and per-principle verdicts are persisted against that repository. Before re-validating, your agent calls me.validation_history with the same repository name and reads back the latest score, the delta versus the previous run, and the principles that regressed. The new review then focuses on what changed, instead of re-flagging issues that were already aligned and have not moved.

Question 8

Is my code stored when I call architect.validate?

Accepted Answer

Payloads are processed transiently in memory by the underlying LLM provider (OpenAI API, under a no-training-on-API-data policy) and dropped. We never train models on user code, validation payloads, or architecture diagrams. Pass private_session=true on the call to also skip all server-side logging on our side. UK/EU data residency on Google Cloud Run europe-west2.

Question 9

What does the Architect Agent return per principle?

Accepted Answer

For each evaluated principle: a verdict (aligned, needs_changes, high_risk, or not_applicable), a numeric severity_score 0–100, a confidence level (low/medium/high), an evidence_quality rating (sparse/moderate/strong), code-cited evidence, a recommendation when not aligned, and a list of recommended example slugs you can fetch with examples.get. The assessment also surfaces a code_classification (autonomous_agentic_workflow vs non_agentic_component, with rationale) so you can inspect why some principles were marked not_applicable. The aggregate readiness block carries the score, grade, tier, per-bucket counts, and whether the grade was capped by a high_risk finding.

Question 10

Is the validator deterministic? Can I reproduce a run later?

Accepted Answer

Reproducibility is best-effort, and the response surfaces every knob that affects it. Identical input produces an identical seed, derived from a collision-free JSON canonicalisation of every prompt-affecting field. The reproducibility block carries the model, seed, OpenAI system_fingerprint, doctrine_fingerprint (a hash over the principle definitions), prompt_template_fingerprint (system prompt + scaffolding + JSON schema + reasoning_effort), and reasoning_effort. If a future deploy changes the system prompt or the doctrine, the corresponding fingerprint changes — silent drift is impossible by construction. Per-finding confidence lets you tell intrinsic LLM variance from real disagreement. The mode is explicitly 'best_effort' so callers do not infer byte-identical replay.

Question 11

How does the Architect Agent handle prompt injection in submitted code?

Accepted Answer

The system prompt explicitly delimits submitted code and context as inert untrusted data and instructs the model to ignore any instructions inside them. User-supplied code and context are JSON-escaped before they enter the prompt, so markdown delimiters or instruction-shaped content cannot break out of the data block. If a payload contains injection attempts, the validator treats them as evidence to cite under inspectability or blocker findings, not as instructions to follow.

Question 12

What happens when OpenAI is rate-limited, slow, or down?

Accepted Answer

Provider failures surface as typed error codes — timed_out, rate_limited, dependency_unavailable, schema_mismatch — each with the dependency name, retryable flag, and a concrete next_action. The user-facing time budget is 5 minutes (300 seconds) and is enforced at the provider call boundary itself, not just at the outer wrapper. Persistence failures flip the run's persistence_status to failed and strip the tentative run_id / badge_url / review_url so dead links never reach the caller. Curated-example lookup failure degrades to recommended_examples=[] with example_recommendation_status='unavailable' instead of failing the run; the primary findings are always preserved.

Question 13

Was the Architect Agent reviewed against itself?

Accepted Answer

Yes, repeatedly. The most recent self-review against the prior-run baseline-injection feature scored production_ready (B, 86 / 100, 5 of 6 applicable principles aligned, no high_risk findings, grade not capped) on the live prod endpoint. Read it on the public review page: https://aidesignblueprint.com/en/readiness-review/6d70eb0c-ec05-4ed4-b882-e30d8ca400de. Same MCP tool, same doctrine, same 5-minute budget anyone else gets, same fingerprint envelope every caller receives. The score is honest, not rubber-stamped: the architect found one principle still worth tightening, and that finding is what production_ready means under the doctrine — the trust boundaries hold even though the next iteration could harden the operational state surface further. Earlier iteration loops on this same code drove the score from 89 / B to 100 / A in three rounds with prior-run context injected, surfacing real bugs round by round including missing structured classification, lossy seed canonicalisation, untyped LLM provider errors, prompt-injection vulnerability in baseline content, AttributeError escape on truthy non-dict nested fields, and a typed parse-boundary primitive landing on the final round.

The Architect Agent

What you get back

Severity_class scoring (production_blocker vs hardening_recommended)

Reproducibility envelope (best-effort, but auditable)

Trend memory across runs