Multi-Channel GP Research as Fiduciary Practice

Abstract

This paper sets out a framework for what fiduciary-defensible AI-augmented GP research looks like, and the architectural disciplines an institutional LP should require from any platform that participates in the manager-selection workflow. We argue that GP research is not a content-production task that AI happens to accelerate; it is a fiduciary act that produces process evidence, and the AI architecture underneath the research workflow either survives external review or it does not.

The paper is deliberately scoped as a framework rather than an implementation report. The disciplines described — multi-channel sourcing across six evidentiary classes, source taxonomy organized for fiduciary review, multi-dimensional confidence tiers, multi-layered grounding, and audit-trail-grade decision history — are achievable today only at the architectural level. Even the most ambitious platforms (PrivateMetrics included) are mid-implementation against the full standard. The closing section is candid about that gap and frames procurement as an architectural-review exercise, not a feature-checklist exercise.

The audience for this paper is primarily the investment-team side of an institutional LP — CIOs, investment-committee members, fiduciary counsel, and consultants who advise institutions on AI-augmented research technology. The framework also matters to the operations side: AI-augmented research outputs become operational evidence in the institution's audit trail once a decision has been made, and the source-tagging, confidence-tier, and grounding disciplines this paper specifies are what make that operational evidence defensible months or years later. The paper assumes familiarity with the LP fund-selection workflow but does not assume technical familiarity with retrieval-augmented generation or related AI infrastructure concepts.

1. Why GP Research Is a Fiduciary Act

A fund manager-selection decision — to commit, to defer, to pass — is not a private judgment call. It is a documented act that produces evidence. The evidence is examined, sometimes years later, by people who were not in the room when the decision was made. Those reviewers apply a specific test: could a prudent expert, presented with this evidence, conclude that the fiduciary followed a defensible process?

The test is about process, not outcome. Funds underperform. Managers disappoint. Vintages produce losses. A fiduciary who selects a fund that ends up in the fourth quartile does not automatically face liability. A fiduciary who selects a fund through an indefensible process does, regardless of how the fund eventually performs. The standard derives from ERISA's prudent-expert duty (29 CFR §2550.404a-1) and analogous standards in non-ERISA fiduciary contexts (the Uniform Prudent Investor Act for endowments and trusts, the OECD Principles of Corporate Governance for sovereign wealth funds, and analogous statutes for public pensions).

This matters for AI because AI is now part of the process evidence. A decision informed by AI output is a decision informed by a specific AI output, produced by a specific system, under specific conditions, with specific inputs. If any of those specifics cannot be reconstructed later, the evidence of process breaks down. If the AI cannot explain why it concluded what it concluded — with architecture that makes the explanation mechanical rather than rhetorical — the fiduciary cannot explain why the decision was reached.

This is the question every LP technology evaluation should be asking, and the question most are not. The common evaluation question is "can the AI do this?" Today the answer is increasingly yes — frontier systems can read a private placement memorandum, cross-reference it against public filings, draft a first-pass investment-committee memo, and highlight the consequential terms in a limited partnership agreement. Capability is not the differentiator. The right question for a fiduciary is: can the AI do this in a way that survives external review?

The remainder of this paper specifies what survival of external review requires.

2. The Six-Channel Framework

The first architectural discipline is multi-channel sourcing. Single-channel research — relying primarily on materials the GP itself supplies — produces an evidentiary monoculture that fiduciary review will identify as a weakness. The GP's own materials are essential evidence; they are not sufficient evidence. A defensible research process gathers from multiple independent channels and synthesizes across them, surfacing both convergence (when channels agree) and divergence (when they disagree) as themselves consequential signals.

The six channels institutional LP research should cover:

Channel	Representative inputs	What it adds
GP relationships	LPA, side letters, PPM, GP-prepared track record, reference calls with the GP team	Authoritative on fund mechanics; the GP's framing of its own strategy
Regulatory filings	SEC Form ADV / Form D (US), AIFMD disclosures (EU), Cayman MLI filings, equivalent regulatory disclosures by jurisdiction	Independent verification of GP-supplied claims about firm structure, AUM, regulatory history
Web and market intelligence	Industry publications, transaction databases, news flow, conference proceedings	Surfaces market context, deal activity, reputational signals the GP would not be expected to surface itself
Peer LP activity	Anonymized intelligence on what comparable institutions have committed to, declined, or are evaluating; consortium-shared diligence outputs	Provides triangulation on whether sophisticated peers reach similar conclusions
Analytical third parties	Cambridge Associates, StepStone, Hamilton Lane benchmarking; rating agency reports; specialist consultants	Authoritative on benchmarking and quartile placement; provides framing the GP cannot self-supply
Internal mandate and policy	The institution's own investment policy statement, existing portfolio composition, mandate constraints	Determines fit before evaluation begins; surfaces concentration and policy-band considerations

A defensible synthesis weighs each channel against the others. Convergence across channels strengthens a finding; divergence is itself a finding. A GP claim about fund AUM that the SEC Form ADV corroborates is a stronger fact than the GP claim alone. A GP claim about a strategic shift that web/market intelligence does not corroborate — and that no peer LP appears to be reacting to — is a weaker claim than the GP's narrative would suggest.

The architectural requirement is that each channel's contribution to a finding remains traceable. A platform that aggregates across channels and produces a single composite output, without preserving which channels contributed which evidence, fails the multi-channel test even if its inputs are diverse. The fiduciary needs to be able to answer "which channels supported this conclusion, and which did not?" — and that requires the channel-level structure to survive the synthesis step.
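
The requirement that channel-level structure survive synthesis can be sketched as a data shape. The following is a minimal illustration, not a description of any platform's implementation: the channel identifiers, the `Evidence` record, and the `synthesize` function are hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical identifiers for the six framework channels tabulated above.
CHANNELS = (
    "gp_relationships",
    "regulatory_filings",
    "web_market_intelligence",
    "peer_lp_activity",
    "analytical_third_parties",
    "internal_mandate_policy",
)


@dataclass(frozen=True)
class Evidence:
    channel: str     # one of CHANNELS
    source_ref: str  # hypothetical identifier for the underlying document
    supports: bool   # True if this evidence corroborates the claim


def synthesize(claim: str, evidence: list[Evidence]) -> dict:
    """Summarize a claim while keeping per-channel attribution intact."""
    by_channel = defaultdict(list)
    for item in evidence:
        if item.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {item.channel}")
        by_channel[item.channel].append(item)

    supporting = sorted(
        c for c, items in by_channel.items() if any(e.supports for e in items)
    )
    contradicting = sorted(
        c for c, items in by_channel.items() if any(not e.supports for e in items)
    )
    return {
        "claim": claim,
        "supporting_channels": supporting,
        "contradicting_channels": contradicting,
        # Channels that contributed nothing are reported, not hidden.
        "silent_channels": sorted(set(CHANNELS) - set(by_channel)),
        # Divergence is itself a finding, so it is surfaced explicitly.
        "divergent": bool(supporting and contradicting),
    }


result = synthesize(
    "Firm AUM is $2.1B",
    [
        Evidence("gp_relationships", "ppm-2024", True),
        Evidence("regulatory_filings", "form-adv-2024", True),
        Evidence("web_market_intelligence", "trade-press-article", False),
    ],
)
```

The design point is that the output answers "which channels supported this conclusion, and which did not?" directly, including the channels that were silent, instead of collapsing everything into one composite score.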

Implementation note: the platform's sourcing research workspace operates against six canonical channels at the render layer (GP relationships, regulatory filings, web, market intelligence, peer-LP activity, economic fundamentals), a set that overlaps with but does not exactly match the six framework channels tabulated above. Each finding is tagged to its source channel, and channel attribution is preserved through synthesis. The render layer now also carries per-claim citation chips on three LLM-generated surfaces (recommendation rationale, market assessment, and executive summary), where every grounded statement links to a canonical source record with a hover tooltip and click-through to the underlying filing or document. At the substrate layer, the platform's public-research substrate decomposes the six channels into a finer-grained 28-source coverage matrix organized across five evidentiary layers (regulatory, public-roster, market-context, LP-internal, commercial-overlay), with an explicit verification tier per source. The substrate persists evidence at this finer granularity; extending substrate-derived per-source citations across all sourcing surfaces (channel summaries beyond the three LLM-prose surfaces) is the next phase of work. Implementation gaps relative to the framework (cross-channel correlation depth, a channel-coverage completeness signal, and substrate-to-render wiring for non-LLM surfaces) are on the development roadmap. The framework here is the target the substrate is being built toward.

3. Source Taxonomy: From Tags to Evidentiary Classes

The second discipline is source taxonomy. Most AI-augmented research tools that claim "source attribution" do one of two things: they cite the source document a passage was retrieved from, or they tag passages with a category label like [SEC] or [WEB]. Both are improvements over unsourced prose. Neither is sufficient for fiduciary use.

The architectural requirement is that the source taxonomy be organized around evidentiary classes that fiduciary reviewers actually apply. A fiduciary review treats different categories of evidence differently. Primary source documents (a fund's LPA, a GP's audited financials) carry different weight than secondary reporting (an industry publication summarizing the same LPA). Regulatory filings (SEC Form ADV, AIFMD disclosures, Cayman MLI filings) are primary, but not all regulators carry equal authority for all jurisdictions. Consultant research (Cambridge Associates, Hamilton Lane benchmarking) is authoritative within its scope but not dispositive. Peer LP intelligence is probative but informally sourced.

A taxonomy organized for fiduciary review has approximately 8–12 categories, grouped into trust classes:

Trust class	Representative source types
Primary documents	Fund LPAs, GP audited financials, GP-prepared PPMs, side letters
Regulatory filings	SEC Form ADV / Form D (US), AIFMD disclosures (EU), Cayman MLI filings, other sovereign regulators
Analytical third parties	Consultant research, benchmarking databases, rating agency reports
Peer intelligence	Anonymized LP reports, consortium data, investment-consortium research
Public commentary	Industry publications, general web content, social commentary
Internal LP records	The institution's own mandate, policy, prior-research archive

A simpler tag system collapses this structure. A five-tag scheme like [SEC] [LEI] [DB] [MANDATE] [WEB] treats SEC filings and a random web article as parallel; it hides the distinction between the institution's own mandate (internal) and a passage from a third-party consultant report (external authoritative) inside catch-all categories. The remedy is not to rename the tags. It is to structure the taxonomy around evidentiary classes that reviewers will actually apply.

Within each tag, the platform must also distinguish citation types. A claim in AI-produced output may be a direct quote from a single source, a synthesized extraction across multiple passages of a single source, or an inference that combines evidence from multiple sources. These have materially different evidentiary weight. A fiduciary reviewer presented with "$2.1 billion in AUM [SEC]" has reason to examine whether the SEC filing actually says that. A reviewer presented with "track record suggests strong origination capability [GP] [WEB] [PEER]" has reason to examine the inference chain, not just the sources.

The third element is verifiability, not just visibility. A clickable tag that opens a source document is table stakes. Fiduciary-grade implementation supports the verification a reviewer actually performs:

Passage highlighting: the specific text in the source that the claim was extracted from, not just the document
Version pinning: the exact version of the source document the claim was generated against (source documents change; the AI's output was produced against a specific version)
Retrieval metadata: when the source was ingested, how it was retrieved, what transformation was applied
Paraphrase vs. quote distinction: was this a direct quote, a paraphrase, or a multi-step inference?

These are non-trivial engineering requirements. They imply that the platform maintains a source corpus with versioning, that retrieval produces structured artifacts with passage references, that prose generation is constrained to reference those artifacts, and that the presentation layer can reconstruct the traceability tree for any claim a reviewer challenges.
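
A minimal sketch of what such a structured artifact might look like follows. It assumes a source version can be pinned by hashing its content with SHA-256 and that a passage reference can be stored as character offsets; `SourceVersion`, `Citation`, `cite`, and `verify` are hypothetical names, and the filing text is invented for illustration.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class CitationType(Enum):
    DIRECT_QUOTE = "direct_quote"
    PARAPHRASE = "paraphrase"
    INFERENCE = "inference"


@dataclass(frozen=True)
class SourceVersion:
    source_id: str
    content: str
    ingested_at: str       # retrieval metadata: ISO 8601 timestamp
    retrieval_method: str  # retrieval metadata: how the source was obtained

    @property
    def version_hash(self) -> str:
        # Version pinning: the hash identifies this exact text.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Citation:
    claim: str
    citation_type: CitationType
    source_id: str
    version_hash: str
    start: int  # passage offsets into the pinned version
    end: int


def cite(claim: str, citation_type: CitationType,
         source: SourceVersion, passage: str) -> Citation:
    """Create a citation only if the passage exists in this source version."""
    start = source.content.find(passage)
    if start < 0:
        raise ValueError("passage not found in this source version")
    return Citation(claim, citation_type, source.source_id,
                    source.version_hash, start, start + len(passage))


def verify(citation: Citation, source: SourceVersion) -> str:
    """Return the highlighted passage, refusing to verify against a changed source."""
    if source.version_hash != citation.version_hash:
        raise ValueError("source has changed since the claim was generated")
    return source.content[citation.start:citation.end]


adv = SourceVersion(
    "form-adv",
    "Item 5: Regulatory assets under management: $2.1 billion.",
    "2025-01-15T09:00:00Z",
    "regulator download",
)
citation = cite("$2.1 billion in AUM", CitationType.DIRECT_QUOTE, adv, "$2.1 billion")
highlighted = verify(citation, adv)
```

A reviewer challenging the claim gets the exact passage from the exact version it was generated against; if the filing has since been amended, verification fails loudly instead of silently pointing at different text.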

Implementation note: At the render layer, the platform today uses a five-tag baseline (regulatory filing, legal-entity identifier, internal database, mandate, web) on facts surfaced by its sourcing research workspace. It also renders per-claim citation chips on three LLM-generated surfaces (recommendation rationale, market assessment, executive summary), where every grounded statement carries a numbered superscript chip linking to its canonical source record: the hover tooltip shows source type, label, as-of date, and URL, and a click navigates to the underlying filing or document. The chip-rendering layer closes the first half of the substrate-to-render gap: on those surfaces, rendered prose now visibly traces every material claim to a real source. At the substrate layer, the public-research substrate ships a structured 28-source × 5-tier coverage vocabulary keyed by 66 fact-type atoms across eight categories (firm identity, fund disclosure, team, strategy, performance, ESG/governance, market context, LP-internal), 219 fact-source edges in total. This is the architectural mid-step toward the 8–12-category trust-class taxonomy this paper describes. The remaining work to fully realize the framework is threefold: extending citation-chip rendering beyond the three LLM-prose surfaces (to channel summaries and similar non-LLM surfaces); replacing the render-layer five-tag display with substrate-derived 28-source attribution at the user-visible level; and adding citation-type differentiation, version pinning, and passage highlighting. In short, per-claim citation chips ship for the highest-value surfaces (rationale, market assessment, executive summary), the substrate evidence layer persists per-finding source attribution, and extending substrate-derived attribution to the remaining surfaces is the next phase.

4. Confidence Tiers as Interrogable Claims

The third discipline is confidence representation. A common pattern is to attach a single tier label — Strong, Moderate, Preliminary — to each AI-produced claim or recommendation. This is acceptable for display, but only if the underlying structure preserves the dimensions that compose to that tier. A single-tier label without dimensional decomposition is decorative, not structural; a fiduciary reviewer asking "why Moderate and not Strong?" should receive a structured answer, not an opaque score.

At minimum, a fiduciary-grade confidence model tracks four dimensions:

Dimension	Question it answers
Source quality	How authoritative are the sources the claim is based on?
Source agreement	Do the sources corroborate the claim, or do some sources disagree?
Source completeness	Did the platform retrieve all the sources that would be relevant to this claim, or was retrieval partial?
Synthesis distance	Is the claim a direct citation (distance = 0), a single-step extraction (distance = 1), or a multi-step inference (distance > 1)?

A claim can score high on source quality but low on source agreement (one authoritative source says one thing, another says something different — supportable but not uncontroverted). It can score high on quality and agreement but low on completeness (the sources agree, but the platform didn't retrieve everything it could have). It can score high on completeness but high on synthesis distance (everything was retrieved, but combining the pieces required inference that introduces its own risk).

These four dimensions compose to a five-tier display label:

Tier	Criteria
Strong	Direct citation of primary or regulatory source; no source disagreement; completeness verified
Moderate	Extraction from primary or regulatory source with consistent evidence; completeness reasonable
Preliminary	Extraction from mixed-quality sources, or inference across multiple sources with some synthesis distance
Exploratory	Inference from thin evidence; the AI is reasoning ahead of confident support
Not Supported	Claim cannot be supported by available evidence — should not be emitted at all

The fifth tier, Not Supported, is the most important. It is not a display state. It is a generation-time constraint. Claims that the platform cannot support at least to the Exploratory level should be structurally prevented from appearing in the output at all. The AI does not get to say something the platform cannot source-justify. This is the architectural commitment that distinguishes a research platform from a content-generation platform: in the latter, the model produces output and the user filters it; in the former, the model is constrained at generation to produce only what the source corpus supports.
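
One way to picture the composition and the suppression rule is the sketch below. It is an illustration under stated assumptions, not a calibrated model: the `Dimensions` fields, the threshold constants, and the `tier_for` and `emit` functions are all hypothetical, and the gate shown here sits between aggregation and rendering, which is the simplest place to enforce the rule.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    NOT_SUPPORTED = 0
    EXPLORATORY = 1
    PRELIMINARY = 2
    MODERATE = 3
    STRONG = 4


@dataclass(frozen=True)
class Dimensions:
    authoritative: bool      # source quality: primary or regulatory source
    sources_agree: bool      # source agreement: no conflicting source
    completeness: float      # source completeness, 0.0 to 1.0
    synthesis_distance: int  # 0 direct, 1 single-step extraction, >1 inference
    source_count: int        # number of sources actually supporting the claim


# Hypothetical thresholds; a real decision matrix would document its own.
COMPLETE = 0.9
REASONABLE = 0.6
THIN = 0.3


def tier_for(d: Dimensions) -> Tier:
    """Deterministically map the four dimensions to a tier."""
    if d.source_count == 0:
        return Tier.NOT_SUPPORTED
    if d.authoritative and d.sources_agree:
        if d.synthesis_distance == 0 and d.completeness >= COMPLETE:
            return Tier.STRONG
        if d.synthesis_distance <= 1 and d.completeness >= REASONABLE:
            return Tier.MODERATE
    if d.synthesis_distance <= 2 and d.completeness >= THIN:
        return Tier.PRELIMINARY
    return Tier.EXPLORATORY


def emit(claims: list[tuple[str, Dimensions]]):
    """Gate output: claims that aggregate to Not Supported never reach the reader."""
    kept, suppressed = [], []
    for text, dims in claims:
        tier = tier_for(dims)
        if tier is Tier.NOT_SUPPORTED:
            suppressed.append(text)  # reported separately so suppression is visible
        else:
            kept.append((text, tier))
    return kept, suppressed


kept, suppressed = emit([
    ("AUM is $2.1B", Dimensions(True, True, 1.0, 0, 2)),
    ("Team has deep sector expertise", Dimensions(False, False, 0.0, 0, 0)),
])
```

Because the mapping is a plain function of the four dimensions, the same inputs always produce the same tier, which is what makes the label reproducible under review.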

The other property the architecture must support is interrogability. A reviewer who asks "why is this claim Moderate rather than Strong?" should receive a structured answer:

"This claim is Moderate because source quality is Primary, source agreement is full, but source completeness is Partial — we retrieved this GP's public filings but not the confidential LPA, which would be the authoritative source for this specific term."

That is what an interrogable confidence tier looks like. The tier is the visible label; the dimension scores are visible on inspection; the dimension scores explain why the tier is what it is. A platform that produces only the tier label, with no underlying dimensional structure to interrogate, fails this property — even if its tier label is correctly calibrated.
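
The structured answer quoted above can be sketched as a small lookup. The dimension labels, the `STRONG_REQUIREMENTS` table, and the `explain` function are hypothetical, chosen only to mirror the example in the text.

```python
# Hypothetical requirement for the Strong tier, one entry per dimension.
STRONG_REQUIREMENTS = {
    "source_quality": {"primary", "regulatory"},
    "source_agreement": {"full"},
    "source_completeness": {"verified"},
    "synthesis_distance": {"direct"},
}


def explain(tier: str, scores: dict) -> dict:
    """Return the dimension scores behind a tier and what blocks promotion to Strong."""
    missing = set(STRONG_REQUIREMENTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    shortfalls = {
        dim: {"actual": scores[dim], "needed_for_strong": sorted(allowed)}
        for dim, allowed in STRONG_REQUIREMENTS.items()
        if scores[dim] not in allowed
    }
    return {"tier": tier, "scores": dict(scores), "shortfalls": shortfalls}


answer = explain(
    "Moderate",
    {
        "source_quality": "primary",
        "source_agreement": "full",
        "source_completeness": "partial",
        "synthesis_distance": "direct",
    },
)
```

Here the only shortfall is source completeness, so the reviewer sees both why the claim is Moderate and what would have to change for it to be Strong.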

A related property is user annotation as provenance. Fiduciaries often know things the AI does not. An LP that has a personal relationship with the GP may have high confidence in a claim the AI labeled Preliminary, because the LP verified it independently — by phone call, by reference check, by personal visit. The right architecture supports this: the user can annotate the claim as "verified by [role] on [date] via [mechanism]" and that annotation is itself tracked as provenance, stored alongside the AI's computed confidence as a separate signal in the decision record. The AI's confidence and the user's verification are not in tension; they are both inputs to the fiduciary's deliberation, and both belong in the audit trail.
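
A minimal sketch of annotation as a separate signal follows. The `Annotation` and `DecisionRecord` names are hypothetical; the point illustrated is that a human verification is appended alongside the AI's computed tier and never overwrites it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Annotation:
    claim_id: str
    verified_by_role: str  # "verified by [role]"
    verified_on: date      # "on [date]"
    mechanism: str         # "via [mechanism]"


class DecisionRecord:
    """Holds the AI's tier and human annotations as two separate signals."""

    def __init__(self) -> None:
        self._ai_tiers = {}
        self._annotations = []  # append-only: entries are never edited or removed

    def set_ai_tier(self, claim_id: str, tier: str) -> None:
        self._ai_tiers[claim_id] = tier

    def annotate(self, annotation: Annotation) -> None:
        if annotation.claim_id not in self._ai_tiers:
            raise KeyError(f"unknown claim: {annotation.claim_id}")
        self._annotations.append(annotation)

    def view(self, claim_id: str) -> dict:
        return {
            "ai_tier": self._ai_tiers[claim_id],
            "annotations": tuple(
                a for a in self._annotations if a.claim_id == claim_id
            ),
        }


record = DecisionRecord()
record.set_ai_tier("claim-1", "Preliminary")
record.annotate(Annotation("claim-1", "portfolio manager", date(2025, 4, 3), "reference check"))
view = record.view("claim-1")
```

The AI's label stays Preliminary in the record; the annotation explains why the fiduciary was nonetheless comfortable relying on the claim.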

Implementation note (refreshed June 2026): At the render layer, the platform today computes a per-recommendation confidence tier from a numeric threshold against the four emittable tiers of the scale described above (Strong / Moderate / Preliminary / Exploratory). It also ships per-claim source attribution chips on three LLM-generated surfaces (recommendation rationale, market assessment, executive summary): every grounded statement is visibly traced to its underlying source record, with the source's verification tier and substrate layer surfaced in the chip tooltip. At the substrate layer, the platform now persists all four confidence axes per finding: source quality (a five-tier vocabulary: Tier 1 Regulatory, Tier 2 Verified, Tier 3 Single-Source, Tier 4 Conflicted, and Tier 5 Unverified, which is never emitted), source agreement (a JSON list of corroborating sources plus a source-conflict boolean), source completeness (the fraction of the per-fact source matrix actually retrieved for the claim), and synthesis distance (a tag-emission heuristic measuring inference depth from source evidence to claim prose). A deterministic aggregation function combines the four axes into the five-tier confidence output (Strong / Moderate / Preliminary / Exploratory / Not Supported) via a documented decision matrix with explicit threshold constants, so the result is auditable rather than opaque. The Not Supported constraint now ships in the generation pipeline: any claim whose axes aggregate to Not Supported is dropped before render, and a structured synthesis error surfaces on the session banner so the suppression is visible rather than silent.

User annotation as provenance also ships as substrate. An analyst can record a verification verdict (verified / disputed / contextualized) against any claim, with an optional mechanism (phone call, reference check, internal knowledge, written confirmation) and free-form notes, persisted as an append-only audit chronology. The verification surface is shipped end-to-end on both server and client; the remaining user-facing step is the tooltip integration that makes the annotation panel visible directly inside the per-claim citation chip. The remaining caveat is operational rather than architectural. Live verification on real sourcing runs showed that two upstream attribution signals (per-channel fact-type and canonical-tag attribution) do not yet write canonical values at runtime. As a result, the completeness and synthesis-distance axes are uniformly populated with neutral placeholder values until those upstream gaps close in a follow-on phase. The framework the paper describes is what the platform is building toward. The substrate has now reached the full four-axis-plus-annotation shape; the visible render and the calibration against real data are the remaining gaps, and both are in active development.

5. Grounding as Architecture, Not Afterthought

The fourth discipline is grounding — the property of AI output being traceable to supporting sources, with no claims introduced that the sources do not support. The common implementation pattern is a post-generation check: after the model produces output, a scanner extracts claims (typically numerical) and verifies each against the input corpus. Claims that don't match get flagged or removed.

This is a reasonable catch-net. It is not sufficient architecture.

The problem with post-generation grounding alone is twofold. First, it can only catch claims that match a pattern — typically dollar amounts, percentages, dates. It cannot catch a fabricated GP name that happens to sound plausible, a misattributed strategy label, an invented deal count, a hallucinated quote from a hypothetical LPA, or an inference that combines real facts in a way the real facts don't actually support. The check addresses one category of failure (numerical fabrication) and misses most others.

Second, post-generation grounding operates after the model has already produced the output. It is a filter on the output, not a constraint on the generation. A model that is not constrained during generation to produce grounded output will continue to fabricate; the post-generation check just improves the statistics. The architectural commitment is to constrain generation, not just to scan output.

Fiduciary-grade grounding operates at three layers:

Layer	Mechanism	What it prevents
Retrieval	RAG (retrieval-augmented generation) constrains the model's access to a specific, indexed corpus. The model generates within a source-bounded context, not from open knowledge.	Model reaches for knowledge from training data that may be wrong, stale, or nonexistent
Extraction	Numerical and categorical claims are extracted from source documents as structured data before prose is generated. Prose references the structured extractions by ID.	Model generates new numbers, names, or categories in the prose that don't match what the sources actually said
Verification	Post-generation check compares every extractable claim in the output (numerical, named entity, temporal, categorical) against the source corpus. Unmatched claims are flagged, blocked, or regenerated.	Residual fabrication that escapes the retrieval and extraction layers

Each layer catches a different failure class. Retrieval prevents the model from reaching outside the corpus. Extraction prevents the prose-generation step from inventing structured values. Verification is the backstop that catches whatever slipped through. The architecture compounds defense; any single layer alone is inadequate.
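
The extraction and verification layers can be illustrated with a small sketch (retrieval is out of scope here). It assumes prose is drafted with placeholders of the form `{fact:ID}` that refer to structured extractions; the placeholder syntax, the `FACTS` table, and both functions are hypothetical. The backstop shown checks only number-like tokens, which is exactly the pattern-limited check that is inadequate on its own.

```python
import re

# Hypothetical structured extractions, produced before any prose is drafted.
FACTS = {
    "F1": {"value": "$1.9B", "source": "form-adv-2024"},
    "F2": {"value": "2021", "source": "lpa-v3"},
}

PLACEHOLDER = re.compile(r"\{fact:([A-Za-z0-9_]+)\}")
NUMBER_LIKE = re.compile(r"\$?\d[\d,.]*[A-Za-z%]*")


def render(draft: str, facts: dict) -> str:
    """Extraction layer: prose may only reference values that were extracted."""
    unknown = [fid for fid in PLACEHOLDER.findall(draft) if fid not in facts]
    if unknown:
        raise ValueError(f"draft references unextracted facts: {unknown}")
    return PLACEHOLDER.sub(lambda m: facts[m.group(1)]["value"], draft)


def unverified_numbers(rendered: str, facts: dict) -> list:
    """Verification backstop: number-like tokens that match no extracted value."""
    allowed = {fact["value"] for fact in facts.values()}
    tokens = NUMBER_LIKE.findall(rendered)
    return [t for t in tokens if t.rstrip(".,") not in allowed]


rendered = render("The fund, a {fact:F2} vintage, reports AUM of {fact:F1}.", FACTS)
clean = unverified_numbers(rendered, FACTS)
flagged = unverified_numbers(rendered + " It has closed 14 deals.", FACTS)
```

The draft cannot introduce a structured value that was never extracted, and a number typed directly into the prose is caught by the backstop. A fabricated name or quote would pass this particular check, which is why verification scope has to extend beyond numbers.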

Detection is necessary but not sufficient. The user-facing treatment of detected fabrication has to match severity:

Severity	Example	Treatment
Low (drift in a non-load-bearing claim)	A fund's vintage year reported as 2022 when sources indicate 2021	Surface as warning in UI; user decides whether to correct
Medium (material but contained)	A fund's AUM reported as $2.1B when sources show $1.9B	Block output until user provides source or removes claim
High (structural fabrication)	A claim about a nonexistent fund, or a quote from a nonexistent LPA	Discard output; regenerate with tighter constraints; log the event

Silent logging — the weakest treatment — is inadequate. A fabrication that reaches the user without any signal is a fabrication the user will treat as a fact. Surfacing the warning, at minimum, shifts the question from "did the platform catch it?" (the platform did) to "how did the user respond?" — which is itself an auditable decision.
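
The severity table can be read as a routing rule. The sketch below is illustrative only; `Severity`, `Treatment`, and `handle` are hypothetical names. Note that no "log silently" treatment exists in the mapping, so every detection produces a user-facing action as well as an audit entry.

```python
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Treatment(Enum):
    WARN = "surface warning in UI; user decides whether to correct"
    BLOCK = "block output until user provides source or removes claim"
    DISCARD_AND_REGENERATE = "discard output; regenerate with tighter constraints"


# One treatment per severity, following the table above.
TREATMENT_BY_SEVERITY = {
    Severity.LOW: Treatment.WARN,
    Severity.MEDIUM: Treatment.BLOCK,
    Severity.HIGH: Treatment.DISCARD_AND_REGENERATE,
}


def handle(severity: Severity, claim: str, audit_log: list) -> Treatment:
    """Route a detected fabrication to its treatment and record the event."""
    treatment = TREATMENT_BY_SEVERITY[severity]
    audit_log.append(
        {"claim": claim, "severity": severity.value, "treatment": treatment.name}
    )
    return treatment


log = []
outcome = handle(Severity.MEDIUM, "AUM reported as $2.1B; sources show $1.9B", log)
```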

Implementation note: At the render layer, the platform today operates the verification layer with a rule-based scanner for fabricated dollar amounts in synthesis output. At the substrate layer, the public-research substrate recently shipped a content-addressable artifact store. Every external source the platform queries (Form D primary documents, GP website renders, trade-press article texts, regulatory filings) can be persisted as a SHA-256-keyed snapshot, with idempotent deduplication, a retrieval audit (first-seen / last-seen / count), and forward-compatible inline-or-external storage so the artifact substrate is ready for blob-storage migration when scale demands it. This is the structural foundation for grounding as architecture rather than grounding as post-hoc check: the substrate can now persist the corpus that the formal retrieval layer will read from, and the artifact-keyed audit trail that the multi-layered verification layer will reference. The remaining gaps are on the development roadmap. The retrieval layer remains prompt-bounded; structured-claim extraction prior to prose generation is in design; verification covers dollar amounts only; and severity-graded surfacing is not yet implemented, so every detection surfaces as a uniform warning today. Writing artifacts at sourcing-run time, and building the formal retrieval layer that reads from them, are the next phases. The current grounding posture is honest about its limits: a partial verification layer at render, with the substrate for the multi-layered architecture now in place and the run-time wiring still to come.
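
For readers unfamiliar with content-addressable storage, the general technique can be sketched in a few lines. This is not the platform's code: the `ArtifactStore` class and its in-memory dictionaries are hypothetical stand-ins, shown only to make "SHA-256-keyed snapshot with idempotent deduplication and a retrieval audit" concrete.

```python
import hashlib


class ArtifactStore:
    """In-memory sketch of a content-addressable store keyed by SHA-256."""

    def __init__(self) -> None:
        self._blobs = {}
        self._audit = {}

    def put(self, content: bytes, retrieved_at: str) -> str:
        # The key is derived from the content, so identical content maps to one entry.
        key = hashlib.sha256(content).hexdigest()
        if key not in self._blobs:
            self._blobs[key] = content
            self._audit[key] = {
                "first_seen": retrieved_at,
                "last_seen": retrieved_at,
                "count": 1,
            }
        else:
            # Idempotent deduplication: nothing is re-stored, only the audit updates.
            entry = self._audit[key]
            entry["last_seen"] = retrieved_at
            entry["count"] += 1
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def audit(self, key: str) -> dict:
        return dict(self._audit[key])


store = ArtifactStore()
first_key = store.put(b"Form D text, as filed", "2025-03-01T10:00:00Z")
second_key = store.put(b"Form D text, as filed", "2025-03-08T10:00:00Z")
amended_key = store.put(b"Form D text, as amended", "2025-03-15T10:00:00Z")
```

Because the key is a hash of the bytes, a claim that records the key is pinned to one exact snapshot, and an amended filing is stored as a distinct artifact instead of overwriting the earlier one.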

6. Audit-Trail Discipline

The fifth discipline is audit trail. A fiduciary research process is iterative — a fiduciary reads initial findings, has questions, drills into specific areas, asks follow-up questions, updates the analysis, eventually reaches a decision. A platform optimized for "AI produces a memo, human approves" supports a different workflow than one optimized for "AI supports iterative analysis, human reaches a decision." The latter is what fiduciary practice actually looks like.

The architectural implications of iterative collaboration:

Conversation history as audit trail. Every question the fiduciary asked, every response the AI produced, every drill-down request is part of the process record. Not stored in transient chat logs that expire; stored as part of the decision's durable trail.
Question attribution. Who asked the follow-up question? The analyst? The portfolio manager? The IC chair? The asker matters for audit purposes as much as the question itself.
State progression. The analysis evolves over time. A view the fiduciary saw on April 1 is different from the view on April 15 after additional source material was ingested. The audit trail must be able to reproduce the view as of any point during the deliberation.
Decision reasoning as first-class. The fiduciary's reasoning for the final decision — not just "approved" but "approved because X and Y, despite concern Z" — is the final entry in the audit trail, and is structurally required, not optional.
Override events as first-class. When a fiduciary overrides an AI-produced disposition (rejects a recommendation, accepts a fund the AI flagged, modifies a confidence tier), the override is recorded with attribution, reason, and timestamp. Overrides are not suppressed; they are foregrounded as evidence of human judgment in the loop.

The override discipline is particularly important. A fiduciary process that always agrees with AI output is a process where the AI is making the decision, with the fiduciary acting as a rubber stamp. A fiduciary process that records both agreements and overrides — and the reasoning for each — is one where the human judgment is auditable.
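
These implications can be sketched as an append-only event log with point-in-time reconstruction. The `Event` and `AuditTrail` names, the event kinds, and the sample entries are hypothetical; the sketch assumes that overrides and decisions must carry a stated reason.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    on: date
    actor_role: str  # question attribution: who asked or acted
    kind: str        # "question", "ai_response", "source_ingested", "override", "decision"
    detail: str
    reason: str = ""


class AuditTrail:
    REASON_REQUIRED = {"override", "decision"}

    def __init__(self) -> None:
        self._events = []

    def record(self, event: Event) -> None:
        # Decision reasoning and override reasoning are structurally required.
        if event.kind in self.REASON_REQUIRED and not event.reason.strip():
            raise ValueError(f"{event.kind} events must carry a reason")
        # Append-only: history cannot be back-filled.
        if self._events and event.on < self._events[-1].on:
            raise ValueError("events must be appended in chronological order")
        self._events.append(event)

    def as_of(self, when: date) -> tuple:
        """State progression: reproduce the record as it stood on a given date."""
        return tuple(e for e in self._events if e.on <= when)


trail = AuditTrail()
trail.record(Event(date(2025, 4, 1), "analyst", "question", "Follow-up on fee terms"))
trail.record(Event(date(2025, 4, 1), "platform", "ai_response", "Summary of fee terms"))
trail.record(Event(date(2025, 4, 10), "platform", "source_ingested", "Additional source material"))
trail.record(Event(date(2025, 4, 15), "ic_chair", "override",
                   "Confidence tier raised from Preliminary to Moderate",
                   reason="Verified by reference call"))
view_april_1 = trail.as_of(date(2025, 4, 1))
view_april_15 = trail.as_of(date(2025, 4, 15))
```

An override recorded without a reason is rejected at write time, which is what keeps human judgment auditable instead of merely present.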

Implementation note: the platform today implements an assessment-override audit substrate that captures four override types (dimension score, weight, gate waiver, and disposition) with per-entity attribution and timestamps. The conversation-history-as-audit-trail layer (questions asked, responses produced, drill-downs) is partially implemented; full implementation is on the development roadmap. Across the five disciplines, the override substrate is the most architecturally complete component in the platform today, and the conversation-history layer is the least complete.

7. A Diagnostic for Evaluating Platforms

LPs evaluating AI-augmented research platforms in the next 12–24 months will see many claims about AI capability. The five disciplines above provide a sharper filter. For any platform under consideration, the following questions are architectural — they test whether the platform was built around fiduciary defensibility from the start, or retrofitted with it.

Discipline	Architectural test
Multi-channel sourcing	Show me a per-fund decision memo where I can see, channel by channel, which channels contributed which findings. Show me a case where channels disagreed and how the disagreement was surfaced.
Source taxonomy	Show me the claim-to-source traceability for a specific claim in an actual AI-produced artifact. Can I see the specific passage? The source version at the time of generation? The distinction between direct citation and inference?
Confidence tiers	For a claim labeled "Moderate confidence," show me the dimension scores — source quality, agreement, completeness, synthesis distance — that produced that label. What would change to promote it to Strong?
Multi-layered grounding	What prevents fabrication at the generation step, not just at the verification step? What happens when the verification layer detects a fabricated non-numerical claim?
Audit trail	Reconstruct a deliberation that happened three months ago. Show me what view the fiduciary saw on a specific date during the deliberation, and what changed by the date of decision. Show me an override event and its reasoning.

Platforms that pass these tests were built around fiduciary defensibility from the start. Platforms that fail one or more were built for a different purpose — often general-purpose enterprise AI, retrofitted with sourcing claims after institutional buyers asked for them — and are being positioned into the fiduciary market. The difference will be evident in the answers, and in the engineering depth of the conversation that the answers invite.

A consequence of this diagnostic: AI-platform evaluations should include a deeper technical review than typical SaaS procurement produces. The questions above cannot be answered from a sales deck. They require:

Access to actual AI-produced artifacts, not sanitized demo output
Direct engineering conversation about the source-to-output pipeline
Review of the data-flow architecture, not just the top-level deployment diagram
Inspection of the data model for the audit-trail entities
A test run against the institution's own materials, observed under the institution's own compliance conditions

The cost of the deeper evaluation is proportionate to the exposure. A fiduciary who commits to AI infrastructure without this depth of review is accepting the risk that a future regulatory or litigation review will find the architectural hole the procurement review did not.

8. What This Means for the Next 12–24 Months

The framework specified here is achievable today, in our assessment, only at the architectural level. Even the most ambitious platforms — including PrivateMetrics — are mid-implementation against the full standard. Some disciplines are further along than others. Override audit trails, six-channel sourcing, and source-tagged facts are within reach today. Multi-dimensional confidence tiers, multi-layered grounding architectures, and full conversation-history audit trails are further out.

What this means for procurement: the right question is not "which platform implements all five disciplines today?" — because the honest answer for the entire category is "none of them, fully." The right question is "which platform's architecture is being built toward the full framework, with engineering substance behind each discipline?" — versus "which platform's marketing has been retrofitted to claim the framework while the underlying architecture was built for a different purpose?"

A platform that has shipped 30% of the framework with 70% in a tracked, prioritized engineering backlog (each tracked item having a code-path target and an honest current-state description) is a different artifact from one that claims 100% in marketing copy with no backlog visible. The first is being built around fiduciary defensibility and is a credible long-term partner. The second has either built something different from what is being described, or has not built it at all.

The next 12–24 months are the period in which this gap will close at different rates across the category. Some platforms will close it. Others will not. The institutions that evaluate the gap honestly — using the architectural diagnostic in §7, treating the evaluation as a technical audit, asking for code-level evidence — will be the ones whose AI-augmented research processes survive external review when external review eventually comes.

That is what the AI-augmented research workflow has to become for institutional LPs. The framework above is the standard. The implementations that are honestly working toward it — backlog visible, gaps acknowledged, architecture in place — are the ones to take seriously. The marketing claims that already declare the standard met are the ones to interrogate more carefully.

This paper is part of a series on fiduciary discipline in institutional LP technology. See also: ILPA Reconciliation: Signed-Convention Formulas for LP Auditability — the data-discipline companion to this AI-discipline framework.