What a Year of Building AI in Structured Finance Actually Taught Us
The lessons nobody puts in the demo.

In 2025, our team built production AI systems that process billions of performance records for tens of millions of mortgages, develop cash flow models for complex private ABF structures straight from documents, and connect large language models directly to bond analytics APIs.
We built dashboards, connectors, and credit analytics. Some of them worked. Some of them taught us more by failing.
This is what we learned—not the polished conference talk version, but the notes we’d share with a peer team starting the same journey.

The Value Shift Nobody Prepares You For
A portfolio delinquency analysis that used to take three hours now takes twenty minutes.
That sounds like a win. It is a win. But it also raises a question that’s harder to answer than any technical problem we solved this year:
If AI handles in minutes what took us hours, what are we contributing?
When we started pulling this thread, we realized that a significant portion of what felt like skilled analytical work was actually mechanical labor—data extraction, formatting, applying the same methodology we’d applied dozens of times before. The expertise was real, but it was wrapped in hours of execution that masked how much of the work was routine.
Here’s where we landed:
AI handles the “how.” Humans own the “why” and “so what.”
The value now lives in knowing which questions matter. Understanding what the client really needs versus what they say they need. Recognizing when output is wrong because we understand the domain deeply enough to see the error.
That’s an entirely different skill set. It requires judgment, contextual awareness, and domain intuition that deepens over years: the kind of expertise AI can’t replicate the way it can replicate procedural analytical work.
Not everyone will make this transition comfortably. The analysts who built their identity around being fast and thorough at execution face a harder adjustment than those who always saw execution as a means to an end.
We don’t have this all figured out yet. But we’ve stopped pretending the shift isn’t happening.

Stop Asking AI to Write Code—Start Asking It to Think With You
For years, we used Claude as a coding assistant. “Write a function that does X.” “Convert this data from format A to format B.” “Generate a script that calculates Y.”
That works. But it captures maybe 20% of the value.
The shift that changed our results was treating Claude not as a tool to instruct, but as an analyst to think alongside.
The difference looks like this:
Before (instruction mode):
“Write a Python script to calculate delinquency rates from this loan data.”
After (thinking partner mode):
“We need to identify hidden credit risk in this CLO portfolio—issuers that resemble recent defaults but haven’t shown price distress yet. What factors should we consider? What data would we need? Let’s build a scoring model together.”
That second conversation led to identifying hidden exposure across issuers. Claude suggested factors we hadn’t considered—CLO concentration patterns, industry clustering effects, the relationship between coupon levels and distress signals. We debated the weighting. We refined the methodology. The output was genuinely collaborative.
The code that emerged from the second approach was better, but that’s almost beside the point. The thinking was better. The model was better. The insight was better.
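To make the shape of that output concrete, here is a minimal sketch of a factor-scoring model along the lines the conversation produced. The factor names, normalization, and weights below are illustrative placeholders, not the model we actually shipped.

# Illustrative factor-scoring sketch; fields and weights are placeholders,
# not the scoring model that came out of the real conversation.
from dataclasses import dataclass

@dataclass
class IssuerSignals:
    clo_concentration: float        # share of the issuer's debt held in CLOs, 0-1
    industry_cluster_stress: float  # 0-1 score reflecting recent defaults in the same industry
    coupon_spread_bps: float        # issuer coupon minus sector median, in basis points

# Hypothetical weights of the kind we debated and refined with the model.
WEIGHTS = {
    "clo_concentration": 0.40,
    "industry_cluster_stress": 0.35,
    "coupon_spread_bps": 0.25,
}

def hidden_risk_score(s: IssuerSignals) -> float:
    """Blend normalized factors into a single 0-1 'resembles recent defaults' score."""
    # Cap the coupon spread at 500 bps so every factor lives on a comparable 0-1 scale.
    coupon_component = min(max(s.coupon_spread_bps, 0.0) / 500.0, 1.0)
    return (WEIGHTS["clo_concentration"] * s.clo_concentration
            + WEIGHTS["industry_cluster_stress"] * s.industry_cluster_stress
            + WEIGHTS["coupon_spread_bps"] * coupon_component)

The point of the sketch is the structure, not the numbers: each factor normalized to a common scale, explicit weights you can argue about, and a single score you can rank issuers by.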
This requires a different posture than most of us learned. You have to think out loud. Admit what you don’t know. Explain your reasoning and invite critique. Treat the AI as a colleague who happens to have read every document and doesn’t get tired—not as a sophisticated autocomplete.
The developers and analysts on our team who made this shift produce substantively different work than those who are still in instruction mode. And the gap is widening.

The First Version Will Be Wrong—Plan for It
We built a benchmark analysis comparing a client’s NonQM loan portfolio against the broader market. The analysis looked solid: the portfolio showed a 1.37% delinquency rate advantage versus the universe. Strong results. Ready to present.
Then someone asked about DSCR loans.
In NonQM lending, DSCR (debt service coverage ratio) loans are a category unto themselves, with measurably better performance than other NonQM products. When we segmented the data, we discovered that DSCR loans made up 43% of the universe but only 30% of the client’s portfolio.
This changed everything.
The client’s portfolio had less exposure to DSCR loans (the better-performing segment) yet still outperformed the benchmark. That alone was impressive, but our initial analysis understated the true picture. Once we compared performance within segments (DSCR vs. DSCR, non-DSCR vs. non-DSCR), the client’s edge was even larger than we’d initially observed.
If we had presented the first version, we would have undersold our client’s own performance. The insight that mattered most—superior underwriting across both loan categories—would have been invisible.
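For readers who want the mechanics, a minimal sketch of the segmented comparison follows, assuming a loan-level DataFrame with hypothetical columns group (“client” or “universe”), is_dscr, and is_delinquent; the real analysis ran against our production loan tape.

# Minimal sketch of the segmented benchmark; column names are hypothetical.
import pandas as pd

def delinquency_comparison(loans: pd.DataFrame) -> pd.DataFrame:
    # Headline number: overall delinquency rate per group (what we almost presented).
    overall = loans.groupby("group")["is_delinquent"].mean().rename("overall_dq")
    # Segment mix: the DSCR share that reframed the headline (43% vs. 30% in our case).
    dscr_share = loans.groupby("group")["is_dscr"].mean().rename("dscr_share")
    # Like-for-like comparison: DSCR vs. DSCR, non-DSCR vs. non-DSCR.
    by_segment = (loans.groupby(["group", "is_dscr"])["is_delinquent"]
                        .mean()
                        .unstack("is_dscr")
                        .rename(columns={True: "dscr_dq", False: "non_dscr_dq"}))
    return pd.concat([overall, dscr_share, by_segment], axis=1)

The headline column answers the question we were asked; the segment columns answer the question we should have asked first.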
Lesson: “Wrong” doesn’t mean broken. It means the output doesn’t fully reflect reality. Have a domain expert review the work before drawing conclusions.

Deploying AI Agents for End Users Is a Security Project
Building an AI agent that works in a demo is straightforward. Deploying that agent in a production UI where real users interact with real data took us months.
We built an agent that lets users query our bond analytics platform conversationally. The AI worked. Making it production-ready meant solving a different set of problems (a sketch of the wrapper pattern follows this list):
Prompt injection: When users can type anything into a text box processed by an LLM, you inherit a new attack surface. We implemented input validation, output filtering, tightly scoped permissions, and logging that captures every agent action for audit.
Rate limiting: A single conversational turn might trigger 50 API calls. We built tiered limits—per-user, per-session, per-token—plus circuit breakers for runaway queries.
Session management: Agent sessions need conversational context across multiple turns, isolated per user, with graceful expiration handling and automatic cleanup.
Audit trails: Regulated industries need to know what the AI did. Every query, tool invocation, and response needs to be logged immutably.
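Stripped to its simplest form, the wrapper around every agent tool call looks something like the sketch below. The AgentGateway class, the limits, and the injection filter here are illustrative stand-ins, not our production implementation.

# Minimal sketch of the "security wrapper" around an agent tool call.
# Class name, limits, and filters are illustrative, not production code.
import hashlib
import json
import time
from collections import defaultdict

MAX_INPUT_CHARS = 4000      # illustrative limit
CALLS_PER_MINUTE = 30       # illustrative per-user limit; real tiers are per-user, per-session, per-token

class AgentGateway:
    def __init__(self, audit_path="agent_audit.log"):
        self.audit_path = audit_path
        self.calls = defaultdict(list)   # user_id -> timestamps of recent calls

    def _validate(self, user_input: str) -> str:
        # Coarse input validation; production filtering is far more involved.
        if len(user_input) > MAX_INPUT_CHARS:
            raise ValueError("input too long")
        if "ignore previous instructions" in user_input.lower():
            raise ValueError("input rejected by injection filter")
        return user_input

    def _rate_limit(self, user_id: str) -> None:
        now = time.time()
        recent = [t for t in self.calls[user_id] if now - t < 60]
        if len(recent) >= CALLS_PER_MINUTE:
            raise RuntimeError("rate limit exceeded")
        recent.append(now)
        self.calls[user_id] = recent

    def _audit(self, user_id: str, tool: str, payload: dict) -> None:
        # Append-only JSON lines; a real audit log should be immutable and tamper-evident.
        record = {
            "ts": time.time(),
            "user": user_id,
            "tool": tool,
            "payload_sha256": hashlib.sha256(
                json.dumps(payload, sort_keys=True, default=str).encode()
            ).hexdigest(),
        }
        with open(self.audit_path, "a") as f:
            f.write(json.dumps(record) + "\n")

    def call_tool(self, user_id: str, tool: str, user_input: str, tool_fn):
        # Every path to the analytics API goes through validation, limits, and audit.
        text = self._validate(user_input)
        self._rate_limit(user_id)
        self._audit(user_id, tool, {"input": text})
        result = tool_fn(text)              # the actual bond analytics call
        self._audit(user_id, tool, {"output": result})
        return result

Every real version of this is longer: the validation is stricter, the rate limits are tiered, and the audit log goes somewhere tamper-evident. But the shape is the same. Nothing reaches a tool, and nothing leaves one, without passing through this layer.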
The agent itself was 20% of the effort. Authentication, authorization, input validation, rate limiting, session management, and security review were the other 80%.
Lesson: In production, the agent is the easy part. The security wrapper is the product.
Postscript: AgentCore from AWS and Agent Framework from Microsoft are starting to solve these deployment and security headaches.

AI Is Good at Finding Information But Sometimes Overstates What It Means
While building the credit risk analysis, we asked Claude to research distressed issuers—companies that had defaulted or were showing signs of stress. We wanted to understand patterns we could use to identify similar risks in the portfolio.
Claude surfaced real-time signals we wouldn’t have found efficiently on our own: FTC antitrust actions, rating agency downgrades, refinancing walls, fraud allegations. Information that wouldn’t appear in pricing data for months was available in news coverage and regulatory filings. The research phase that would have taken days was completed in hours.
But we also caught Claude drawing confident conclusions from weak sources. In one case, it attributed claims to “industry reports” that didn’t exist when we followed the links. In other cases, the search results and the sources were real, but the synthesis drew conclusions the sources didn’t support.
The lesson: use AI-powered search aggressively. It’s the difference between stale knowledge and current intelligence, especially in fast-moving situations. But verify specific claims. Click the links. Read the actual sources.
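A minimal illustration of that verification pass is below, assuming the model hands back a cited URL alongside each claim (the claims structure is hypothetical). It only confirms that a source exists and responds; whether the source actually supports the claim still requires a human read.

# Sketch of the "click the links" step; the claims structure is hypothetical.
import requests

claims = [
    {"claim": "Issuer received a rating downgrade last quarter",
     "source": "https://example.com/rating-action"},
]

for item in claims:
    try:
        resp = requests.head(item["source"], allow_redirects=True, timeout=10)
        reachable = resp.status_code < 400
    except requests.RequestException:
        reachable = False
    status = "OK  " if reachable else "DEAD"
    print(f"{status} {item['source']} -> {item['claim']}")
    # A dead or missing source sends the claim back to a human for a primary-source read.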
AI is excellent at finding relevant information across large volumes of text. It is sometimes too confident about what that information means when synthesized. The combination of broad retrieval and skeptical verification is more powerful than either alone.

Your Org Chart Isn’t Ready for This
Our AI strategy deck included projections: reduced onboarding costs, increased client capacity, and margin expansion.
The numbers were defensible. The business case was clear.
What the projections didn’t address: the organizational implications of realizing the promised efficiencies.
If analysts can serve five times more clients, do you need fewer analysts—or do you pursue five times more clients? If the answer is “more clients,” do you have the sales capacity? The support infrastructure? The management bandwidth?
If developers now own adoption metrics for the features they build, what happens to the product managers who previously owned them? Are product managers freed up for more strategic work, or are they defending territory?
If AI drafts client communications, who reviews them? What error rate are we willing to accept? Who’s accountable when the AI gets something wrong?
These aren’t hypothetical questions. We’re navigating them now, and the answers aren’t obvious.
AI doesn’t just improve workflows. It reshapes roles. And most organizations—including ours—are making it up as they go.
The companies that figure out the organizational design will outperform those that simply purchase better software. The differentiation in 2026 won’t come from adopting AI. It will come from redesigning teams, incentives, and accountability structures around what AI makes newly possible.

What We’re Taking Into Next Year
A year of building AI systems in structured finance clarified a few things:
AI is more powerful than the hype suggests—once you integrate it into real workflows rather than treating it as a research toy.
AI is more frustrating than the demos show—the gap between “works in claude.ai” and “works in production” is where most of the time goes.
AI is more dependent on domain expertise than the automation narrative implies. It generates analyses quickly, but distinguishing plausible from accurate requires human judgment that compounds over years. The “why” and “so what” remain stubbornly human problems.
AI changes more than technology—it changes job descriptions, team structures, and how people understand their own value. The skill isn’t operating the tool; it’s knowing when the output reflects reality.
We don’t have all the answers. We’re still learning what this means for how we build software, how we serve clients, and how we organize ourselves.
But we’re no longer wondering whether AI will change our industry. We’re focused on making sure we’re the ones defining how.