Public Portfolio Challenge dashboard at 11:56 AM showing $25,550.20 with eight option legs listed individually in the holdings panel and no spread grouping

$25,000 Public Portfolio Challenge · Episode 7 · Day 2

My AI bought a stock it shouldn't have. Then I couldn't get rid of it.

A bad price tick fired a signal I didn't want. The fill landed at the open. The exit didn't exist. So I rebuilt the risk engine.

Austin Starks ✦ Founder, NexusTrade ✦ May 6, 2026 ✦ 11 min read

The order from yesterday filled.

Yesterday at the close I queued a META 610/655 bull call spread, one contract, $1,470 net debit. The order didn't fill at the limit so it sat overnight as a working order at the broker. This morning at the open, Public takes it.

I see the confirmation land in my chat. Two filled spreads now: AVGO from yesterday, META from this morning. The dashboard ticks up. Account value sits a little above $25,500. Day 2 of the challenge is officially underway.

Public Portfolio Challenge dashboard at 9:28 AM showing $25,570.12 with the META 610/655 spread filled alongside AVGO 430/490 and GOOG 385/415, six option legs listed individually
9:28 AM · META filled. Six option legs on the books, none of them grouped.

I open the META chart out of habit, more to admire the fill than to second-guess it. And then I stop.


This trade should not exist.

The strategy fires on a 14-day RSI cross above 50 in a name that has been in oversold territory recently. The filter is mechanical: the indicator either clears 50 against the live data or it does not. There is no human judgment anywhere in the rule.

So I open the in-app agent and ask Aurora a single question: look at my public portfolio, META's RSI, and confirm that the trade I made today indeed should have happened.

Aurora plans the audit, runs Fetch Portfolios, runs the Portfolio Event Auditor, and runs the stock screener for META and SPY. The screener verdict is the one I expected: META 14-day RSI = 28.45, well below the 50 threshold the strategy gates on. The agent's own conclusion: "the trade should NOT have happened automatically. Because Meta's RSI is currently 28.45, it does not meet your strategy's requirement of being above 50."

Two pieces of evidence in the same run point in opposite directions. The strategy says META should have been benched. The strategy also placed the trade. Aurora's third tool, the event auditor, returns this:

The agent's false negative

"No events found for portfolio 'Public Portfolio Challenge' with the requested filters. Try a wider date window, different event types, or include Market/NoSignal events if you are auditing raw cadence."

Aurora concludes from this that the trade was likely a manual execution or an external action not driven by this specific RSI strategy. That conclusion is wrong. The events existed. The failure is architectural, not the model's: the auditor was originally written to fit a forensic audit inside an LLM context window, so it fetches up to 500 events, samples them into CSV, and asks the model to summarize. The compression saves tokens. It also strips signal-condition audit values, formats price objects as [object Object], and loses the exact fields the investigation needs. The tool tells the user nothing happened. Aurora repeats it.

Aurora agent verdict at 9:27 AM: confirms META RSI(14) = 28.45 (well below the strategy's 50 threshold), correctly concludes the trade should not have happened automatically, then incorrectly concludes the trade was a manual execution because its event auditor returned no events.
9:27 AM · Aurora's verdict. The RSI math is right. The cause is wrong.

I know I did not manually buy META. The order came out of the strategy. So either Aurora's auditor is missing the event, or the event was written wrong. Either way, the existing auditor cannot find what I need it to find. I open Cursor with the NexusTrade MCP server directly connected and start walking the events by hand.


One bad tick. Two layers of damage.

I ask the MCP server for the OpenOptionSignal event tied to yesterday's META trade. The Market ticks adjacent to it. The condition audit that the trader recorded when the signal fired. The events were always there; the auditor's summarizer was throwing away the evidence.

The condition audit comes back. It says the trader saw RSI(14) = 74.045 when it fired. That is a hot reading, and on its face it satisfies "RSI above 50." But 74 is also nowhere near where META has actually been trading. The same audit lists the input price the engine used to compute that RSI. The price was $640.23.

I pull the surrounding Market events. META's day high yesterday, on the same snapshot, was $614.35. The print the engine consumed was $25.88 above the day's high. There is no candle on the actual chart at $640.23. There is no print on Public, on the broker tape, or on the public quote. There is one bad tick in my own snapshot stream, and the engine took it as gospel.

The bug

The live tick stream has occasional anomalies. The price-update path was accepting every tick at face value, computing fundamentals on top of it, and re-evaluating every condition any strategy had against the new value. A single $640.23 print on a snapshot whose own day-high was $614.35 was enough to push RSI(14) from roughly 28 to 74 in one update. That re-evaluation tripped the META entry filter. The signal got written. The order got queued. The order filled at the open.
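
To see why one print can flip the filter, here is a minimal sketch. It assumes a simplified RSI built from simple averages of gains and losses (not Wilder smoothing, and not necessarily what the engine computes), and it uses synthetic closes, not META's real prices. It will not reproduce the 28 and 74 readings from the audit; it only shows the direction and size of the effect.

```python
def simple_rsi(closes, period=14):
    """RSI from simple sums of gains and losses over the last `period` changes.

    Simplified on purpose: 100 * gains / (gains + losses) equals
    100 - 100 / (1 + RS) when RS is average gain over average loss.
    """
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closes")
    window = closes[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if gains + losses == 0:
        return 50.0  # flat series: no momentum in either direction
    return 100.0 * gains / (gains + losses)


# Synthetic closes for a name in a downtrend (hypothetical data).
closes = [640, 636, 638, 633, 630, 633, 629, 627, 628, 623, 620, 622, 618, 615, 616]
rsi_clean = simple_rsi(closes)

# Same series, but the latest value is replaced by a single bad print.
rsi_bad_tick = simple_rsi(closes[:-1] + [640.23])

print(f"clean: {rsi_clean:.1f}  with bad tick: {rsi_bad_tick:.1f}")
```

On this synthetic series the clean reading sits well under 50 and the bad-tick reading lands above it, which is all an "RSI above 50" gate needs in order to fire.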

I am holding a position that exists because of a bug. The position itself is harmless: META 610/655 BCS, one contract, defined risk, even green on the day. None of that is the point. The point is that the strategy did not actually want to be in META right now. It was tricked into thinking it did.

The first fix is the one this requires: the price-update path needs to refuse anomalous ticks. A live tick whose value is outside the broker's own day-high or day-low for the same snapshot is, with high probability, garbage. The fix lives in fundamental.rs as a sanity_check_tick guard with a 50-basis-point tolerance. Twelve unit tests, including a regression named after yesterday's exact numbers. If a future tick comes in 4% above day-high, it gets dropped at the floor and the snapshot doesn't move. The signal that fired yesterday cannot fire that way again.

The back-end half of the problem is now contained. Future bad ticks get dropped at the floor before any indicator sees them. The trade itself still exists in my account. So I go to close it.


I cannot close this position.

I open the holdings panel. There are eight rows. Each row is a single option leg.

Holdings panel showing eight option legs listed individually: META 655c, GOOG 415c, META 610c, NVDA 205c, NVDA 230c, GOOG 385c, AVGO 430c, AVGO 490c. No spread grouping.
11:56 AM · eight legs. Zero spreads. The UI didn't know what it was holding.

The META 610 long call is one row. The META 655 short call is another row. The UI does not know that those two legs are one position. As far as the holdings panel is concerned, I am holding two unrelated META options that happen to share an underlying. The same is true for AVGO and GOOG. The same is true for the NVDA spread I had opened manually earlier in the day.

There is no "Close spread" button. There are eight "Close leg" buttons.

This is the worst possible state for a defined-risk options portfolio to be in. A bull call spread is defined risk because the long call caps the short call's loss. The short call alone has unlimited loss potential. If I click "Close leg" on the long META 610 to get out of the position, I am left with a naked short call on META at 655. That is a position my account is not even approved to hold. It is also a position that, on a META gap up, could blow through the entire account in a single overnight session.
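
The difference between the two positions is easiest to see at expiration. The sketch below uses standard expiration payoff arithmetic, the 610/655 strikes, and the $1,470 debit quoted at the top of this article ($14.70 per share). The function names are illustrative, and the naked short call is modeled with zero credit for simplicity.

```python
def call_value(spot, strike):
    """Intrinsic value of one call at expiration, per share."""
    return max(spot - strike, 0.0)


def bull_call_spread_pnl(spot, long_strike, short_strike, debit_per_share, multiplier=100):
    """P&L at expiration for one long-call / short-call vertical."""
    value = call_value(spot, long_strike) - call_value(spot, short_strike)
    return (value - debit_per_share) * multiplier


def naked_short_call_pnl(spot, short_strike, credit_per_share=0.0, multiplier=100):
    """P&L at expiration for the short call once the long leg is gone."""
    return (credit_per_share - call_value(spot, short_strike)) * multiplier


for spot in (600, 655, 700, 800):
    spread = bull_call_spread_pnl(spot, 610, 655, 14.70)
    naked = naked_short_call_pnl(spot, 655)
    print(f"META at {spot}: spread {spread:+,.0f}  naked short {naked:+,.0f}")
```

The spread's loss stops at the debit and its gain stops at the width minus the debit. The short call on its own keeps losing $100 for every dollar META trades above 655.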

The platform I have spent five years building does not have a button to close a debit spread.


Don't close it manually. Build the system.

The first instinct is to call the broker, reach the options desk, and have a human flatten the spread by phone. That works for one position one time. It does not work for the next sixty days of the challenge, and it does not work for any of the other users on the platform who are going to hit the same wall the next time they trade a vertical.

The second instinct is to open Public's app and try to close the spread directly there. Public's app does know what a spread is. But that path leaves my own platform's risk engine permanently behind on the truth: the position closes on Public, my UI still shows two open legs, and nothing on my side knows they were ever one spread, because the auto-recognition has not been written yet. Today I get out and tomorrow the same trap is sitting in the codebase for the next person.

The fix is not this trade. The fix is the system. So that's what I do.

I let the position sit. It is defined risk by structure. It is not bleeding. Even if it were, the most I can lose on it overnight is the $1,470 debit I paid, and that bound is enforced by the brokerage on its end whether or not my UI surfaces a button. The trade is safe to leave alone for a few hours. The system is not safe to leave alone for a few hours. So I rebuild.


Eleven hours. Two dozen commits. One risk engine.

The work splits into five layers, each one closing a different gap that the morning surfaced. They land in this order, on purpose. Each layer assumes the one before it.

Layer 1: Stop the bug from firing again.

The bad-tick rejection guard goes in first. It runs at every price-update construction site in the codebase, paper and live. Tolerance is 50 basis points around the broker's same-snapshot day-high and day-low. Anything outside that band gets logged and dropped before the indicator engine ever sees it. The unit test suite includes a named regression guard for yesterday's exact numbers.

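| Tick    | Day high (same snapshot) | Distance | Decision                  |
|---------|--------------------------|----------|---------------------------|
| $614.30 | $614.35                  | −0.01%   | Accept                    |
| $617.20 | $614.35                  | +0.46%   | Accept (within 0.5%)      |
| $640.23 | $614.35                  | +4.21%   | Drop. Snapshot unchanged. |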
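
A minimal sketch of the guard, written in Python for illustration rather than copied from fundamental.rs. The function name and the 50-basis-point tolerance come from the description above; the signature, the snapshot dictionary, and the $600.00 day low are assumptions made for the example.

```python
TOLERANCE_BPS = 50  # 50 basis points = 0.5%


def sanity_check_tick(tick, day_high, day_low, tolerance_bps=TOLERANCE_BPS):
    """Return True if the tick sits inside the day range widened by the tolerance."""
    # Malformed input is rejected; NaN fails these comparisons as well.
    if not (tick > 0 and day_high >= day_low > 0):
        return False
    tol = tolerance_bps / 10_000
    return day_low * (1 - tol) <= tick <= day_high * (1 + tol)


def apply_tick(snapshot, tick):
    """Return the snapshot to keep: updated on accept, unchanged on drop."""
    if sanity_check_tick(tick, snapshot["high"], snapshot["low"]):
        return {**snapshot, "last": tick}
    return snapshot


# Day high from the article; the day low is a hypothetical value.
snapshot = {"last": 612.00, "high": 614.35, "low": 600.00}
for tick in (614.30, 617.20, 640.23):
    accepted = sanity_check_tick(tick, snapshot["high"], snapshot["low"])
    print(tick, "accept" if accepted else "drop")
```

Returning the original snapshot on a drop is what keeps every downstream indicator from ever seeing the bad value.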

The signal that fired yesterday cannot fire that way again. Layer 1, done.

Layer 2: Expose the Rust risk engine for live execution.

The Rust engine already had a risk system. Years of backtest-accuracy work lived in there: the rules that say what is and is not a legal close, what counts as covered, what counts as naked, what would be rejected by a real broker. That risk system had been doing its job for backtests, where the answer to "is this close legal" is the difference between a backtest you can trust and a backtest that quietly skips trades the real broker would have killed.

What the engine actually does, in one sentence: take a portfolio's current option positions plus a proposed close, simulate the post-close state, and check whether any of a small number of rules would be violated. The rules:

  • Naked-short suppression. Closing the long leg of a vertical (or one wing of an iron condor, or a calendar/diagonal pair) that would leave a same-underlying short option without its risk-capping companion. Verticals are defined-risk because the long leg caps the short leg's exposure. Remove the long leg and the trade is no longer defined-risk.
  • Covered-call uncoverage. Closing shares that would leave a same-underlying short call without at least 100 shares of coverage per contract. The cover-at-entry check already existed; this is the close-side mirror.
  • Quantity sanity. Negative quantities, zero, NaN, and fractional contracts (with a 1e-6 tolerance for float noise) are rejected outright. Real brokers do not accept fractional option contracts and never will.
  • Strategy integrity. The engine knows about vertical debit/credit, iron condor, calendar, diagonal, straddle, strangle, covered call, and the rest. Each strategy has its own coverage rules; the engine evaluates the post-close state against the right one based on what spread the leg belongs to.

Output: a per-leg verdict (ok / warn / block), a reason code, and the specific symbol that would end up uncovered if the close went through. UI dry-runs ask on every keystroke. Live submissions ask once at the broker handoff. Both get the same answer.
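
The simulate-then-check loop can be sketched in a few lines. This is an illustration in Python, not the Rust engine: it covers only quantity sanity and naked-short suppression, it treats any net short option per underlying, expiry, and right as naked, and it returns the first failure instead of a per-leg list. The `Leg` type, the `exceeds_position` and `invalid_quantity` codes, and the expiry label are assumptions; `naked_short_leg` is the reason code the engine reports.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    symbol: str      # e.g. "META 655C"
    underlying: str
    expiry: str      # placeholder label; the real expiration is not shown here
    right: str       # "C" or "P"
    quantity: int    # positive = long, negative = short


def _block(reason, symbol):
    return {"verdict": "block", "reason": reason, "leg": symbol}


def validate_close(positions, closes):
    """Simulate closing `closes` (symbol -> contracts) and check the result."""
    held = {leg.symbol: leg for leg in positions}
    remaining = {leg.symbol: leg.quantity for leg in positions}
    for symbol, qty in closes.items():
        bad = (
            isinstance(qty, bool)
            or not isinstance(qty, (int, float))
            or not math.isfinite(qty)
            or qty <= 0
            or abs(qty - round(qty)) > 1e-6  # fractional contracts
        )
        if bad:
            return _block("invalid_quantity", symbol)
        leg = held.get(symbol)
        contracts = round(qty)
        if leg is None or contracts > abs(leg.quantity):
            return _block("exceeds_position", symbol)
        # Closing moves the position toward zero from either side.
        remaining[symbol] -= contracts if leg.quantity > 0 else -contracts
    # Post-close state: net contracts per (underlying, expiry, right).
    net = {}
    for leg in positions:
        key = (leg.underlying, leg.expiry, leg.right)
        net[key] = net.get(key, 0) + remaining[leg.symbol]
    for leg in positions:
        key = (leg.underlying, leg.expiry, leg.right)
        if remaining[leg.symbol] < 0 and net[key] < 0:
            return _block("naked_short_leg", leg.symbol)
    return {"verdict": "ok", "reason": None, "leg": None}


positions = [
    Leg("META 610C", "META", "front", "C", 1),
    Leg("META 655C", "META", "front", "C", -1),
]
print(validate_close(positions, {"META 610C": 1}))
print(validate_close(positions, {"META 610C": 1, "META 655C": 1}))
```

Closing only the long leg is blocked with META 655C named as the uncovered symbol, while closing both legs in one request passes.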

Paper trading inherited the risk engine for free. Paper is fully automated, so every simulated fill goes through the engine on its way to the position, and a flagged trade simply never happens. Live trading has an approval flow: the strategy queues an order, a human clicks Approve All, and the close path on filled positions is also UI-driven. Those two UI surfaces were never wired to the engine. I wanted them to reflect what the engine actually says.

I lifted the risk engine out of the backtester, made it stateless, and wrapped it behind an HTTP endpoint. The backtester still calls it directly. Live execution on the Node side hits it before every multi-leg order submission. The new close-leg modal pings it on every keystroke as the user adjusts contract counts. MCP tools, agents, and every future caller route through the same endpoint and receive the same verdict.

One service. One set of rules. One source of truth. Backtest verdicts and live verdicts are the same code. There is no longer a way for the UI to render a green confirm button while the engine would have said naked.

Figure: How the engine evaluates a close request (close-leg validation, debit spread; animated in the original)

Current positions: META 610C long (+1 contract) · META 655C short (−1 contract)
Proposed close: META 610C, long leg only
Risk engine evaluation:
  1. Build spread index: META 610/655 · vertical · debit
  2. Simulate post-close: remove +1 META 610C
  3. Resulting position: −1 META 655C alone
  4. Naked short rule match: short call without long cap → triggered
Verdict: BLOCK · reason naked_short_leg · leg META 655C
Warning: META 655C orphaned, unbounded loss potential if META gaps up overnight

The validator picked up one new rule on the way through: a close of the underlying shares is blocked if it would leave a same-underlying short call with fewer than 100 shares of coverage per contract. Coverage at entry already existed; close-side coverage now does too.
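
The close-side coverage rule reduces to one comparison. A minimal sketch, assuming a hypothetical function name and a hypothetical `covered_call_uncovered` reason code; only the 100-shares-per-contract requirement comes from the rule itself.

```python
def shares_close_verdict(shares_held, shares_to_close, short_call_contracts,
                         shares_per_contract=100):
    """Check a stock close against same-underlying short calls."""
    if shares_to_close <= 0 or shares_to_close > shares_held:
        return {"verdict": "block", "reason": "invalid_quantity"}
    remaining = shares_held - shares_to_close
    required = shares_per_contract * abs(short_call_contracts)
    if remaining < required:
        # The sale would leave at least one short call without its shares.
        return {"verdict": "block", "reason": "covered_call_uncovered"}
    return {"verdict": "ok", "reason": None}


# 200 shares against one short call: selling 100 is fine, selling 150 is not.
print(shares_close_verdict(200, 100, -1))
print(shares_close_verdict(200, 150, -1))
```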

Twenty-six unit tests cover the cases that matter: vertical debit and credit, iron condor (block one wing, allow the full close), calendar, diagonal, straddle, strangle, covered call across four sub-cases, mixed multi-spread batches, invalid quantities, and wire-payload extra-field tolerance. Fail-closed by design: if the risk engine is unreachable, the close request returns a 503 and an alert email fires. The gate is non-optional.
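
Fail-closed means an unreachable validator is treated as a refusal, never as a pass. A minimal sketch of that gate, assuming hypothetical `validate` and `send_alert` callables and a hypothetical response shape; the 503 and the alert are the behavior described above.

```python
def gated_close(request, validate, send_alert):
    """Run the risk check before a close; refuse if the check cannot run."""
    try:
        verdict = validate(request)
    except OSError as exc:
        # ConnectionError and TimeoutError are both OSError subclasses.
        send_alert(f"risk engine unreachable: {exc}")
        return 503, {"error": "risk_engine_unavailable"}
    return 200, verdict


def unreachable_engine(request):
    raise ConnectionError("connection refused")


alerts = []
status, body = gated_close({"symbol": "META 610C"}, unreachable_engine, alerts.append)
print(status, body, alerts)
```

The design choice is that there is no code path where the close proceeds without a verdict in hand.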

Layer 3: Teach the server what a spread is.

The validator was now safe but the server side had no way to ask the question. The portfolio endpoints (getPortfolio, getSharedPortfolio, getPartialPortfolio) returned a flat list of positions with no grouping. The UI had nothing to render except a flat list.

I added a server-side spread grouper that walks every position, reads the optionMetadata.spreadId tag on each leg, and collapses tagged legs into GroupedHolding rows. The grouper is strategy-aware: bull call debit, bear put debit, iron condor, straddle, calendar, diagonal, covered call. The covered-call detector pairs long stock against same-underlying short calls when shares cover at least 100 times the absolute contract count. Twenty unit tests cover every variant. Anything the grouper doesn't recognize falls through to a single-leg row, so users never lose visibility on a position because the server didn't know the strategy name.
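
The core of the grouper is a bucket-by-tag pass with a fallthrough. This Python sketch is illustrative only: it reads `optionMetadata.spreadId` as described, but it skips strategy classification and the covered-call detector, and the `"single"` kind and output shape are assumptions.

```python
from collections import defaultdict


def group_holdings(positions):
    """Collapse legs sharing a spreadId; everything else stays a single row."""
    tagged = defaultdict(list)
    holdings = []
    for pos in positions:
        spread_id = (pos.get("optionMetadata") or {}).get("spreadId")
        if spread_id is None:
            holdings.append({"kind": "single", "legs": [pos]})
        else:
            tagged[spread_id].append(pos)
    for spread_id, legs in tagged.items():
        if len(legs) < 2:
            # A lone tagged leg is still shown, just not as a spread.
            holdings.append({"kind": "single", "legs": legs})
        else:
            holdings.append({"kind": "spread", "spreadId": spread_id, "legs": legs})
    return holdings


positions = [
    {"symbol": "META 610C", "quantity": 1, "optionMetadata": {"spreadId": "s1"}},
    {"symbol": "META 655C", "quantity": -1, "optionMetadata": {"spreadId": "s1"}},
    {"symbol": "AVGO 430C", "quantity": 1},  # untagged leg
]
for row in group_holdings(positions):
    print(row["kind"], [leg["symbol"] for leg in row["legs"]])
```

Every input position appears in exactly one output row, which is the property that keeps an unrecognized strategy from disappearing off the panel.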

The portfolio endpoints now return a parallel holdings field alongside the existing flat positions list. Existing callers who do not know about spreads keep working unchanged. Callers that do know now get the spread structure for free in the same response.

GET /api/portfolio/:id

# existing flat list, untouched
"positions": [
  { "symbol": "META 610C", "quantity":  1, "lastPrice": 24.52, ... },
  { "symbol": "META 655C", "quantity": -1, "lastPrice":  7.95, ... },
  ...
]

# new parallel structure, computed server-side
"holdings": [
  {
    "kind":             "spread",
    "underlyingSymbol": "META",
    "strategy":         "Bull Call Debit",
    "width":            45,
    "breakeven":        624.70,
    "maxProfit":        3030,
    "maxLoss":          1470,
    "legs":             [ /* references into positions[] */ ]
  },
  ...
]
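
The derived fields in that response follow from standard bull call debit arithmetic. A minimal sketch, assuming a hypothetical function name, with the strikes and the $1,470 debit taken from the example response:

```python
def bull_call_debit_metrics(long_strike, short_strike, net_debit,
                            contracts=1, multiplier=100):
    """Width, breakeven, and payoff bounds for a bull call debit spread."""
    width = short_strike - long_strike
    debit_per_share = net_debit / (contracts * multiplier)
    return {
        "width": width,
        "breakeven": round(long_strike + debit_per_share, 2),
        "maxProfit": round(width * multiplier * contracts - net_debit, 2),
        "maxLoss": net_debit,
    }


print(bull_call_debit_metrics(610, 655, 1470))
```

For the META 610/655 spread this gives a width of 45, a breakeven of 624.70, a maximum profit of 3030, and a maximum loss of 1470, matching the fields shown in the response.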

Layer 4: Build the UI you couldn't get an hour ago.

With grouped holdings on the wire, the holdings panel has something to render as a spread. The first iteration was a flat list of spread rows with expandable legs. By the afternoon I had iterated to a three-tier hierarchy that mirrors how a trader actually thinks about a portfolio: asset row → spread row → leg row. The asset card collapses every position on a given underlying (long stock, covered calls, every spread on AVGO, etc.) into one section. Inside the asset card, each spread row carries the strategy name (Bull Call Debit, Iron Condor, Covered Call), the strike summary parsed inline from the OCC symbols, the width, the days to expiration, the breakeven, and the live mark. Inside the spread row, the leg rows show per-leg detail. Options collapse by default, so the panel does not vomit eight rows at you the moment you open the portfolio.

Three-tier holdings hierarchy

▾ META · 1 spread · mark $1,657 · +$308
    ▾ Bull Call Debit · 1 ct · $610/$655 · 30d · BE $624.70 · [Close Spread]
        META 610C · long · 1 ct
        META 655C · short · 1 ct

"Close Spread" is the primary action on every spread row. It opens a close-leg modal that does a debounced 200-millisecond dry-run against POST /api/order/option/multileg/dry-run on every contract-count change. The dry-run hits the Rust validator from Layer 2. The modal renders block, warn, and ok states with per-leg detail, the strike and expiration on each side, and the limit price suggestion auto-filled from the current mid. Block disables the confirm. Warn requires an "I understand" acknowledgment. Confirm submits the same legs to the live multi-leg endpoint, which itself runs the same validator before sending to the broker. Pending-close gating immediately greys the spread row out so a fast double-click cannot double-submit. The user cannot get a green confirm screen unless the validator agrees.

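The close modal's gating rules fit in one small function. This is a sketch of the block, warn, and ok handling described for the "Close Spread" modal, with assumed state names (`disabled`, `needs_ack`, `enabled`) and an assumed input shape of one verdict string per leg.

```python
def confirm_state(leg_verdicts, acknowledged=False, pending_close=False):
    """Decide what the confirm button may do for a set of per-leg verdicts."""
    if pending_close:
        return "disabled"   # a close is already in flight for this spread
    if not leg_verdicts or any(v == "block" for v in leg_verdicts):
        return "disabled"   # no verdict yet, or the validator said no
    if any(v == "warn" for v in leg_verdicts) and not acknowledged:
        return "needs_ack"  # user must tick "I understand" first
    return "enabled"


print(confirm_state(["ok", "ok"]))
print(confirm_state(["ok", "warn"]))
print(confirm_state(["ok", "block"], acknowledged=True))
```

An empty verdict list is treated as disabled here so the button stays off until a dry-run has actually come back.
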
Layer 5: Make sure the legs that are already on the books get tagged.

Layers 2 through 4 assume every leg of every spread already carries the right spread tag in the database. The morning's holdings panel was proof that almost none of them did. The tag was getting lost in three different ways, and the Rust trader needed a self-heal of its own:

  • The live brokerage payload was wiping it on every refresh. Brokers do not model spreads; they return leg-by-leg data. The live-account refresh was overwriting the database with the broker payload, dropping the platform's spread tag every time. Fix: a small merge helper that re-stamps stored tags onto broker positions by symbol on every refresh.
  • Single-leg fills were not being recognized as halves of a spread. If you opened the long leg and short leg as separate orders, neither carried a spread tag. New recognizer in the order queue: when a single-leg fill matches an existing open option with the same underlying, expiration, and right but the opposite side and a different strike, the pair gets persisted as a vertical with both legs sharing one spread tag.
  • Multi-leg fills were not being tagged forward to the position. The order document carried the spread tag at fill, but the resulting position documents never got their copy. Fix: stamp the tag onto the position the moment its parent order transitions to filled.
  • The Rust trader self-heals on its own side. On every startup the Rust live trader rebuilds the spread tags from filled order history. A restart no longer begins with a stale view. The two sides converge on the same answer regardless of which one writes first.
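
Two of the fixes in the list above are small enough to sketch: the merge helper that re-stamps stored tags onto a broker refresh, and the pairing test the single-leg recognizer applies. Both are Python illustrations with assumed function names and field names (`underlying`, `expiration`, `right`, `strike`, `quantity`); only `optionMetadata.spreadId` and the matching criteria come from the descriptions.

```python
def restamp_spread_tags(stored_positions, broker_positions):
    """Copy stored spreadId tags onto fresh broker positions, matched by symbol."""
    tags = {}
    for pos in stored_positions:
        spread_id = (pos.get("optionMetadata") or {}).get("spreadId")
        if spread_id is not None:
            tags[pos["symbol"]] = spread_id
    merged = []
    for pos in broker_positions:
        pos = dict(pos)  # do not mutate the broker payload
        if pos["symbol"] in tags:
            meta = dict(pos.get("optionMetadata") or {})
            meta["spreadId"] = tags[pos["symbol"]]
            pos["optionMetadata"] = meta
        merged.append(pos)
    return merged


def is_vertical_pair(fill, existing):
    """Same underlying, expiration, and right; opposite side; different strike."""
    return (
        fill["underlying"] == existing["underlying"]
        and fill["expiration"] == existing["expiration"]
        and fill["right"] == existing["right"]
        and fill["strike"] != existing["strike"]
        and (fill["quantity"] > 0) != (existing["quantity"] > 0)
    )


stored = [{"symbol": "META 610C", "optionMetadata": {"spreadId": "s1"}}]
broker = [{"symbol": "META 610C", "quantity": 1}, {"symbol": "NVDA 205C", "quantity": 3}]
print(restamp_spread_tags(stored, broker))
```

Matching by symbol keeps the broker's quantities and prices authoritative while the platform's own metadata survives the refresh.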

And one backfill script for everything already on the books. It walks the portfolios, finds option positions missing the tag, looks up the matching filled multi-leg order, and stamps the tag in. Dry-run by default.

The almost-mistake

An earlier draft of the grouper used a heuristic to pair orphan legs visually in the UI without writing the spread tag back to the database. The holdings panel would have looked correct. The risk validator, which keys off the spread tag in the database, would still have seen naked legs. The UI and the validator would have disagreed about what I was holding. I shipped that for about fifteen minutes, looked at the implications a second time, and reverted. The proper path is to write the metadata to the database first and read it back. Revert the heuristic. Then the backfill. Then the forward-looking fix.


Fix the agent that couldn't find the bug.

The agent run that started this morning is public. Aurora got the symptom right (META RSI 28.45, way under 50, no way the trade should have fired) and the cause wrong (concluded the trade was probably manual). The cause was wrong because the event auditor it used was structurally incapable of surfacing the bad tick: a one-shot prompt that fetches up to 500 events, samples them into CSV, formats price objects as [object Object], drops the condition-audit values, and asks an LLM to summarize. The events were always in the database. The auditor was throwing them away on the way to the prompt.

So I codified the path I had to walk by hand. The Cursor plan is on disk as event_audit_upgrade_c3151102. First piece landed today: granular event-query tools added to the agent, sharing the same MCP query implementation so the agent and the MCP API give the same answers. The market-price formatter normalizes both { price: number } objects and bare numbers, includes same-snapshot fundamentals (closingPrice, highestPrice, lowestPrice) inline, and flags any price outside the snapshot's own high-low range. Empty queries no longer return "no events found"; they return a diagnostic response with applied filters and suggested next calls. Signal events surface the condition-audit values inline.
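
The formatter's job can be sketched as follows. This is a Python illustration, not the agent tool's code: the field names `closingPrice`, `highestPrice`, and `lowestPrice` come from the description above, while the function names, the `outOfRange` flag, and the sample low and close values are assumptions.

```python
def normalize_price(raw):
    """Accept either {"price": number} or a bare number; return a float or None."""
    if isinstance(raw, dict):
        raw = raw.get("price")
    if isinstance(raw, bool) or not isinstance(raw, (int, float)):
        return None
    return float(raw)


def format_market_event(event):
    """Flatten a Market event and flag a price outside its own snapshot range."""
    price = normalize_price(event.get("price"))
    high = event.get("highestPrice")
    low = event.get("lowestPrice")
    in_range_known = None not in (price, high, low)
    return {
        "price": price,
        "closingPrice": event.get("closingPrice"),
        "highestPrice": high,
        "lowestPrice": low,
        "outOfRange": in_range_known and not (low <= price <= high),
    }


# Bad print from the article; low and close are hypothetical values.
event = {"price": {"price": 640.23}, "closingPrice": 612.00,
         "highestPrice": 614.35, "lowestPrice": 600.00}
print(format_market_event(event))
```

With the price emitted as a number and the range flag sitting next to it, the bad tick is visible in a single row instead of being hidden behind a stringified object.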

The next time someone asks Aurora the same question I asked it this morning, the answer will be the actual answer, not "looks like it was manual." That is what dogfooding is supposed to mean: the platform diagnoses itself, finds the diagnostic missing, and teaches itself the diagnostic.


Queued the META close. Through the system I just built.

By 8 PM the holdings panel is rendering the way it was supposed to from the start. Four asset cards: AVGO, GOOG, NVDA, META. Each card collapses the spread inside it. "Close Spread" is the primary action on every row. Strike summary, width, days-to-expiration, breakeven, and live mark all render inline. Eight orphan option legs at 11:56 AM, four self-aware spreads by 8 PM.

I open the META row and click "Close Spread." The modal pops with both legs filled in, strike and expiration on each side, the per-leg limit auto-suggested from the current mark. The dry-run hits /risk/validate-close. It comes back green: both legs valid as a multi-leg close, no naked-short trap, no covered-call uncoverage, no fractional-contract issue. I set the time-in-force for an open execution tomorrow and submit.

Public takes the order as a queued multi-leg close, both legs linked. The submit toast confirms. The pending-close gating greys the spread row so a fast double-click cannot double-submit. The META row reads close queued. The action button is disabled. Nothing else on the panel is affected.

I am closing this trade because it should not have fired. I am closing it green (up $308 on a $1,349 fill, +22.8%) because META happened to run the right direction by accident. None of that changes the framing. The strategy did not actually want to be in META right now. So I queued the exit, through the same atomic multi-leg path the AI uses to open positions, gated by the same Rust risk engine that protects every other close on the platform.

That is the philosophy I am running this challenge on. As close to fully AI-driven as possible. The system fires. The system corrects itself when it is wrong. The system flattens the trade it should not have opened, on the same rails it would have used for any normal exit. The only thing my opposable thumbs did was click submit. Even the trade I do not want is exiting through the system that built it.


Trading update.

Day 2 closes with four spreads on the books, one of them queued for an open exit tomorrow, and the account at $26,304. Day 2 P&L: +$1,315, +5.3%. Inception-to-date: +$1,304, +5.2%.

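| Spread                        | Ct | Net Debit | Mark   | P&L   | % Move |
|-------------------------------|----|-----------|--------|-------|--------|
| AVGO 430 / 490                | 1  | $1,975    | $1,765 | −$210 | −10.6% |
| GOOG 385 / 415                | 2  | $1,864    | $2,776 | +$912 | +48.9% |
| NVDA 205 / 230                | 3  | $2,385    | $2,679 | +$294 | +12.3% |
| META 610 / 655 · close queued | 1  | $1,349    | $1,657 | +$308 | +22.8% |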

GOOG is the standout, less than two points from the +50% close-to-target rule. NVDA was opened mid-day after its 14-day RSI cleared the 50 filter. META is being closed on purpose: the trade should not have fired. AVGO is the only red name and is well inside its 25% per-name drawdown stop.
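
The P&L columns and the two mechanical rules can be checked with a few lines of arithmetic. The debits and marks below are the Day 2 figures from this update; the function names and status labels are illustrative, and the thresholds are the +50% target and 25% drawdown stop mentioned above.

```python
def pnl(net_debit, mark):
    """Dollar P&L and percent move, rounded to one decimal place."""
    change = mark - net_debit
    return change, round(100 * change / net_debit, 1)


def rule_status(pct, target_pct=50.0, stop_pct=-25.0):
    """Apply the close-at-target and per-name drawdown rules to a percent move."""
    if pct >= target_pct:
        return "close_at_target"
    if pct <= stop_pct:
        return "stop_out"
    return "hold"


book = {
    "AVGO 430/490": (1975, 1765),
    "GOOG 385/415": (1864, 2776),
    "NVDA 205/230": (2385, 2679),
    "META 610/655": (1349, 1657),
}
report = {name: pnl(debit, mark) for name, (debit, mark) in book.items()}
for name, (change, pct) in report.items():
    print(f"{name}: {change:+d} ({pct:+.1f}%) -> {rule_status(pct)}")
```

At these marks every spread evaluates to hold under both rules; GOOG is the one closest to a threshold.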

What I'm watching tomorrow

  • META exit fills at the open. If META gives back any of its move overnight, the queued close gets a cleaner exit price. If it gaps up, the limit may not fill and I requote.
  • GOOG and the +50% rule. Sitting at +48.9%. The strategy's mechanical "Close at +50%" rule fires automatically at the next snapshot that prints over the line.
  • Layer 5 in the wild. The metadata hydration / Rust self-heal path runs on every live reconcile. First overnight cycle is the first real test of whether the four remaining spreads stay tagged through every refresh.

Day 1 was the strategy. Day 2 was the engine.

Yesterday I wrote that the strategy was the easy part. The infrastructure was the hard part. I thought that was the lesson of Day 1. It turns out it was the thesis statement for the whole challenge.

Day 2 closes with the AI-fired META trade exiting through the same AI-built rails that opened it. That arc is the one I am running this challenge to test. AI fires the trade. The system catches its own bug while the position is on the books. The system queues the exit. The human approves a button. End-to-end, the platform self-heals from its own mistakes, on the same machinery that produced them.

Day 2 surfaced a class of bug that paper trading literally cannot find: a position the system is not designed to safely close. Paper trading exits are abstract. They do not interact with broker-side leg classification. They do not collide with a UI that shows leg-by-leg actions on a position that the user thinks of as one trade. They certainly do not test what happens when the engine fires a signal it shouldn't have fired and the user wants out.

The trigger was a single corrupt tick in the live snapshot stream. The damage was not the trade itself. The damage was discovering that my own platform did not have the rails to safely undo it.

The shape of the rebuild matters more than any single piece of it. The Rust risk engine that the backtester has been calling for years is now the same code that gates the live trader's close path, the UI's dry-run modal, the multi-leg endpoint on the server, and anything an MCP tool or agent will ever ask. The UI cannot render a green confirm while the engine would have flagged the close as naked, because there is no separate UI logic to disagree with. That is the architectural payoff of the day.

Everything else exists to make sure the risk engine is asking the right question of the right data: the spread grouper, the spread-aware UI, the live-broker metadata hydration, the vertical auto-recognition, the backfill. Layers 3, 4, and 5 are all in service of Layer 2.


The position was the easy part. The system that lets you exit was the hard part.

Day 1 surfaced three bugs that had been latent in my codebase for years: a 100× multiplier in two languages, a sign convention disagreement across three layers, and a Public multi-leg quantity bug that only fired at two contracts or more. Day 2 surfaced six more: bad-tick acceptance, broker payload wiping optionMetadata, multi-short-call silent drop in the spread grouper, missing close-side covered-call rule, fractional-contract pass-through, and per-share-vs-per-contract drift across all four brokerage adapters. Paper trading found none of them. The live tape found all of them within twelve hours of the first fill.

Tomorrow is Day 3. The remaining three spreads still have to play out against their own targets and stops. Whatever happens to them, the next time my engine fires a trade I do not want, I have a button for it.


Day 2 of the challenge is on the tape.

Every fill, every close, every bug that gets surfaced next, in real time. The portfolio is live, the chart is honest, and the rails that make the next exit safe are in production.

Open the Live Portfolio →
