I analyzed 140,000 backtests, then built an AI algotrading agent. It’s CRUSHING the market.
How I 2x’d the S&P 500 with a single prompt
[Image: A screenshot of me deploying my AI-generated trading strategy]
I typed one sentence into a chatbot.
It came back with 30 trading strategies, a detailed report, and a winning trading strategy.
I’m not bullshitting you. You can see the results yourself. Not only am I deploying the portfolio to the world, but you can read the step-by-step process of how the AI “thought” about the strategy.
The best part is that you can wield this power yourself. Here’s what I did, step by step.
What most AI does wrong!
Most LLM-based trading agents act like the folks on Reddit.
They “claim” to be doing “research”. The inputs are price, technical indicators, fundamental indicators, and news. And based on what sounds good in theory, the agent makes the trade.
This is actually a common framework. For example, the StockBench benchmark is one of the most popular examples of using AI to trade… and it does exactly this. The inputs are cherry-picked stocks from the Dow, news, and financial statements. The outputs are guesses: buy, sell, or hold.
[Image: A diagram from the StockBench website. While it claims a “Back-Trading” environment, the reality is much more simplistic.]
No real backtesting. No persistent memory or learning from mistakes. Just trading based on vibes. That’s why the best AI on the list earned a nickel above 2%.
[Image: The StockBench benchmark]
Here’s what my AI does differently.
You pick 1–2 Stocks at any given moment. My AI picks from 10,000!
The AI that I built isn’t trading like someone who stumbled on WallStreetBets for the first time.
It’s trading like someone with a finance degree.
This Rust-based agent can launch backtests that analyze over 10,000 stocks at once. It can then filter by momentum, RSI, volatility, cash flow, and just about anything else you can imagine.
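The filters named above have standard textbook definitions. Here is a minimal sketch that assumes a plain list of daily closes, a 21-day lookback for "monthly" momentum, and Wilder's smoothing for RSI. The article doesn't say which conventions Aurora's Rust engine uses, so these are illustrative choices.

```python
import math
from statistics import pstdev


def momentum(closes: list[float], lookback: int = 21) -> float:
    """Percent change over `lookback` trading days (21 is roughly one month)."""
    if len(closes) <= lookback:
        raise ValueError("not enough price history")
    return closes[-1] / closes[-1 - lookback] - 1.0


def rsi(closes: list[float], period: int = 14) -> float:
    """Wilder's RSI: 100 - 100 / (1 + average gain / average loss)."""
    if len(closes) <= period:
        raise ValueError("not enough price history")
    changes = [today - yesterday for yesterday, today in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    # Seed with simple averages over the first `period` changes...
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    # ...then apply Wilder's smoothing to every later change.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
    if avg_loss == 0.0:
        return 100.0  # no losses in the window: maximally overbought
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def annualized_volatility(closes: list[float], periods_per_year: int = 252) -> float:
    """Population standard deviation of daily simple returns, scaled to one year."""
    returns = [today / yesterday - 1.0 for yesterday, today in zip(closes, closes[1:])]
    return pstdev(returns) * math.sqrt(periods_per_year)
```

On a steadily rising series, `rsi` returns 100.0. On a steadily falling one, it returns 0.0. An RSI filter like the one above therefore separates "overbought" names from "oversold" ones.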
[Image: An example of how the rebalance action works]
This is not a fixed rebalancing strategy that works on a handful of stocks. It’s a dynamic pipeline for choosing stocks. You can add filters, select promising candidates, and set the weighting however you want. Instead of just buying and selling (which the AI can still do), the AI constructs portfolios and evaluates them based on what happened in the past.
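The filter, select, and weight flow can be sketched in a few lines. This is a hypothetical illustration, not Aurora's API. The `Stock` record, the `rebalance` function, and the equal-weighting rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stock:
    # Hypothetical per-stock snapshot; a real engine would carry many more fields.
    ticker: str
    momentum: float      # e.g. one-month percent change
    rsi14: float
    net_income: float


def rebalance(
    universe: list[Stock],
    filters: list[Callable[[Stock], bool]],
    rank_key: Callable[[Stock], float],
    top_n: int,
    descending: bool = True,
) -> dict[str, float]:
    """Filter the universe, rank the survivors, keep the top N, and equal-weight them."""
    candidates = [s for s in universe if all(f(s) for f in filters)]
    candidates.sort(key=rank_key, reverse=descending)
    chosen = candidates[:top_n]
    if not chosen:
        return {}  # nothing passed the filters: hold cash
    return {s.ticker: 1.0 / len(chosen) for s in chosen}


universe = [
    Stock("AAA", 0.12, 65.0, 1e9),
    Stock("BBB", 0.05, 40.0, -2e8),
    Stock("CCC", 0.20, 72.0, 5e8),
    Stock("DDD", -0.03, 30.0, 3e8),
]
# Profitable companies only, ranked by momentum, top 2.
weights = rebalance(universe, [lambda s: s.net_income > 0], lambda s: s.momentum, top_n=2)
print(weights)  # {'CCC': 0.5, 'AAA': 0.5}
```

Swapping the rank key to `rsi14` with `descending=False` selects the most oversold names instead. That shows how one pipeline can express very different strategies.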
It’s important to note that this isn’t a simple script made in a day. This backtesting engine took years of engineering ingenuity to act on real-time data. When a stock is delisted, your positions are liquidated (with a 10% haircut). When a stock enters the S&P 500, the system recognizes it on the date it actually happens. I spent years making the system accurate, auditable, and free of lookahead bias.
[Image: You can audit every single event emitted by the backtesting engine. This is a “rebalance signal” event.]
It’s not perfect. But it took real work.
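The delisting rule described above is simple to express. Here is a minimal sketch that assumes positions are stored as share counts and that the haircut applies to the last known price. The article specifies the 10% haircut but not the reference price. `liquidate_delisted` is a hypothetical name.

```python
HAIRCUT = 0.10  # from the article: delisted positions are liquidated at a 10% haircut


def liquidate_delisted(
    positions: dict[str, float],
    last_prices: dict[str, float],
    delisted: set[str],
    cash: float,
) -> tuple[dict[str, float], float]:
    """Close every delisted position, crediting 90% of its last known value to cash."""
    remaining: dict[str, float] = {}
    for ticker, shares in positions.items():
        if ticker in delisted:
            cash += shares * last_prices[ticker] * (1.0 - HAIRCUT)
        else:
            remaining[ticker] = shares
    return remaining, cash
```

For example, 10 shares of a delisted stock last priced at $20 return $180 to cash instead of $200.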
After creating our strategies, we can launch a dozen backtests at the same time across different time periods. The goal is to see which strategies worked and which didn’t. At the end, the AI picks a “winner”, but we can deploy whatever portfolio we want.
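Launching many backtests at once is a fan-out over every (strategy, time window) pair. The toy sketch below shows that shape. It uses a synthetic price series and two trivial stand-in strategies in place of a real engine, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical price history: a synthetic series growing 0.1% per trading day.
PRICES = [100.0 * 1.001 ** day for day in range(1000)]


def buy_and_hold(prices: list[float]) -> float:
    return prices[-1] / prices[0] - 1.0


def stay_in_cash(prices: list[float]) -> float:
    return 0.0


def run_one(job):
    """Run a single (strategy, window) pair; stands in for a call into a real engine."""
    (name, strategy), (start, end) = job
    return name, (start, end), strategy(PRICES[start:end])


def run_grid(strategies, windows, max_workers=12):
    """Fan every strategy out over every window; results come back in input order."""
    jobs = list(product(strategies.items(), windows))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, jobs))
```

Threads keep the sketch short. For CPU-bound pure-Python backtests, `ProcessPoolExecutor` is the usual choice, because the GIL prevents threads from running Python bytecode in parallel.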
[Image: Some of the portfolios created with a single prompt]
Since I started building Aurora, my platform has processed over 100,000 backtests. I decided to put the dataset to good use and make my AI learn from its mistakes.
Turbocharging the AI with a data-driven analysis
I couldn’t let past backtests sit in cold storage and do nothing.
I had to learn from them.
To do this, I built a script that decomposes every single backtest into its basic components: the trading strategies, trigger conditions, actions, and indicators. Then I performed statistical analysis to see which components actually win.
Think of it like this. Every backtest is a recipe. Instead of asking “was this recipe good?”, I’m asking “which ingredients show up in good recipes?”
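In code, the “ingredients” question becomes a group-by: for each component, collect the results of every backtest that contains it. The sketch below is simplified. It uses hypothetical field names (`components`, `annual_return`) and looks at one metric and one level, not the five levels and several metrics described below.

```python
from collections import defaultdict
from statistics import mean


def component_scores(backtests: list[dict]) -> dict[str, dict[str, float]]:
    """For each component, count the backtests that use it and average their returns."""
    returns_by_component: dict[str, list[float]] = defaultdict(list)
    for bt in backtests:
        # set(): an ingredient counts once per recipe, even if it appears twice.
        for component in set(bt["components"]):
            returns_by_component[component].append(bt["annual_return"])
    return {
        component: {"n": len(returns), "mean_return": mean(returns)}
        for component, returns in returns_by_component.items()
    }


backtests = [
    {"components": ["momentum_1m", "net_income>0"], "annual_return": 0.25},
    {"components": ["momentum_1m", "rsi14<30"], "annual_return": 0.15},
    {"components": ["rsi14<30"], "annual_return": -0.05},
]
scores = component_scores(backtests)
```

The count `n` is kept alongside each mean because an average over only a few backtests says little about whether an ingredient reliably helps.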
Here’s how it works:
- Step 1: Extraction. A TypeScript script connects to my production database and pulls every completed backtest. For each one, I broke the portfolio apart into five levels: the full strategy set, individual strategies, entry/exit conditions, buy/sell actions, and technical indicators.
- Step 2: Normalization. “Buy $1,209 of SPY” and “Buy $1,189 of SPY” are basically the same thing. So the script aggressively rounds numbers to group near-identical strategies together. This prevents the analysis from treating tiny parameter variations as completely different strategies.
- Step 3: Statistical analysis. For every component, the script calculates annualized returns, Sharpe ratios, Calmar ratios, and more. Then it ranks them and builds a “corpus” — the top performers by each metric, plus an “elite” tier that’s in the top 20% across all metrics simultaneously.
- Step 4: Diversity filtering. This is something I picked up in a Data Science class at Carnegie Mellon. I knew that if I picked the top 100 components by Sharpe ratio, I’d get 100 minor variations of the same strategy. So I borrowed two techniques from old-school natural language processing, TF-IDF and cosine similarity, and used them to enforce diversity. The script selects components that are both high-performing and meaningfully different from each other.
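Steps 2 and 3 can be illustrated with two small helpers. This sketch assumes rounding to two significant figures, since the article doesn't specify the rounding rule. It defines the elite tier as components at or above the top-20% cutoff for every metric, with all metrics treated as higher-is-better. The function names are hypothetical.

```python
import math


def round_sig(x: float, digits: int = 2) -> float:
    """Round to `digits` significant figures so near-identical parameters collide."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - math.floor(math.log10(abs(x))))


def top_cutoff(values: list[float], fraction: float = 0.2) -> float:
    """Smallest value that still falls in the top `fraction` of `values`."""
    k = max(1, math.ceil(len(values) * fraction))
    return sorted(values, reverse=True)[k - 1]


def elite_tier(components: list[dict], metrics: tuple[str, ...], fraction: float = 0.2) -> list[dict]:
    """Components in the top `fraction` for every metric at once (all higher-is-better)."""
    cutoffs = {m: top_cutoff([c[m] for c in components], fraction) for m in metrics}
    return [c for c in components if all(c[m] >= cutoffs[m] for m in metrics)]
```

With this rule, `round_sig(1209)` and `round_sig(1189)` both become 1200, so the two SPY buys from Step 2 fall into one group. Requiring the top tier on every metric simultaneously is a strict bar: with only five components and a 20% cutoff, a component must be the single best on every metric to qualify.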
The exact pipeline, including the extraction code, the data schema, normalization process, statistical analyses, and graph generation code can be found in this gist. If you’re the data science type, drop it into Claude and see what it says.
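Step 4's diversity filter can be sketched as greedy selection. Walk the components best-first and keep one only if its TF-IDF vector isn't too close, by cosine similarity, to anything already kept. This plain-Python illustration makes its own choices for tokenization, IDF smoothing, and the 0.8 similarity threshold. Those are assumptions, not the values in the gist.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """TF-IDF with tf = count / doc length and idf = ln(N / df) + 1."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def diverse_top_k(items: list[dict], k: int, max_similarity: float = 0.8) -> list[dict]:
    """Walk items best-first; keep one only if it isn't too similar to anything already kept."""
    ranked = sorted(items, key=lambda item: item["score"], reverse=True)
    vectors = tfidf_vectors([item["text"] for item in ranked])
    kept: list[int] = []
    for i in range(len(ranked)):
        if len(kept) == k:
            break
        if all(cosine(vectors[i], vectors[j]) < max_similarity for j in kept):
            kept.append(i)
    return [ranked[i] for i in kept]


items = [
    {"text": "buy top 15 by 1 month momentum", "score": 1.40},
    {"text": "buy top 15 by 1 month momentum", "score": 1.39},  # near-duplicate
    {"text": "buy lowest 14 day rsi with positive net income", "score": 1.10},
]
picked = diverse_top_k(items, k=2)
```

Here `score` could be a Sharpe ratio, and `text` a normalized description of the component. The near-duplicate at 1.39 is skipped in favor of the genuinely different RSI idea.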
The end result of this pipeline is a curated dataset of what works and what doesn’t, distilled from 140,000+ backtests. I took cold hard facts, such as “momentum tends to work for individual stocks”, and fed them into the system prompt.
With my new engine handling over 10,000 stocks and my AI literally learning the components of a good strategy, I decided to see what I could one-shot with Aurora.
The results did not disappoint.
The end result speaks for itself
I started with a single prompt. Something simple that all traders understand.
Create the best momentum-based strategy for fundamentally strong stocks
[Image: The AI asks smart, pointed follow-up questions to guide the process]
I didn’t just end up with one portfolio. I ended up with 3.
Each of the portfolios matched or exceeded the broader market. The “winner”, as selected by the AI, did the best out of sample, earning 36.94% vs SPY’s 16.0%.
This includes 0.3% slippage on every trade.
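A fixed per-trade slippage is usually modeled by moving the fill price against the trader. Here is a minimal sketch that assumes the 0.3% is applied to the fill price on both buys and sells. The article doesn't describe the exact model, and `fill_price` is a hypothetical name.

```python
SLIPPAGE = 0.003  # 0.3% per trade, as stated in the article


def fill_price(quote: float, side: str, slippage: float = SLIPPAGE) -> float:
    """Buys fill slightly above the quote, sells slightly below it."""
    if side == "buy":
        return quote * (1.0 + slippage)
    if side == "sell":
        return quote * (1.0 - slippage)
    raise ValueError(f"unknown side: {side!r}")
```

Under this model, buying and then selling at an unchanged quote of 100 loses about 0.6% (99.7 / 100.3 - 1 ≈ -0.598%). Strategies that rebalance often pay this cost repeatedly.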
The winner also had a higher risk-adjusted return (Sharpe 1.39 vs 0.61, Sortino 2.02 vs 0.95) and a smaller maximum drawdown (15.4% vs 20.0%). In this sample, it’s quite literally better in every single way.
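For reference, here is one common way to compute these risk metrics, plus the Calmar ratio mentioned earlier, from daily returns. Conventions vary: population vs sample standard deviation, the risk-free rate, and 252 trading days per year are all choices. The article doesn't state which ones Aurora uses, so treat this as a sketch.

```python
import math
from statistics import mean, pstdev

TRADING_DAYS = 252


def sharpe(daily_returns: list[float], risk_free_daily: float = 0.0) -> float:
    """Annualized mean excess return divided by its volatility."""
    excess = [r - risk_free_daily for r in daily_returns]
    sd = pstdev(excess)
    return mean(excess) / sd * math.sqrt(TRADING_DAYS) if sd else float("nan")


def sortino(daily_returns: list[float], target: float = 0.0) -> float:
    """Like Sharpe, but the denominator only penalizes returns below `target`."""
    excess = [r - target for r in daily_returns]
    downside = math.sqrt(mean([min(e, 0.0) ** 2 for e in excess]))
    return mean(excess) / downside * math.sqrt(TRADING_DAYS) if downside else float("inf")


def max_drawdown(daily_returns: list[float]) -> float:
    """Largest peak-to-trough decline of the compounded equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def calmar(daily_returns: list[float]) -> float:
    """Compound annual growth rate divided by maximum drawdown."""
    if not daily_returns:
        raise ValueError("need at least one return")
    years = len(daily_returns) / TRADING_DAYS
    cagr = math.prod(1.0 + r for r in daily_returns) ** (1.0 / years) - 1.0
    mdd = max_drawdown(daily_returns)
    return cagr / mdd if mdd else float("inf")
```

Sortino treats upside volatility as harmless, so a strategy with large up days and shallow down days scores noticeably higher on Sortino than on Sharpe.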
This analysis wasn’t done on cherry-picked, curve-fit data. The timeline is split into two distinct phases:
- The Training Phase (2020–2024): Where the AI iterated on strategies across 5 years of data to ‘learn’ which indicators survived the COVID crash, the 2022 Bear Market, and the 2023 recovery. It used this data to build the strategies.
- The Blind Test (The Past 1 Year): Once the strategies were built, I locked them. I then ran them against the last 12 months of fresh data — a period the AI had never seen during its analysis.
[Image: The 3 winning portfolios generated by the AI]
This is critical. The AI didn’t optimize the portfolio on 2020–2024 data and then present those same in-sample results. It acted like a real data scientist, splitting the data into training and validation sets.
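The key property of this setup is that the split is chronological, so no blind-test data can leak into training. Here is a minimal sketch. The 2025-01-01 cutoff is a hypothetical stand-in, since the article only says the blind test covers the most recent 12 months.

```python
from datetime import date


def split_by_date(rows: list[dict], cutoff: date) -> tuple[list[dict], list[dict]]:
    """Chronological split: rows before `cutoff` train, rows on or after it are the blind test."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test


CUTOFF = date(2025, 1, 1)  # hypothetical: the article only says "the past 1 year"
rows = [{"date": date(2020, 3, 16)}, {"date": date(2024, 12, 31)}, {"date": date(2025, 6, 2)}]
train, test = split_by_date(rows, CUTOFF)
```

A random shuffle would mix 2025 days into training, and the test score would then overstate real-world performance. That is why time series are split by date.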
The other strategies are similarly strong; the only one that lost to SPY was the defensive quality mean-reversion idea. And considering SPY has been up by over 16%, that’s not too shabby.
The 3 portfolios aren’t black boxes that are impossible to reason about. They aren’t non-deterministic language models whose output quality changes with the season. They are deterministic trading rules that you can code yourself in a weekend.
The trading rules are:
- The “All-Weather” Winner: The strategy takes all S&P 500 stocks, filters for those with strong fundamentals, then selects the 15 with the highest monthly change. It basically buys fundamentally strong blue chips that are breaking out. This strategy DESTROYED the stock market in every period tested.
- The “Bear Market Specialist”: This strategy filters to S&P 500 stocks with positive net income and a stock rating of 2. Then it selects the 15 stocks with the LOWEST 14-day RSI (the most oversold ones). It basically finds safe, profitable companies that have been beaten down the most. This portfolio did decently in bull markets but shines in bear markets.
- The Best Capital Protection: When SPY is above its 200-day simple moving average, it holds top momentum stocks. When it’s below, it holds the top dividend payers and cash. This strategy achieved the lowest max drawdown during the bear markets.
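Because the rules are deterministic, they translate directly into code. The sketch below runs over a hypothetical per-stock snapshot and rests on several assumptions:

- The `strong_fundamentals` flag stands in for a screen the article doesn't define.
- The meaning of “rating” isn't specified.
- Momentum is approximated by one-month change.
- Equal weighting, a 50% cash sleeve, and the 15-stock count for the third rule are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Hypothetical point-in-time record for one stock on a rebalance date.
    ticker: str
    in_sp500: bool
    strong_fundamentals: bool  # placeholder for a screen the article doesn't define
    net_income: float
    rating: int                # "stock rating of 2" per the article; scale unspecified
    one_month_change: float
    rsi14: float
    dividend_yield: float


def equal_weight(stocks: list[Snapshot]) -> dict[str, float]:
    return {s.ticker: 1.0 / len(stocks) for s in stocks} if stocks else {}


def all_weather(universe: list[Snapshot], n: int = 15) -> dict[str, float]:
    """S&P 500 plus strong fundamentals, then the n biggest one-month gainers."""
    pool = [s for s in universe if s.in_sp500 and s.strong_fundamentals]
    pool.sort(key=lambda s: s.one_month_change, reverse=True)
    return equal_weight(pool[:n])


def bear_market_specialist(universe: list[Snapshot], n: int = 15) -> dict[str, float]:
    """Profitable S&P 500 names rated 2, then the n lowest 14-day RSIs (most oversold)."""
    pool = [s for s in universe if s.in_sp500 and s.net_income > 0 and s.rating == 2]
    pool.sort(key=lambda s: s.rsi14)
    return equal_weight(pool[:n])


def capital_protection(
    universe: list[Snapshot], spy_close: float, spy_sma200: float,
    n: int = 15, cash_fraction: float = 0.5,
) -> dict[str, float]:
    """Momentum when SPY is above its 200-day SMA; dividend payers plus cash below it."""
    if spy_close > spy_sma200:
        leaders = sorted(universe, key=lambda s: s.one_month_change, reverse=True)
        return equal_weight(leaders[:n])
    payers = [s for s in universe if s.dividend_yield > 0]
    payers = sorted(payers, key=lambda s: s.dividend_yield, reverse=True)[:n]
    if not payers:
        return {"CASH": 1.0}
    weights = {t: w * (1.0 - cash_fraction) for t, w in equal_weight(payers).items()}
    weights["CASH"] = cash_fraction
    return weights
```

Each function maps one snapshot of the market to target weights. Re-running it on every rebalance date and trading toward those weights is all a backtest needs from the strategy.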
I’ve deployed each of the portfolios live so anyone can see how they perform in the real world.
If this is the power of AI with one lazy prompt, imagine 1 million more backtest data points and a dedicated student eager to learn.
Don’t know how to trade yet but willing to learn? Medium members can learn to build for free at nexustrade.io/medium
Concluding Thoughts
Most people think AI trading means feeding stock prices into an LLM.
It’s so much more than that.
If Aurora can generate three market-beating portfolios from a single prompt, imagine what she’ll do with an all-night jamming session. If she learns from 140,000 backtests, imagine what she’ll find with a million. Each strategy is live, not theoretical, and you can read, learn from, copy, and modify it for yourself.
You can LITERALLY learn from my strategy… or make your own.
This one example shows you why Wall Street has been using AI for decades. The question is… when will you?