Building a lightning-fast, highly configurable, Rust-based backtesting system
How I built a no-code backtesting system with millisecond execution speed
The first backtesting system I ever built was in JavaScript. And it was slow as shit.
Even with testing one asset on open-high-low-close (OHLC) data, a 15-year backtest took 30 seconds or more.
Forget about even trying minutely data.
So I rebuilt it in Rust. The same backtest takes 0.03 seconds.
A screenshot of the logs that proves the system's raw performance

And a 10-year multi-asset, multi-strategy backtest operating on minute-by-minute data takes 30.41 seconds, from removing the backtest from the queue to presenting the final results.
For context, QuantConnect’s LEAN engine, the industry standard for backtesting, has historically benchmarked at 33–78 seconds for a 10-year backtest with 1 million datapoints (source). The Rust engine I built hits much faster speeds for roughly 7x the data. This is calculated by taking:
- 10 years of historical data
- 7 assets (the Magnificent 7)
- Minute-by-minute data
- ~252 trading days/year
- ~390 minutes/trading day (6.5 market hours)
- Equals 6,879,600 datapoints
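The arithmetic above can be double-checked with a one-off snippet (the numbers come straight from the list, not from the engine):

```rust
fn main() {
    let years = 10u64;
    let assets = 7u64; // the Magnificent 7
    let trading_days_per_year = 252u64;
    let minutes_per_trading_day = 390u64; // 6.5 market hours
    let datapoints = years * assets * trading_days_per_year * minutes_per_trading_day;
    assert_eq!(datapoints, 6_879_600);
    println!("{datapoints} datapoints"); // prints "6879600 datapoints"
}
```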
And the best part is that you can create your trading strategies without reading or writing any code.
A 10-year backtest testing the effectiveness of buying "the Magnificent 7"

Here's what works and what doesn't when it comes to building a high-performance no-code algorithmic trading system.
What EXACTLY was I trying to build?
A junior software engineer would understand that I was trying to build a backtesting system. I know this because when I started this project, I was a junior.
A senior engineer would’ve understood the software’s requirements.
I initially set out to just deploy no-code strategies. That evolved into needing a high-performance backtesting engine that could bring all of my crazy trading ideas to life.
It had to be lightning-fast. If I wanted to try crazy algorithms like genetic optimization, I needed a backtest to run from start to finish as fast as humanly possible.
Finally, the backtesting code could not deviate from the live-trading code. If I deployed a strategy, it should behave exactly like it did in a backtest.
Putting this all together, during my re-write, I came up with the following “quality attributes” for my system.
- Speed and Concurrency
- Configurability
- Portability
But these weren’t the only requirements. Some of the secondary quality attributes that were also extremely important included:
- Auditability: How do I know why a trading decision was made?
- Extensibility: How do I add new features to test out novel ideas that may require alternative data sources?
Here’s how I tackled this insurmountable task.
Tackling the Problems of Speed and Concurrency
The first step in building a lightning-fast backtesting system was choosing the right tools for the job.
My JavaScript system was slow for three key reasons:
- Inefficient algorithms: No, junior-Austin, we shouldn't calculate a simple moving average by summing the past x days' prices and dividing by the window length. Use a sliding window.
- Bad abstractions: Even as a junior, I cleverly designed a "Condition" abstraction for strategies. Too bad each condition required manually coding a new concrete TypeScript class.
- Slow programming language: Single-threaded Node.js just isn't built to run CPU-bound operations like 100 simultaneous backtests.
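To make the sliding-window fix concrete, here's a minimal sketch of an O(1)-per-datapoint simple moving average. The `SlidingSma` type is illustrative, not the actual library code:

```rust
use std::collections::VecDeque;

/// Incremental SMA: O(1) per datapoint instead of re-summing
/// the whole window on every tick (the mistake called out above).
struct SlidingSma {
    window: usize,
    buf: VecDeque<f64>,
    sum: f64,
}

impl SlidingSma {
    fn new(window: usize) -> Self {
        Self { window, buf: VecDeque::with_capacity(window), sum: 0.0 }
    }

    fn next(&mut self, price: f64) -> f64 {
        self.buf.push_back(price);
        self.sum += price;
        if self.buf.len() > self.window {
            // Evict the oldest price instead of recomputing the sum.
            self.sum -= self.buf.pop_front().unwrap();
        }
        self.sum / self.buf.len() as f64
    }
}

fn main() {
    let mut sma = SlidingSma::new(3);
    assert_eq!(sma.next(1.0), 1.0);
    assert_eq!(sma.next(2.0), 1.5);
    assert_eq!(sma.next(3.0), 2.0);
    assert_eq!(sma.next(4.0), 3.0); // (2 + 3 + 4) / 3
    println!("ok");
}
```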
I knew I had to do a full re-write.
abstract class AbstractCondition implements ICondition {
  public name: string;
  public conditions?: AbstractCondition[];
  abstract type: ConditionEnum;
  abstract isTrue(args: IsConditionTrue): Promise<boolean>;
}
I thought about many languages, including C++ and Golang. But with my obsessive personality, I knew I wouldn’t be able to sleep at night if I even thought that Go’s garbage collector was slowing me down. And I didn’t want to be in segmentation hell either.
So I chose Rust.
One nice thing about Rust is that it already had a technical indicator library that I could use. However, the way it was built didn’t align with my vision.
The original library assumed that datapoints would be equally spaced. But as all real traders know, when deploying a trading strategy, shit happens. You're not guaranteed to receive data at equal intervals at all times.
Maybe the market is closed. Maybe your data ingestor had a hiccup. Maybe the stock is halted!
Whatever the problem is, we absolutely cannot expect equal-length market data to come in. It’s a ridiculous assumption.
So I forked it, and built a brand new TA library for my specific needs.
With this new library, you instead define an indicator with a duration. This allows for capturing more accurate nuances of the stock market.
// The upstream `ta` crate: an indicator is parameterized by a number of datapoints.
use ta::indicators::ExponentialMovingAverage;
use ta::Next;

let mut ema = ExponentialMovingAverage::new(3).unwrap();
assert_eq!(ema.next(2.0), 2.0);
assert_eq!(ema.next(5.0), 3.5);
assert_eq!(ema.next(1.0), 2.25);
assert_eq!(ema.next(6.25), 4.25);

// My fork: an indicator is parameterized by a duration, and each
// datapoint carries its own timestamp.
use chrono::{Duration, Utc};

let mut ema = ExponentialMovingAverage::new(Duration::seconds(3)).unwrap();
let now = Utc::now();
assert_eq!(ema.next((now, 2.0)), 2.0);
assert_eq!(ema.next((now + Duration::seconds(1), 5.0)), 3.5);
assert_eq!(ema.next((now + Duration::seconds(2), 1.0)), 2.25);
assert_eq!(ema.next((now + Duration::seconds(3), 6.25)), 4.25);
Creating Abstractions for Maximum Configurability, Extensibility, and Auditability
While the technical indicator library was useful for defining basic strategies, real world trading strategies are (hopefully) more complex.
- What if I wanted to buy only stocks with increasing revenue?
- What if I wanted to take advantage of Reddit sentiment?
- What if I wanted to rebalance the magnificent 7 based on market cap?
To implement this, I had to really think… how can I represent a trading strategy that can take actions based on anything?
I finally broke it down into the following abstractions.
pub struct Strategy {
    pub condition: Option<Condition>,
    pub action: Action,
}

pub enum Condition {
    Base {
        lhs: Indicator,
        comparison: Comparator,
        rhs: Indicator,
    },
    And(Vec<Condition>),
    Or(Vec<Condition>),
    Multi {
        comparison: Comparator,
        value: i32,
        conditions: Vec<Condition>,
    },
}

pub enum Indicator {
    Price { asset: Asset },
    LastOrderPrice { asset: Asset },
    DaysSinceAlert,
    SimpleMovingAverage { window: Window, asset: Asset },
    ExponentialMovingAverage { window: Window, asset: Asset },
    RSI { window: Window, asset: Asset },
    BollingerBand { window: Window, std_dev: f64, asset: Asset },
    Fundamental(FundamentalMetric),
    Economic(EconomicMetric),
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Comparator {
    GreaterThan,
    LessThan,
    Equals,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Action {
    Buy { allocation: Allocation },
    Sell { allocation: Allocation },
    Rebalance { weights: Vec<(Asset, Indicator)> },
}

In my system, an indicator is anything that can be represented as a number.
It can be SPY’s price or NVIDIA’s revenue. It can even mean the number of mentions on WallStreetBets in the past hour. It can literally be anything.
If an indicator was the basic building block of a strategy, then a condition was the glue that put it to use. In my system, a condition is a Boolean statement.
The flow of a trading strategy: it contains a condition, which is composed of indicators, and an action. An executed action generates a signal event which can be audited.

It can be any true/false statement: SPY's price is greater than its 30-day average price, or the number of mentions of RDDT stock is 1 standard deviation above the average mentions in the past 2 months. If an indicator is any number, then a condition is any statement, whether it's true or false.
If a condition evaluates to true, we take the defined action and generate a signal event. An action could be buying a stock, rebalancing a portfolio, or even sending automated alerts.
The signal event that triggers the action is then persisted with metadata including:
- The action corresponding to the event
- Whether the condition passed
- Allocations (such as 10% of buying power for buy actions or target percentages for rebalance actions)
For a rebalance event, the metadata includes the target allocations for the rebalance action. It also contains information on how each condition was evaluated.

The combination of conditions and actions is what defines a trading strategy.
This abstraction was damn-near perfect. With it, you could define any trading strategy that you could possibly imagine, using technical, fundamental, or alternative sources of data.
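To make the abstraction concrete, here's a hypothetical sketch of how a condition tree like this could be evaluated recursively. The types mirror the enums above but are simplified (indicator values are pre-resolved to plain numbers), and `is_true` is not the engine's actual code:

```rust
#[derive(Clone, Copy)]
enum Comparator { GreaterThan, LessThan, Equals }

// Simplified: `lhs`/`rhs` are already-resolved indicator values.
enum Condition {
    Base { lhs: f64, comparison: Comparator, rhs: f64 },
    And(Vec<Condition>),
    Or(Vec<Condition>),
    Multi { comparison: Comparator, value: i32, conditions: Vec<Condition> },
}

fn compare(lhs: f64, cmp: Comparator, rhs: f64) -> bool {
    match cmp {
        Comparator::GreaterThan => lhs > rhs,
        Comparator::LessThan => lhs < rhs,
        Comparator::Equals => (lhs - rhs).abs() < f64::EPSILON,
    }
}

fn is_true(c: &Condition) -> bool {
    match c {
        Condition::Base { lhs, comparison, rhs } => compare(*lhs, *comparison, *rhs),
        Condition::And(cs) => cs.iter().all(is_true),
        Condition::Or(cs) => cs.iter().any(is_true),
        // Multi: "at least / exactly / at most N of these sub-conditions hold".
        Condition::Multi { comparison, value, conditions } => {
            let passed = conditions.iter().filter(|c| is_true(c)).count() as i32;
            compare(passed as f64, *comparison, *value as f64)
        }
    }
}

fn main() {
    // "Price > SMA AND (RSI < 30 OR price > last order price)"
    let cond = Condition::And(vec![
        Condition::Base { lhs: 105.0, comparison: Comparator::GreaterThan, rhs: 100.0 },
        Condition::Or(vec![
            Condition::Base { lhs: 25.0, comparison: Comparator::LessThan, rhs: 30.0 },
            Condition::Base { lhs: 105.0, comparison: Comparator::GreaterThan, rhs: 110.0 },
        ]),
    ]);
    assert!(is_true(&cond));
    println!("condition passed");
}
```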
The re-write created a lightning fast, extensible backtesting engine. But seeing what happened in the past is not enough. I wanted to be able to see what would happen right now, with minimal code changes required.
Here’s how I did it.
From a historical simulation to the real-world
In order to re-use my backtesting code for real-world trading, I had to share the event processing architecture so that the trading logic for both was identical.
The idiomatic Rust way to do this is by using traits.
pub trait EventProcessor {
    fn enqueue(&mut self, event: Event, timestamp: DateTime<Utc>);
    fn next(&mut self) -> Event;
    fn is_empty(&self) -> bool;
}

A Rust trait is like a Java interface. With this simple abstraction, I could write the core trading logic once and reuse it everywhere:
pub fn process_event_loop<E: EventProcessor>(
    processor: &mut E,
    portfolio: &mut Portfolio,
    fee_config: &FeeConfig,
    market_data: Arc<MarketData>,
    tick_state: &TickState,
    config: &ProcessingConfig,
) -> Result<EventLoopResult>
Both BacktestEventProcessor and LiveEventProcessor implement this trait. The trading logic is identical: evaluating conditions, generating signals, creating orders. But the infrastructure underneath is vastly different.
In backtesting, state lives entirely in memory. Orders fill instantly at the current market price (plus simulated slippage). Everything is fast, ephemeral, and purely synchronous.
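As an illustration, the backtest side of the trait might look something like the sketch below. The `Event` type is a placeholder, `next` returns `Option` so the example is self-contained, and events are assumed to arrive in timestamp order; this is not the engine's actual implementation:

```rust
use std::collections::VecDeque;

// Placeholder event type for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Event { Tick(u64), Signal(&'static str) }

trait EventProcessor {
    fn enqueue(&mut self, event: Event, timestamp_nanos: i64);
    fn next(&mut self) -> Option<Event>;
    fn is_empty(&self) -> bool;
}

/// Backtesting: everything lives in memory, fully synchronous.
struct BacktestEventProcessor {
    queue: VecDeque<(i64, Event)>,
}

impl EventProcessor for BacktestEventProcessor {
    fn enqueue(&mut self, event: Event, timestamp_nanos: i64) {
        // Assumes callers enqueue in timestamp order.
        self.queue.push_back((timestamp_nanos, event));
    }
    fn next(&mut self) -> Option<Event> {
        self.queue.pop_front().map(|(_, e)| e)
    }
    fn is_empty(&self) -> bool {
        self.queue.is_empty()
    }
}

fn main() {
    let mut p = BacktestEventProcessor { queue: VecDeque::new() };
    p.enqueue(Event::Tick(1), 1);
    p.enqueue(Event::Signal("buy"), 2);
    assert_eq!(p.next(), Some(Event::Tick(1)));
    assert_eq!(p.next(), Some(Event::Signal("buy")));
    assert!(p.is_empty());
}
```

A live processor would implement the same trait over a database-backed queue, which is exactly what lets the core trading logic stay shared.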
I did this intentionally because I learned the hard way (by profiling with cargo flamegraph) that async code and deserialization in the hot path are a major bottleneck!
The top flame graph shows CPU logic fragmented by I/O when executed inside a shared async runtime. The bottom graph shows the same workload on dedicated worker threads: flat, contiguous CPU execution isolated from the async reactor.

In contrast, for live trading, MongoDB becomes the source of truth. Every tick, the system queries the database for open orders and pending transactions, then checks with the actual brokerage (Alpaca, TradeStation, or Tradier) to see if anything has changed in the real world.
Then there’s reconciliation. Both systems check order statuses and convert changes into events that feed back into the shared event loop. But in backtesting, this is in-memory and mocked. In live trading, it’s async with parallel database queries and real API calls.
Let me walk through a specific example.
A diagram showing how the live-trading system works. It shows the shared backtesting code as well as the differences with live-trading, including context assembly & reconciliation and live output execution.

Say a buy signal fires at 10:32 AM. In backtesting, the order fills instantly at the 10:32 price plus simulated slippage. We're done and move on to the next minutely tick.
In live trading, the event and order are saved to a database. A separate process (the OrderQueueWorker) sends the order to Alpaca. At the same time, the live-trader still ticks, but any future order it generates is automatically rejected because the system detects an open order.
Finally, reconciliation happens. The live-trader won’t turn back on until the order is completely filled. Then we sync the state to the portfolio, and keep detecting market events.
This architecture means I can test a strategy on 10 years of historical data, deploy it to paper trading, then go live. All with zero changes to the trading logic.
The Final End-to-End Backtesting Flow
In total, for my Rust-based backtesting system, the journey of a backtest is as follows:
The journey of a backtest, from initialization in MongoDB to final persistence in the database.

- A backtest is added to a MongoDB-based queue
- A backtest is claimed, changed to the WARMING_UP state, and submitted to a worker pool
- All assets in the backtest are retrieved, and historical data including dividends, prices, economic data, and fundamentals are persisted to disk
- The strategies are "warmed up"; for example, a 30-day SMA processes data starting 30 days before the backtest start date
- The backtest transitions to the RUNNING state
Then, for each tick, whether it’s OHLC data or minutely data:
- The event processor evaluates the conditions and determines if an action should be executed
- If a buy or sell event is generated, orders are created based on the signals
- The portfolio’s positions and buying power are updated
- History is recorded (once per hour for intraday backtests and once per day for daily)
- Running statistics (like the Sharpe and Sortino ratios) are calculated
This repeats until the backtest runs from start date to end date. At the end, the events and history are saved to a database.
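As an illustration of the "running statistics" step, here's a sketch of a running Sharpe ratio computed incrementally with Welford's algorithm, so the engine never re-scans history. Annualization and the risk-free rate are omitted, and this is not the engine's actual statistics code:

```rust
/// Running Sharpe ratio over a stream of per-period returns,
/// updated in O(1) per tick via Welford's online variance algorithm.
struct RunningSharpe {
    n: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the running mean
}

impl RunningSharpe {
    fn new() -> Self {
        Self { n: 0, mean: 0.0, m2: 0.0 }
    }

    fn update(&mut self, r: f64) {
        self.n += 1;
        let delta = r - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (r - self.mean);
    }

    /// None until there are enough samples for a sample variance.
    fn sharpe(&self) -> Option<f64> {
        if self.n < 2 {
            return None;
        }
        let var = self.m2 / (self.n as f64 - 1.0);
        if var == 0.0 { None } else { Some(self.mean / var.sqrt()) }
    }
}

fn main() {
    let mut s = RunningSharpe::new();
    for r in [0.01, -0.005, 0.02, 0.003] {
        s.update(r);
    }
    assert!(s.sharpe().unwrap() > 0.0);
    println!("sharpe = {:.3}", s.sharpe().unwrap());
}
```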
The end result is a no-code backtesting system.
A lightning-fast backtesting system is technically impressive. But it’s only useful if people actually use it. This poses an interesting question — how do you create a trading platform that anybody can use?
How to configure a no-code trading strategy?
While this article focuses on the backend, a curious reader is left wondering, “how do I use this high-level architecture to build a trading strategy?”
You can do this in many ways. For example, my original open-source trading platform used exclusively form fields.
In NextTrade, trading strategies were configured in verbose form fields in a boring UI.

But this was slow and unintuitive. Traders won't understand the full context of what these fields mean unless they sit down and study them. Nobody wants to study to use a SaaS app.
Thankfully, large language models (LLMs) completely solved this problem.
Aurora is the AI agent in NexusTrade. She does independent research in the market and creates algotrading strategies. Here she is building strategies based on stock fundamentals.

Instead of manually configuring a dozen form fields, LLMs act like a transpiler, converting plain English requirements into the "trading strategy" schema defined above. This allows anybody, even non-technical users, to configure highly complex trading strategies without worrying about the underlying execution engine.
As you can imagine, this is a lot simpler than having an LLM generate code. You just have it generate a configuration, which with the current generation of language models, is absolutely trivial.
A diagram depicting how the LLM works to generate an algorithmic trading strategy from natural language, including the retry loop, model escalation logic, validation, and real-time updates.

With this architecture, conditions and actions are independent. They can be generated in parallel by separate LLM calls. If Gemini Flash fails validation, we retry with the error message as feedback. For particularly complex strategies, we escalate to Gemini Pro's thinking mode. This tiered approach keeps 95%+ of requests fast and cheap while handling edge cases gracefully.
This abstraction is powerful because the LLM is only being used to generate configs. It's not being asked to create unbounded Python, something with infinite failure modes. No software engineer is needed to interpret or audit the output. An ordinary investor can get value and create a trading strategy.
The LLM is just being asked to fill in a bounded schema, with N valid indicators, M comparators, and K actions. LLMs are good at this, especially when different system prompts focus on different subtasks. Even if it fails, a simple retry loop almost always fixes these types of config issues — something you can't do as easily when generating slow, free-form Python code.
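The retry-and-escalate loop described here can be sketched roughly as follows. `generate_config`, the toy `validate`, and the model names are all illustrative stand-ins, not the real NexusTrade code:

```rust
/// Hypothetical validate-and-retry loop: try the cheap model first,
/// feed validation errors back as context, and escalate to the bigger
/// model on the final attempt. `llm` stands in for a real API call:
/// any function from (model, prompt) -> config text.
fn generate_config<F>(mut llm: F, prompt: &str, max_retries: usize) -> Result<String, String>
where
    F: FnMut(&str, &str) -> String,
{
    let mut prompt = prompt.to_string();
    for attempt in 0..=max_retries {
        // Escalate to the expensive "thinking" model on the last attempt.
        let model = if attempt < max_retries { "flash" } else { "pro" };
        let config = llm(model, &prompt);
        match validate(&config) {
            Ok(()) => return Ok(config),
            // Retry with the validation error appended as feedback.
            Err(e) => prompt = format!("{prompt}\nValidation failed: {e}"),
        }
    }
    Err("all attempts failed validation".into())
}

/// Toy validator: a "valid" config just has to name an action.
fn validate(config: &str) -> Result<(), String> {
    if config.contains("action:") {
        Ok(())
    } else {
        Err("missing `action` field".into())
    }
}

fn main() {
    // Simulate a model that only produces a valid config after seeing feedback.
    let mut calls = 0;
    let result = generate_config(
        |_model, prompt| {
            calls += 1;
            if prompt.contains("Validation failed") {
                "action: buy".to_string()
            } else {
                "oops".to_string()
            }
        },
        "rebalance the Magnificent 7 monthly",
        2,
    );
    assert_eq!(result.unwrap(), "action: buy");
    assert_eq!(calls, 2); // one failure, one successful retry
}
```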
Let’s take a much more complex example. We can ask the LLM to generate the following strategy, and it works like a charm.
Create a trading strategy that rebalances the top 50 stocks by market cap every month, but filters to only include stocks with a fundamental ranking score > 3 and a 14-day RSI greater than 50. Sort by market cap descending. This means that fundamentally strong stocks with high market caps are starting to break out.
The generated momentum strategy. It has a 2.02 Sharpe ratio, a 3.09 Sortino ratio, and a -8.5% max drawdown. See the live-trading performance here.

In this example, the process is as follows:
- An LLM generates a YAML configuration for the condition, which rebalances monthly
- In parallel, an LLM also generates a configuration for the action, which is a complex rebalancing strategy with filters, sorting logic, and limits
- The action and condition are validated. This includes validating assets against a database of valid assets, and ensuring the output adheres to the schema
- They are combined to form the strategy for the portfolio
- The final portfolio is saved to the database
The end result is a trading strategy that can be backtested, optimized, and deployed with a single button.
Concluding Thoughts
There is some additional complexity that I glossed over. Genetic optimizations have a dedicated worker pool. Intraday backtests cache data directly on disk in a custom binary format that enables zero-copy deserialization, highly efficient sequential reads, and O(log n) random access.
#[repr(C, align(32))]
#[derive(Clone, Copy, Pod, Zeroable)]
pub struct OnDiskPricePoint {
    pub timestamp_nanos: i64,
    pub asset_id: u32,
    pub price: f32,
    pub open: f32,
    pub high: f32,
    pub low: f32,
    pub volume: f32,
}

#[repr(C)]
#[derive(Clone, Copy, Pod, Zeroable)]
pub struct IndexEntry {
    pub timestamp_nanos: i64,
    pub point_index: u64,
}
pub fn read_points_at_timestamp(&self, timestamp_nanos: i64) -> Result<Vec<T>> {
    // O(log n) lookup in the in-memory index.
    let idx = self.index.binary_search_by_key(&timestamp_nanos, |e| e.timestamp_nanos)?;
    // header_size, point_size, bytes_to_read, and file are elided here.
    let byte_offset = header_size + (self.index[idx].point_index * point_size);
    file.seek(SeekFrom::Start(byte_offset))?;
    let mut buffer = vec![0u8; bytes_to_read];
    file.read_exact(&mut buffer)?;
    // Reinterpret the raw bytes as price points without a field-by-field copy.
    buffer.chunks_exact(point_size)
        .map(|chunk| bytemuck::pod_read_unaligned(chunk))
        .collect()
}
But this is the high-level overview of an institutional-grade backtesting system.
Five years ago, I decided to take a leap of faith and build a system to test my crazy trading ideas without waiting 30 seconds per backtest. Now, the system uses AI to launch thousands of backtests simultaneously for users who've never seen a line of Rust. You can now build the same system yourself, or skip the 5 years of iteration and create your first trading strategy today.