The Ridiculous Difference 339 Days Make in the World of AI

AI Years Are Faster Than Dog Years

339 days ago, the world was stunned by Claude 3.5 Sonnet.

At the time, it was the best model the world had ever seen. While comparable in price to OpenAI’s best models, it delivered superior performance on the vast majority of tasks, especially coding, reasoning, and agentic work.

Fast-forward to today, and there are now a half-dozen models that are faster, cheaper, and more powerful.

And for some reason, nobody gives a flying fuck.

The Problem with AI Advancement

Throughout history, most hype cycles have ended in disappointment because the technology didn’t live up to expectations.

Take cryptocurrencies for example. The vast majority of people did not truly understand cryptocurrency and blockchain technology. They just believed that "it was the future".

We can say the exact same thing about AI.

Most laypeople simply don’t understand it, and think that artificial intelligence is following the same trajectory because Siri hasn’t improved much or because Reddit is flooded with AI-generated slop that’s easy to detect.

They don’t understand the extent to which AI has improved.

As someone who works with AI every single day, I’ve witnessed the exponential acceleration of AI first-hand. Yet, people who aren’t in the field truly don’t understand the insane amount of progress we’ve made with not only mainstream, closed-source models, but obscure open-source ones as well.

As an example, let’s take a side-by-side comparison of the best model of 2024 with the best model of 2025.

A Side-By-Side Comparison of Claude 3.5 Sonnet and OpenAI o3-mini

For this article, I’m going to compare the best model of 2024, Claude 3.5 Sonnet, with the best model of 2025, OpenAI o3-mini. We’re going to compare the models on the basis of:

Cost: How much does the model cost to use?
Accuracy: How accurate are the final results?

Let’s start with cost.

How much is Claude 3.5 Sonnet?

Claude 3.5 Sonnet costs $3 per million input tokens and $15 per million output tokens. By today’s standards, this is actually fairly expensive. The most powerful 2025 models are far cheaper.

How much is OpenAI o3-mini?

In comparison, the OpenAI o3-mini model is $1.10 per million input tokens and $4.40 per million output tokens. This makes it roughly 3x cheaper than last year’s most powerful model.
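To make the price gap concrete, here is a minimal sketch that prices one request on both models. It assumes Claude 3.5 Sonnet’s list price of $3 input and $15 output per million tokens, and the workload (2,000 input tokens, 500 output tokens) is a made-up example. The dictionary keys are just labels, not API model identifiers. Keep in mind that o3-mini is a reasoning model and its reasoning tokens are billed as output, so real-world ratios will vary.

```python
# Prices in USD per million tokens.
PRICES = {
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
    "o3-mini": {"input": 1.10, "output": 4.40},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request for the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 2,000 input tokens and 500 output tokens.
claude_cost = request_cost("claude-3.5-sonnet", 2_000, 500)
o3_cost = request_cost("o3-mini", 2_000, 500)

print(f"Claude 3.5 Sonnet: ${claude_cost:.4f}")
print(f"o3-mini:           ${o3_cost:.4f}")
print(f"Ratio:             {claude_cost / o3_cost:.1f}x")
```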

But being less expensive is not enough. Which model is more powerful?

Comparing these models in terms of raw power

To compare them, we’re going to see how these models perform on two different complex reasoning tasks:

JSON generation: how well these models generate complex syntactically-valid JSON objects
SQL generation: how well these models generate complex, valid SQL queries

Let’s start with JSON generation.

What type of JSON object are we creating?

For this test, we’re not going to generate a simple JSON object. We’re going to generate a deeply-nested, highly complex one.

Specifically, we’re going to perform the task of creating algorithmic trading strategies.

To do this, we’ll use the NexusTrade platform. NexusTrade is a platform that enables retail investors to create no-code algorithmic trading strategies.

NexusTrade - No-Code Automated Trading and Research

Perform financial research and deploy algorithmic trading strategies

nexustrade.io

When a user creates a portfolio of strategies, the request goes through the following process:

The process of creating a portfolio of trading strategies

Create the portfolio outline, including its name, initial value, and a description of the strategies.
Create a strategy outline. This includes a name, an action (“buy” or “sell”), the asset we want to buy, an amount (for example 10% of your buying power or 100 shares), and a description of when we want to perform the action.
Create a condition outline, which is an object that describes whether we should execute the action at the current timestamp.

A single request goes through each of these prompts, one after the other. The end result is an algorithmic trading strategy that we can deploy right to the market!
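To give a feel for the nesting involved, here is a rough sketch of the three outlines as Python dataclasses serialized to JSON. The field names, the asset symbol, and the initial value are hypothetical. NexusTrade’s real schema isn’t shown in this article and is almost certainly more detailed.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Condition:
    # Describes whether the action should execute at the current timestamp.
    description: str


@dataclass
class Strategy:
    name: str
    action: str  # "buy" or "sell"
    asset: str
    amount: str  # e.g. "10% of buying power" or "100 shares"
    condition: Condition


@dataclass
class Portfolio:
    name: str
    initial_value: float
    strategies: List[Strategy] = field(default_factory=list)


# Hypothetical example values, for illustration only.
portfolio = Portfolio(
    name="Example portfolio",
    initial_value=10_000.0,
    strategies=[
        Strategy(
            name="Buy the dip",
            action="buy",
            asset="BTC",
            amount="10% of buying power",
            condition=Condition(description="Position is down 15%"),
        )
    ],
)

# asdict() recurses through the nested dataclasses, so the output is plain JSON.
print(json.dumps(asdict(portfolio), indent=2))
```

Each level corresponds to one prompt in the chain: portfolio, then strategy, then condition.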

Let’s see how well Claude Sonnet and o3-mini can create a complex strategy.

Analyzing Claude 3.5 Sonnet in JSON Generation

In this example, I am going to create the following trading rules:

Create a trading strategy that buys 50% of my buying power in Bitcoin when the rate of change of Bitcoin is 1 standard deviation above its 3 day rate of change or my bitcoin position is down 15%. Sell 20% of my portfolio value if I am up 25% or I’m up 10% from the last time I sold it.

This was the response.

In this example, Claude generated a syntactically-valid algorithmic trading strategy on its first try. Nice! But let’s dive deeper.

If we click on the strategy code and check out the rules we created, we notice that Claude made a mistake!

It’s subtle, but if we look at the buy rule, we can notice an issue.

Instead of checking for 1 standard deviation above the 3-day rate of change, it checked for 1 + the 3-day rate of change.

This is not what we asked for. Can o3-mini do better?
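To see why this matters, here is a small sketch of the two thresholds. It reflects one plausible reading of the prompt (the mean of recent 3-day rates of change plus one sample standard deviation), uses made-up numbers, and assumes the rate of change is expressed as a fraction. It is not how NexusTrade evaluates indicators internally.

```python
from statistics import mean, stdev


def intended_threshold(roc_history):
    """Mean of recent 3-day rates of change plus one sample standard deviation."""
    return mean(roc_history) + stdev(roc_history)


def mistaken_threshold(latest_roc):
    """What the generated rule did: add the constant 1 to the rate of change."""
    return 1 + latest_roc


# Made-up 3-day rates of change, as fractions (0.02 means 2%).
roc_history = [0.01, 0.03, -0.02, 0.02]
current_roc = 0.04

buy_intended = current_roc > intended_threshold(roc_history)
buy_mistaken = current_roc > mistaken_threshold(roc_history[-1])

print(f"Intended threshold: {intended_threshold(roc_history):.4f} -> buy: {buy_intended}")
print(f"Mistaken threshold: {mistaken_threshold(roc_history[-1]):.4f} -> buy: {buy_mistaken}")
```

With these numbers, the intended rule fires and the mistaken one does not, because adding 1 pushes the threshold far beyond any realistic 3-day move.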

Analyzing OpenAI o3-mini in JSON Generation

Let’s look at the strategies created with the o3-mini model.

If we zoom in on each rule, we see that the OpenAI model created our strategies exactly as specified!

To do so at a cheaper price is insane.

However, creating JSON objects is just one complex reasoning task. Let’s try another.

Creating a SQL Query with these large language models

In our next task, we’re going to see how well these models can create accurate SQL queries that conform to the user’s input. This process is similarly complex.

The process for creating a SQL query is as follows:

The user sends a request to the LLM
The LLM generates a SQL query that conforms to the request
The SQL query is executed against the database to get results
The input, SQL query, and the results are sent to a “Grader” LLM
If the grade is too low, the grader LLM gives feedback, which is input into a retry LLM, and the process is repeated up to 5 times
When the grade is high enough, we format the results with a “Formatter” LLM
We send the formatted results to the user
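The steps above can be sketched as a generate, grade, and retry loop. This is an illustration rather than NexusTrade’s implementation. The four callables stand in for the LLM and database calls, the passing grade of 0.8 is a hypothetical threshold, "up to 5 times" is read as five total attempts, and the separate retry LLM is folded into the generator by passing it the grader’s feedback. What happens when every attempt fails isn’t stated above, so the sketch simply formats the last results.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 5      # assumed reading of "repeated up to 5 times"
PASSING_GRADE = 0.8   # hypothetical threshold; the real cutoff isn't stated


def answer_question(
    request: str,
    generate_sql: Callable[[str, Optional[str]], str],
    run_query: Callable[[str], list],
    grade: Callable[[str, str, list], tuple],
    format_results: Callable[[str, list], str],
) -> str:
    """Generate SQL, execute it, grade it, and retry with feedback if needed."""
    feedback: Optional[str] = None
    rows: list = []
    for _ in range(MAX_ATTEMPTS):
        # The first pass has no feedback; later passes act as the "retry" step.
        sql = generate_sql(request, feedback)
        rows = run_query(sql)
        # The grader sees the request, the query, and the results.
        score, feedback = grade(request, sql, rows)
        if score >= PASSING_GRADE:
            break
    # Assumption: if no attempt passes, the last results are still formatted.
    return format_results(request, rows)
```

Passing the steps in as functions keeps the loop easy to test with stubs before wiring in real model calls.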

While evaluating each step of the process is informative, the part that matters most is the final output. Let’s see how 2024’s darling compares to the 2025 model of the year.

Analyzing Claude 3.5 Sonnet in SQL Query Generation

Here’s the request I sent to the model.

What non-tech stocks increased their revenue, free cash flow, and net income every quarter for the past 4 quarters and every year for the past 2 years? Sort by market cap descending

This was its response.

The response generated by Claude 3.5 Sonnet

In the end, it gave a rather large list of stocks. I copy/pasted the query and results into ChatGPT, and this was its response.

ChatGPT rated the query a 0.7! The query had some minor issues and doesn’t strictly conform to the requirements. It’s not bad, but can OpenAI do better?
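To make "strictly conform" concrete, the growth part of the request can be written as a plain Python check. This is only a sketch of the logic, not the SQL either model produced. The metric names and data layout are hypothetical, values are assumed to be ordered oldest first, and the non-tech filter and market cap sort are left out.

```python
METRICS = ("revenue", "free_cash_flow", "net_income")


def increased_every_period(values):
    """True if each value is strictly greater than the one before it."""
    return len(values) >= 2 and all(
        later > earlier for earlier, later in zip(values, values[1:])
    )


def passes_screen(quarterly, annual):
    """Every metric must rise in every quarter and every year supplied."""
    return all(
        increased_every_period(quarterly[metric])
        and increased_every_period(annual[metric])
        for metric in METRICS
    )
```

A single flat or down quarter in any one of the three metrics disqualifies the stock, which is why a query that strictly conforms returns so few results.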

Analyzing OpenAI o3-mini in SQL Query Generation

I performed the same test using OpenAI o3-mini.

The response generated by OpenAI o3-mini

OpenAI only found one stock that conformed to this query. Let’s see if it’s accurate.

The grade that OpenAI gave the OpenAI query

OpenAI got a perfect score! It exactly conformed to the query’s requirements!

From my anecdotal experience, I’m not surprised. This test confirmed what I already knew – that this new generation of models is far superior.

Summarizing the results

To summarize these results:

OpenAI o3-mini is a lot cheaper than Claude 3.5 Sonnet for both the inputs and the outputs
Despite being cheaper, OpenAI o3-mini performs significantly better when it comes to generating complex JSON objects
Additionally, OpenAI o3-mini performs much better when it comes to generating accurate SQL queries

If you think we’ve hit a wall with AI progress, you’re clearly not paying attention.

Concluding Thoughts

People think that we’ve hit a plateau when it comes to AI models. Objectively, this couldn’t be further from the truth.

The new generation of large language models is both cheaper and more powerful than even the models released last year. Because of this, my AI-powered trading platform NexusTrade has never been better.

NexusTrade enables retail investors to create algorithmic trading strategies and perform advanced financial research. Thanks to the increase in capabilities of these models, the platform is now cheaper and more accurate, and can enable anybody to become a data-driven, successful investor.

Want to see the difference NexusTrade makes with your investing? Create a free account today! If you decide to become a premium member (which comes with nearly unlimited access to o3-mini), you can save up to $1,000 by clicking this link and claiming your discount.

Getting started is free. Join 18,000 others in making smarter, automated investing decisions.
