The Ridiculous Difference 339 Days Make in the World of AI
AI Years Are Faster Than Dog Years
339 days ago, the world was stunned by Claude 3.5 Sonnet.
At the time, it was the best model the world had ever seen. While comparable in price to OpenAI’s best models, it delivered superior performance in the vast majority of tasks, especially coding, reasoning, and agentic tasks.
Fast-forward to today, and there are now a half-dozen models that are faster, cheaper, and more powerful.
And for some reason, nobody gives a flying fuck.
The Problem with AI Advancement
Historically, most hype cycles have been followed by disappointment because the technology didn’t live up to expectations.
Take cryptocurrencies for example. The vast majority of people did not truly understand cryptocurrency and blockchain technology. They just believed that "it was the future".
We can say the exact same thing about AI.
Most laypeople simply don’t understand it, and think that artificial intelligence is following the same trajectory because Siri hasn’t improved much or Reddit is flooded with AI-generated slop that’s easy to detect.
They don’t understand the extent to which AI has improved.
As someone who works with AI every single day, I’ve witnessed the exponential acceleration of AI first-hand. Yet, people who aren’t in the field truly don’t understand the insane amount of progress we’ve made with not only mainstream, closed-source models, but obscure open-source ones as well.
As an example, let’s take a side-by-side comparison of the best model of 2024 with the best model of 2025.
A Side-By-Side Comparison of Claude 3.5 Sonnet and OpenAI o3-mini
For this article, I’m going to compare the best model of 2024, Claude 3.5 Sonnet, with the best model of 2025, OpenAI o3-mini. We’re going to compare the models on the basis of:
- Cost: How much does the model cost to use?
- Accuracy: How accurate are the final results?
Let’s start with cost.
How much is Claude 3.5 Sonnet?
[Image: The cost of Claude 3.5 Sonnet]
Claude 3.5 Sonnet costs $3.75 per million input tokens and $15 per million output tokens. By today’s standards, this is actually fairly expensive. In comparison, the most powerful 2025 models are far cheaper.
How much is OpenAI o3-mini?
[Image: The cost of o3-mini]
In comparison, the OpenAI o3-mini model is $1.10 per million input tokens and $4.40 per million output tokens. This makes it roughly 3.4x cheaper than last year’s most powerful model, on both inputs and outputs.
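To make the price gap concrete, here is a minimal sketch that applies the per-million-token prices quoted above to a single request. The workload size (10,000 input tokens and 2,000 output tokens) and the `request_cost` helper are assumptions chosen purely for illustration.

```python
# Prices in USD per million tokens, as quoted in this article.
PRICES = {
    "claude-3.5-sonnet": {"input": 3.75, "output": 15.00},
    "o3-mini": {"input": 1.10, "output": 4.40},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
sonnet_cost = request_cost("claude-3.5-sonnet", 10_000, 2_000)
o3_mini_cost = request_cost("o3-mini", 10_000, 2_000)
ratio = sonnet_cost / o3_mini_cost

print(f"Claude 3.5 Sonnet: ${sonnet_cost:.4f}")
print(f"o3-mini:           ${o3_mini_cost:.4f}")
print(f"Ratio:             {ratio:.2f}x")
```

Because the input and output prices differ by the same factor, the ratio stays the same no matter how the tokens are split between input and output.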
But being less expensive is not enough. Which model is the most powerful?
Comparing these models in terms of raw power
To compare them, we’re going to see how these models perform on two different complex reasoning tasks:
- JSON generation: how well these models generate complex syntactically-valid JSON objects
- SQL generation: how well these models generate complex, valid SQL queries
Let’s start with JSON generation.
What type of JSON object are we creating?
For this test, we’re not going to generate a simple JSON object. We’re going to do a deeply-nested, highly complex one.
Specifically, we’re going to perform the task of creating algorithmic trading strategies.
To do this, we’ll use the NexusTrade platform. NexusTrade is a platform that enables retail investors to create no-code algorithmic trading strategies.
Creating a portfolio of strategies involves the following process:
[Image: The process of creating a portfolio of trading strategies]
- Create the portfolio outline, including its name, initial value, and a description of the strategies.
- Create a strategy outline. This includes a name, an action (“buy” or “sell”), the asset we want to buy, an amount (for example 10% of your buying power or 100 shares), and a description of when we want to perform the action.
- Create a condition outline, which is an object that describes whether we should execute the action at the current timestamp.
A single request passes through each of these prompts, one after the other. The end result is an algorithmic trading strategy that we can deploy right to the market!
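To give a feel for the kind of nested object this process produces, here is a rough sketch using Python dataclasses. NexusTrade’s real schema isn’t shown in this article, so every class name, field name, and value below is an assumption based only on the three outlines described above.

```python
import json
from dataclasses import asdict, dataclass, field


# All names below are hypothetical; they mirror the outlines in the text,
# not NexusTrade's actual schema.
@dataclass
class Condition:
    description: str  # when the action should be executed


@dataclass
class Strategy:
    name: str
    action: str  # "buy" or "sell"
    asset: str
    amount: str  # e.g. "10% of buying power" or "100 shares"
    condition: Condition


@dataclass
class Portfolio:
    name: str
    initial_value: float
    strategies: list[Strategy] = field(default_factory=list)


portfolio = Portfolio(
    name="Bitcoin momentum",  # illustrative value
    initial_value=10_000.0,   # illustrative value
    strategies=[
        Strategy(
            name="Buy the breakout",
            action="buy",
            asset="BTC",
            amount="50% of buying power",
            condition=Condition("Rate of change is unusually high"),
        )
    ],
)

# Serialize to JSON, then parse it back. json.loads raises
# json.JSONDecodeError if the text is not syntactically valid JSON.
encoded = json.dumps(asdict(portfolio), indent=2)
decoded = json.loads(encoded)
print(encoded)
```

Parsing only proves the object is syntactically valid. Whether the rules inside it match what the user asked for is a separate question.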
Let’s see how well Claude Sonnet and o3-mini can create a complex strategy.
Analyzing Claude 3.5 Sonnet in JSON Generation
In this example, I am going to create the following trading rules:
Create a trading strategy that buys 50% of my buying power in Bitcoin when the rate of change of Bitcoin is 1 standard deviation above its 3 day rate of change or my bitcoin position is down 15%. Sell 20% of my portfolio value if I am up 25% or I’m up 10% from the last time I sold it.
This was the response.
[Image: The response from Claude 3.5 Sonnet]
In this example, Claude generated a syntactically-valid algorithmic trading strategy on its first try. Nice! But let’s dive deeper.
If we click on the strategy code and check out the rules we created, we notice that Claude made a mistake!
[Image: The strategies generated by the model]
It’s subtle, but if we look at the buy rule, we can notice an issue.
[Image: The buying strategy generated by Claude]
Instead of checking for 1 standard deviation above the 3 day rate of change, it checked for 1 + the 3 day rate of change.
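Here is a small sketch of why that difference matters. It assumes one reading of the prompt, where the buy threshold is the average rate of change plus one standard deviation of recent rate-of-change values. The numbers are made up for illustration and are not real Bitcoin data.

```python
from statistics import mean, stdev

# Hypothetical recent rate-of-change readings, in percent.
recent_roc = [2.0, -1.0, 4.0, 0.5, -3.0, 6.0]
latest_roc = 3.5  # hypothetical current reading, in percent

baseline = mean(recent_roc)
spread = stdev(recent_roc)  # sample standard deviation

# What the prompt asked for: one standard deviation above the baseline.
intended_threshold = baseline + 1 * spread

# What the generated rule did: add the constant 1 to the baseline.
mistaken_threshold = baseline + 1

intended_signal = latest_roc > intended_threshold
mistaken_signal = latest_roc > mistaken_threshold

print(f"Intended threshold: {intended_threshold:.2f}, buy? {intended_signal}")
print(f"Mistaken threshold: {mistaken_threshold:.2f}, buy? {mistaken_signal}")
```

With these numbers the standard deviation is well above 1, so the mistaken rule sets a lower bar and fires a buy that the intended rule would not. On calmer data the two thresholds can land close together, which is what makes this kind of bug easy to miss.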
This is not what we asked for. Can o3-mini do better?
Analyzing OpenAI o3-mini in JSON Generation
[Image: The response from OpenAI o3-mini]
Let’s look at the strategies created with the o3-mini model.
[Image: The strategies created by the model]
If we zoom in on each strategy, we see that the OpenAI model created our strategies exactly!
[Image: The buy strategy created by the model]
To do so at a cheaper price is insane.
However, creating JSON objects is just one complex reasoning task. Let’s try another.
Creating a SQL Query with these large language models
In our next task, we’re going to see how well these models can create accurate SQL queries that conform to the user’s input. This process is similarly complex.
[Image: The process of generating the SQL query]
The process for creating a SQL query is as follows:
- The user sends a request to the LLM
- The LLM generates a SQL query that conforms to the request
- The SQL query is executed against the database to get results
- The input, SQL query, and the results are sent to a “Grader” LLM
- If the grade is too low, the grader LLM gives feedback, which is input into a retry LLM, and the process is repeated up to 5 times
- When the grade is high enough, we format the results with a “Formatter” LLM
- We send the formatted results to the user
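The steps above can be sketched as a simple generate, grade, and retry loop. This is not NexusTrade’s actual implementation. Each LLM call and the database call are passed in as plain functions, the 0.8 passing grade is an assumed threshold, and returning the last results once the retries run out is also an assumption.

```python
MAX_RETRIES = 5      # the process is repeated up to 5 times
PASSING_GRADE = 0.8  # assumed threshold; the article doesn't state one


def answer_question(request, generate, run_query, grade, retry, format_results):
    """Generate SQL, grade the results, and retry with feedback if needed."""
    sql = generate(request)
    rows = run_query(sql)
    score, feedback = grade(request, sql, rows)
    retries = 0
    while score < PASSING_GRADE and retries < MAX_RETRIES:
        sql = retry(request, sql, feedback)
        rows = run_query(sql)
        score, feedback = grade(request, sql, rows)
        retries += 1
    # Assumption: format whatever we have, even if the grade never passed.
    return format_results(request, rows)


# Stand-ins for the LLM and database calls, for demonstration only.
executed = []


def fake_generate(request):
    return "SELECT ticker FROM stocks"


def fake_retry(request, sql, feedback):
    return sql + " WHERE sector != 'Technology'"


def fake_run_query(sql):
    executed.append(sql)
    return [("XYZ",)]


def fake_grade(request, sql, rows):
    if "WHERE" in sql:
        return 1.0, ""
    return 0.4, "Filter out tech stocks."


def fake_format(request, rows):
    return ", ".join(row[0] for row in rows)


answer = answer_question(
    "non-tech stocks", fake_generate, fake_run_query,
    fake_grade, fake_retry, fake_format,
)
print(answer, len(executed))
```

Passing the model calls in as functions keeps the control flow testable without any network access, which is why the stand-ins above are enough to exercise the retry path.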
While evaluating each step of the process is informative, the part that matters most is the final output. Let’s see how 2024’s darling compares to the 2025 model of the year.
Analyzing Claude 3.5 Sonnet in SQL Query Generation
Here’s the request I sent to the model.
What non-tech stocks increased their revenue, free cash flow, and net income every quarter for the past 4 quarters and every year for the past 2 years? Sort by market cap descending
This was its response.
[Image: The response generated by Claude 3.5 Sonnet]
In the end, it gave a rather large list of stocks. I copy/pasted the query and results into ChatGPT, and this was its response.
[Image: Claude 3.5 Sonnet got a score of 0.7]
ChatGPT rated the query a 0.7! The query had some minor issues and doesn’t strictly conform to the requirements. It’s not bad, but can OpenAI do better?
Analyzing OpenAI o3-mini in SQL Query Generation
I performed the same test using OpenAI o3-mini.
[Image: The response generated by OpenAI o3-mini]
o3-mini found only one stock that conformed to this query. Let’s see if it’s accurate.
[Image: The grade that OpenAI gave the OpenAI query]
o3-mini got a perfect score! It exactly conformed to the query’s requirements!
From my anecdotal experience, I’m not surprised. This test confirmed what I already knew: this new generation of models is far superior.
Summarizing the results
To summarize these results:
- OpenAI o3-mini is a lot cheaper than Claude 3.5 Sonnet for both the inputs and the outputs
- Despite being cheaper, OpenAI o3-mini performs significantly better when it comes to generating complex JSON objects
- Additionally, OpenAI o3-mini performs much better when it comes to generating accurate SQL queries
If you think we’ve hit a wall with AI progress, you’re clearly not paying attention.
Concluding Thoughts
People think that we’ve hit a plateau when it comes to AI models. Objectively, this couldn’t be further from the truth.
The new generation of large language models is both cheaper and more powerful than models released just last year. Because of this, my AI-powered trading platform NexusTrade has never been better.
NexusTrade enables retail investors to create algorithmic trading strategies and perform advanced financial research. Thanks to the increase in capabilities of these models, the platform is now cheaper and more accurate, and can enable anybody to become a data-driven, successful investor.
Want to see the difference NexusTrade makes with your investing? Create a free account today! If you decide to become a premium member (which comes with nearly unlimited access to o3-mini), you can save up to $1,000 by clicking this link and claiming your discount.
Getting started is free. Join 18,000 others in making smarter, automated investing decisions.