    Reasoning Models (o1, o3, DeepSeek R1): When Slower Thinking Is Worth It

    by Deep Parmar

    CTO, Sunbots & Xwits

    Reasoning models are not smarter versions of standard language models. They are models that have been trained to think longer before answering — to explore multiple paths, catch their own errors, and revise before producing a final output. This matters because "thinking longer" costs real money and adds real latency. Understanding exactly when that cost is justified is what separates builders who use reasoning models well from those who use them on everything and wonder why their API bills tripled.

    What Makes Reasoning Models Different

    Standard language models answer directly: tokens flow out one after another until the response is complete, with no separate deliberation step. Reasoning models (o1, o3, DeepSeek R1, and similar) generate an extended "thinking" trace before producing their final answer. This thinking trace is where the model checks its work, considers alternatives, and resolves ambiguities. Hosted APIs often hide the full trace or show only a summary, while open-weight models such as DeepSeek R1 expose it. Either way, its quality determines the quality of the final output.

    The practical implication: reasoning models are significantly better on tasks that benefit from multi-step verification, but they offer little advantage on tasks where the answer is straightforward or where the model already has sufficient training signal to produce correct outputs directly. Asking a reasoning model to rewrite a paragraph is like hiring an accountant to make change for a coffee — technically capable, wildly overpowered for the task.

    A Decision Framework: When Slower Thinking Pays Off

    After running reasoning models and standard models in parallel on dozens of production tasks, here is the pattern I have found:

    • Use reasoning models for: complex multi-step code generation, mathematical or logical problem solving, legal and compliance document analysis, tasks where errors have high downstream cost, and any problem where you want the model to catch its own mistakes.
    • Use standard models for: summarisation, classification, simple Q&A, content generation, customer support responses, extraction from structured data, and any task where speed and cost matter more than edge-case accuracy.

    A useful heuristic: if a thoughtful human would spend more than five minutes thinking through the problem before answering, consider a reasoning model. If a thoughtful human would answer in under a minute, a standard model is almost certainly sufficient.
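
    The two lists and the five-minute heuristic above can be sketched as a small routing function. This is a minimal illustration, not a production router: the category labels, the `Task` fields, and the `choose_model_tier` name are all hypothetical, and sending unknown tasks in the one-to-five-minute range to the standard tier is an assumption the article does not make.

```python
from dataclasses import dataclass

# Category labels are hypothetical names for the task types listed above.
REASONING_TASKS = {
    "multi_step_codegen",
    "math_or_logic",
    "legal_compliance_analysis",
}
STANDARD_TASKS = {
    "summarisation",
    "classification",
    "simple_qa",
    "content_generation",
    "customer_support",
    "structured_extraction",
}


@dataclass
class Task:
    category: str
    high_error_cost: bool = False  # errors are expensive downstream
    human_minutes: float = 0.0     # rough estimate of careful human thinking time


def choose_model_tier(task: Task) -> str:
    """Return "reasoning" or "standard" for a task."""
    # A high downstream error cost overrides the category.
    if task.high_error_cost or task.category in REASONING_TASKS:
        return "reasoning"
    if task.category in STANDARD_TASKS:
        return "standard"
    # Unknown category: fall back to the five-minute heuristic.
    # Assumption: anything at or under five minutes goes to the standard tier.
    return "reasoning" if task.human_minutes > 5 else "standard"


if __name__ == "__main__":
    for task in (
        Task("summarisation"),
        Task("math_or_logic"),
        Task("summarisation", high_error_cost=True),
        Task("schema_migration_plan", human_minutes=20),
    ):
        print(task.category, "->", choose_model_tier(task))
```

    Returning a tier rather than a model name keeps the routing rule separate from whichever specific models you deploy behind each tier.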

    o1 vs o3 vs DeepSeek R1: What Actually Differs

    o3 is significantly more capable than o1 on hard reasoning tasks: the gap on competition-level mathematics and complex code is meaningful. But o3 is also significantly more expensive and slower. For most production tasks, o1 sits in a better cost-performance position.

    DeepSeek R1 is genuinely impressive and substantially cheaper to serve than either OpenAI model. Its reasoning quality is competitive on many benchmarks and is particularly strong on code and mathematics. The catch is that running DeepSeek R1 yourself, as with any open-weight model, means handling hosting, serving infrastructure, and model updates: "open" does not mean free in production.

    We run DeepSeek R1 for internal tools where cost is the primary constraint and we have the infrastructure to support it. For customer-facing features, we use the OpenAI reasoning models where SLA and reliability guarantees matter.

    The Real Cost Calculation

    Reasoning models cost more per token and generate more tokens, because the thinking trace counts toward usage. On a task that takes 500 tokens with a standard model, a reasoning model might use 3,000-8,000 tokens including its internal thinking. At current pricing, that works out to roughly a 5-15x cost increase per call. For a feature that handles one thousand calls per day, the multiplier applies to every call, so a small per-call difference becomes a large monthly one. Build your cost model before committing reasoning models to any high-volume pipeline.
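
    A cost model for this can be very small. The sketch below plugs in the token counts and call volume from the paragraph above. The price is a hypothetical placeholder, deliberately set the same for both model types so the effect of token volume alone is visible. Replace it with your provider's actual rates.

```python
def daily_cost(calls_per_day: int, tokens_per_call: int,
               usd_per_million_tokens: float) -> float:
    """Daily spend in USD for a given call volume, token count, and price."""
    return calls_per_day * tokens_per_call * usd_per_million_tokens / 1_000_000


def cost_multiplier(standard_tokens: int, reasoning_tokens: int,
                    standard_price: float, reasoning_price: float) -> float:
    """How many times more one reasoning call costs than one standard call."""
    return (reasoning_tokens * reasoning_price) / (standard_tokens * standard_price)


# Figures from the article.
CALLS_PER_DAY = 1_000
STANDARD_TOKENS = 500
REASONING_TOKENS = (3_000, 8_000)  # low and high estimates, thinking included

# Hypothetical price in USD per million tokens, NOT a real quote.
ASSUMED_PRICE = 10.0

standard_daily = daily_cost(CALLS_PER_DAY, STANDARD_TOKENS, ASSUMED_PRICE)
reasoning_daily = [daily_cost(CALLS_PER_DAY, t, ASSUMED_PRICE) for t in REASONING_TOKENS]
multipliers = [
    cost_multiplier(STANDARD_TOKENS, t, ASSUMED_PRICE, ASSUMED_PRICE)
    for t in REASONING_TOKENS
]

if __name__ == "__main__":
    print(f"standard:  ${standard_daily:.2f}/day")
    for cost, mult in zip(reasoning_daily, multipliers):
        print(f"reasoning: ${cost:.2f}/day ({mult:.0f}x)")
```

    With the same assumed price on both sides, token volume alone takes the daily bill from $5 to between $30 and $80, a 6x to 16x increase. Any per-token premium on the reasoning model multiplies on top of that.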
