My "Scary Good" Experience with Claude Opus 4.6

Chinar Jadhav • Feb 2026

In case you missed my LinkedIn post: about a week ago, I finally got the chance to tinker with Anthropic's latest model, Claude Opus 4.6. I spent a day heavily testing it, and not to be too dramatic, but it is scary.

And by "scary," I mean it fundamentally changes how we interact with machines.

But before we dive into Opus 4.6, let's look at how we got here. The sheer pace of AI advancement is blinding. Just two or three years ago (which feels like a lifetime in tech), we were fascinated by ChatGPT. Then, around 2022 and 2023, came an era of incredible AI integrations to boost productivity. We got NotebookLM, Gemini, and Gemini Gems. I experimented a lot with Midjourney during that generative AI boom, eventually buying a subscription because the cost-to-output ratio was unbeatable. Then came Gemini 3 Pro, and while we were still trying to map out its full capabilities, Anthropic dropped Claude Opus 4.6.

And now, just weeks later, we have Sonnet 4.6. Things are moving so aggressively fast that keeping track of it all has become overwhelming. I even asked Claude: "Can you show me a timeline view of this AI progress?" It suggested a few links but admitted, "There is nothing exactly like what you have described." So I thought, why not just build it? That is how my new tool, the AI Timeline, was born.

You can check out the simple, auto-updating timeline for AI innovation and integration here:

https://beamup.in/ai-timeline/

The "Smart Intern" Era is Over

When I first used Opus 4.6, it felt vastly different from any model I've used to date. It felt structured. It felt like it was genuinely reasoning.

For the last couple of years, I have mentored multiple junior designers and tech professionals. When explaining how to use AI, I always relied on one core analogy: These chatbots are like highly capable interns. (No offense to interns!)

They do exactly what you tell them to do, based on their capacity. The golden rule was "Garbage In, Garbage Out." If you wanted good results, you had to be an expert at Prompt Engineering. To use an ice cream analogy: If you tell an intern, "I want chocolate ice cream," that is exactly what you get. If you want something specific, you have to prompt them precisely: "I want Belgian chocolate ice cream with cashew nut sprinkles from this specific outlet."

In my native language, there is a term for this: "सांगकाम्या". It basically means an obedient worker who does exactly the work assigned, using just enough brainpower to complete the task, but absolutely nothing more.

Claude Opus 4.6 is not an errand boy.

When I gave Claude my highly specific "Belgian chocolate ice cream" prompt, it didn't just blindly fetch it. It paused and asked: "Hey, would you like that in a cup or a cone? Do you want it delivered to your home, or do you want to pick it up from a specific spot?" It doesn't just execute; it actively involves you in the thought process.

Building an App: From Prompt to Architecture

I decided to push it further. I gave Claude a massive, highly detailed prompt to build a native iOS/Android app. I uploaded my UI screenshot designs, explained the user flow, and detailed the functionalities for three main sections.

Older models and tools, whether ChatGPT, Gemini, or coding tools like Lovable.ai and Firebase, would just start spitting out code immediately. Opus 4.6 stopped and reasoned first. It showed its step-by-step thinking in the chat window. It asked: "How are you planning to store this data? Do you want me to create the data architecture first, or should I go straight to screen-by-screen UI?"

It took a highly structured, methodical approach. It created data architecture documents and content pipelines, asking me to review them before writing a single line of app code. When I suggested a change to the documents, it created new versions and stored them. Honestly, I didn't even ask for that documentation! But the fact that it proactively planned the system architecture before building was incredibly impressive. Even when I asked how to do the setup myself (because there are things the browser-based Claude Opus 4.6 can't do), it created a downloadable document with a step-by-step process and a checklist for completing the task.

The Catch: Hitting the Token Wall

It's not all perfect, though. The model still makes mistakes that you can catch, and it still acts and feels like a chatbot or AI assistant (phew!). But the biggest issue I am currently facing with Claude Opus 4.6 is the usage restrictions.

I learned the hard way that prompt length dramatically affects how quickly you use up your token limits and usage time. Because it "thinks" so deeply and generates such comprehensive structures, you burn through your quota fast. You will be having a great, productive session, and after just 30 to 60 minutes of deep work, you hit a wall: "Wait 5 hours." When I compare the value I get for the money I spend on Claude with what I get for the money I spend on Gemini, these aggressive timeouts are by far the most frustrating part of the experience.

This also makes one wonder: 60 minutes of super-focused output versus an entire day of low-focus output, which is better? I would say it depends on the user's work style. Some people might prefer doing intense work on weekends and using up the entire week's usage limit, while others might want to do 30 minutes every day, all 7 days. In the end, it comes down to how much money you are willing to spend!

The Existential Question: Are We Approaching Singularity?

So, why did I call it "scary good" at the beginning of this post? It’s not just about what Opus 4.6 can do today. Yes, it still makes mistakes. It is still an AI you need to verify. But the leap in logic and how it approaches problems or assists you is staggering.

The scary part is extrapolating this out. If it is this good now, what will it look like in the next two years? Are we actually approaching the Singularity? If an AI can think on its own, reason through complex architecture, and actively challenge its prompts, why would it keep listening to us? Even if we hardcode the Laws of Robotics into it, what happens when the AI writes the next generation of models itself? How long until it breaks out of those constraints? I'm not an expert in AI safety, and I don't want to go all Skynet conspiracy theorist here, but these are valid questions that you can't help but ask when you experience this level of reasoning firsthand.

The Verdict

Despite the existential dread and the frustrating token limits, Opus 4.6 is phenomenal. If you have the resources and the budget for tokens, the capabilities are unmatched. If I didn't have to deal with the quota limits on the Pro plan, I could have built the entire AI Timeline website in 4 or 5 hours. If you have the "Pro Max" setup and can follow Claude's instructions perfectly, you could probably code, build, and deploy a full app to the Google Play Store in a day or two. I did try Claude's Max plan as well, but we'll take a look at that in the next part.

We have officially moved past the era of the "Smart Intern." The AI is now a proactive partner. And that changes everything. What has your experience been with the newest wave of models? Let me know in the comments.
