
AI Context Windows: Why Your Chatbot Forgets Everything

7 min read

You told Claude your name. You explained your project. You shared three paragraphs of background. The conversation was flowing. Then, twenty messages later, it asked you to explain your project again. From scratch.

You didn’t do anything wrong. You hit the edge of something called an AI context window, and understanding how it works will change the way you use every AI tool you touch.

Most people never think about context windows. They just notice the symptoms: the AI forgetting things, repeating itself, or giving answers that ignore what you said ten minutes ago. Once you understand what’s actually happening, you can work with it instead of fighting it.

What an AI context window actually is

Think of it as a desk. Not a filing cabinet. Not a library. A desk.

Everything the AI needs to work with has to fit on that desk. Your messages. Its replies. Any instructions running in the background. The document you pasted in. All of it, sitting there at once.

When the desk fills up, something has to go. The AI doesn’t get a bigger desk. It starts pushing older papers off the edge to make room for new ones. Sometimes it summarises what was there before. Sometimes it just loses it.

That’s the context window. It’s the total amount of text the AI can hold in its working memory at any one time.

The technical unit is ‘tokens’ rather than words. One token is roughly three quarters of a word, so 200,000 tokens is about 150,000 words. But you don’t need to think in tokens to use AI well. You just need to know: the space is finite, and everything you send takes up room.
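If you do want a ballpark figure, the three-quarters rule is easy to sketch in code. This is only a rough heuristic, not a real tokenizer; actual token counts vary by model and by the text itself:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~0.75 words-per-token rule of thumb.

    Real tokenizers differ by model; this is only for ballpark budgeting.
    """
    words = len(text.split())
    return round(words / 0.75)  # 1 token ≈ three quarters of a word

# 150,000 words of text lands right at the 200,000-token mark
print(estimate_tokens("word " * 150_000))  # → 200000
```

Good enough for deciding whether a paste will fit; for exact counts you would need the model provider’s own tokenizer.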

How big are context windows right now?

The numbers have grown fast. When ChatGPT launched in late 2022, it could handle about 3,000 words at a time. Today, the major models offer dramatically more.

Claude’s latest models offer 200,000 tokens as standard, with up to a million available for some users. GPT-4.1 supports a million tokens. Google’s Gemini models can process up to two million.

Those are big numbers. Two million tokens is roughly 1.5 million words, or about 3,000 pages. You could feed it an entire book series and still have room for questions.
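The page maths above follows from the same rule of thumb, assuming a typical printed page holds roughly 500 words:

```python
tokens = 2_000_000
words = tokens * 0.75   # ≈ 1.5 million words
pages = words / 500     # assuming ~500 words per printed page
print(f"{words:,.0f} words ≈ {pages:,.0f} pages")
```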

But here’s what nobody talks about enough.

Bigger is not always better

The race for bigger context windows makes for impressive headlines. It also misses the point for most people.

If you’re using AI to help draft emails, brainstorm ideas, or work through a problem, you are nowhere near the limit. A typical back-and-forth conversation uses a fraction of even a modest context window. You could chat with Claude all afternoon and barely scratch 200,000 tokens.

The real issue is not size. It’s what fills the space.

Research from Anthropic’s engineering team confirms something practical users notice quickly: models perform best when the context is focused and relevant. Dumping everything into the window because you can is like covering your desk with every document you own and expecting to find the one you need. More information does not automatically mean better answers.

I learned this the hard way. I had a Claude Code memory file that was eating 20% of every session before I even typed a word. The AI wasn’t getting dumber. I was wasting its attention.

There’s also a gap between the advertised context window and the effective one. As IBM’s explainer on context windows notes, models can struggle to recall information buried in the middle of a long context. The beginning and the end get the most attention. The middle fades. Researchers call this the ‘lost in the middle’ problem, and while newer models handle it better, it hasn’t disappeared entirely.

So a model might advertise a million tokens. But the quality of its recall at token 500,000 is not the same as at token 5,000.

What actually eats your context window

Most people assume their messages are the main thing filling the window. They’re often not.

Here’s what takes up space, roughly in order:

System instructions. Every AI tool runs background instructions that tell it how to behave. You never see these, but they’re sitting on the desk before you even say hello.

Your custom instructions. If you’ve set up custom instructions in ChatGPT or a system prompt in Claude, those load into every single conversation. Useful, but they cost space.

Conversation history. Every message you send and every reply the AI gives stays in the window. A long conversation accumulates fast.

Pasted content. That 10-page document you dropped in? It’s all on the desk now. Along with everything else.

The AI doesn’t pick and choose. It holds everything until the window fills up. Then it starts making compromises.
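To make the accounting concrete, here is a minimal sketch of how those pieces add up before you type anything new. The numbers are purely illustrative, not taken from any real tool:

```python
WINDOW = 200_000  # advertised context window, in tokens

# Illustrative token costs; real values vary by tool and settings
usage = {
    "system_instructions": 2_500,
    "custom_instructions": 800,
    "conversation_history": 45_000,
    "pasted_document": 12_000,
}

used = sum(usage.values())
print(f"Used: {used:,} tokens ({used / WINDOW:.0%} of the window)")
print(f"Remaining: {WINDOW - used:,} tokens")
```

Even in this modest scenario, nearly a third of the desk is occupied before your next message arrives.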

What happens when you run out

Different tools handle this differently, but the outcome is similar.

Some models summarise the earlier parts of the conversation to free up space. Claude does this automatically on paid plans. The summary preserves the broad strokes but loses specific details. If you gave precise instructions early on, those details might get compressed into something vaguer.

Other models simply drop older content. The most recent messages stay. The earliest ones vanish.

Either way, the AI is now working with less than what you gave it. It’s not being lazy or ignoring you. It physically cannot see what fell off the desk.

This is why long conversations gradually lose coherence. The AI isn’t getting worse. It’s getting more forgetful, because the context window is doing its best to keep up.
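The ‘drop the oldest’ strategy can be sketched in a few lines. This is a simplified model: real products also summarise, pin the system prompt in place, and count tokens precisely, and the character-length `cost` here just stands in for a real token counter:

```python
def trim_to_fit(messages: list[str], budget: int, cost=len) -> list[str]:
    """Drop the oldest messages until the rest fit within the budget.

    `cost` stands in for a real token counter; here it is character length.
    """
    kept = list(messages)
    while kept and sum(cost(m) for m in kept) > budget:
        kept.pop(0)  # the earliest message falls off the desk first
    return kept

history = ["intro" * 10, "details" * 10, "latest question"]
print(trim_to_fit(history, budget=100))
```

Notice what survives: the most recent messages. Whatever you said first is exactly what goes missing.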

How to use your context window well

You don’t need a million tokens. You need to be intentional about the ones you use. This is the heart of what’s now called context engineering: the skill of controlling what the AI sees and when.

Here are the practical habits that make the biggest difference.

Start fresh for new topics. Don’t continue a conversation about your CV when you want to pivot to meal planning. Open a new chat. Give the AI a clean desk.

Front-load what matters. Put the most important information at the beginning of your message, not buried in a long preamble. The AI pays closest attention to what comes first and last.

Be specific, not exhaustive. Instead of pasting an entire report and asking ‘what do you think?’, paste the relevant section and ask a focused question. Less noise means better signal.

Restate key details in long conversations. If you’re twenty messages deep and the AI seems to have forgotten something important, say it again. Don’t assume it still has it. A quick reminder is cheaper than a confused answer.

Keep custom instructions lean. Those background instructions load every time. If yours are three paragraphs long, consider trimming them to the essentials. Every word counts.

The real skill is not having more space

The AI industry is obsessed with making context windows bigger. That’s useful for developers building complex systems and for specific tasks like analysing entire codebases. Vibe coding relies on the same principle: the AI needs to hold your entire project in view as it turns your plain-language descriptions into working code, and once it can’t, things start breaking. If you use Claude Code, git worktrees let you give each AI session its own isolated context and workspace, so parallel agents never pollute each other’s windows.

The same constraints shape consumer-facing features like AI Overviews in Google Search, where the AI has to summarise multiple web pages within its context limits before presenting you with a single answer.

But for everyday use, the skill that matters is not having the biggest window. It’s knowing how to use the one you have.

The best conversations I have with AI are not the longest ones. They’re the ones where I give it exactly what it needs, nothing it doesn’t, and keep the context focused on the task at hand.

Stop treating the context window like infinite memory. Start treating it like a clean, well-organised desk. Your AI will work better for it.