What Is Context Engineering and How to Apply It in Real Systems
An AI code assistant generates a function to handle user uploads. The code looks correct, compiles, and passes all tests. You merge it. Two days later, a high-priority bug comes in. Files from premium plan users are being processed by the standard, slower queue. The generated code called the generic function `enqueue_job()` because it had no idea that a utility `priority_enqueue_job()` existed for specific user roles. The code was correct, but wrong for the system.
This is the ceiling that most teams hit with LLMs. The model’s reasoning ability is good, but it is almost completely unaware of your system’s operational reality. Fixing this requires a system-level approach. This is context engineering: the process of selecting and structuring the right information to give to the model, so the output is not just plausible, but correct within your specific environment.
The cost of disconnected models
When a model operates with incomplete context, it produces outputs that are plausible, but wrong. These errors are tricky because they pass static analysis and even basic unit tests. They are integration and logic bugs that reveal a gap between the model’s world and the reality of the codebase.
I have seen this happen in a few ways:
- A model suggests adding `axios` to a Node.js service, not knowing there is already an internal, hardened HTTP client with tracing and built-in error handling that is required for all network calls.
- An LLM refactors a Python method for efficiency, changing the type of a rarely used return value from `list` to `generator`. The local module’s unit tests pass, but a downstream service that consumes this output now fails at runtime because it expects to be able to call `len()` on the result.
- A model writes a database query that works perfectly in isolation, but omits a required `WHERE tenant_id = ?` clause because it is not aware of the system’s multi-tenant architecture.
In each case, a developer needs to manually step in, figure out what the model missed, and rerun the task with the missing information. This manual process of re-engineering context for each request is a productivity tax that does not scale. It is what turns a “10x” tool into a 1.1x tool with a high maintenance cost.

What is context engineering
Prompt engineering focuses on refining the instructions given to a model, on properly formulating the command. Context engineering is about providing the operational information required for the model to execute that command correctly. A prompt engineer works on the `System:` message. A context engineer builds the data pipelines that populate the `User:` message with everything the model needs to know.
This goes beyond simply throwing data into the prompt. It is about identifying the critical pieces of information that define the operational boundaries of a task. This information should be treated as a first-class architectural concern, not a last-minute detail.
Main operational context boundaries include:
- Relevant data points: Instead of the entire database schema, provide the schemas of the tables related to the user’s request. Instead of all API endpoints, deliver the OpenAPI specs of the services involved.
- User interaction history: What did the user just do? What error did they just see? What is their role and what permissions does that grant?
- System state variables: Current feature flags, API rate limits of a dependent service, or the load on a database replica. This information is volatile and exists outside the codebase.
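As a minimal sketch of how these boundaries might come together, here is one way to assemble them into a single prompt section. The `RequestContext` class and its field names are illustrative assumptions, not a standard:

```python
from dataclasses import dataclass

# Illustrative context payload; field names are assumptions, not a standard.
@dataclass
class RequestContext:
    relevant_schemas: dict   # table name -> column list, scoped to the request
    user: dict               # role, permissions, recent activity
    system_state: dict       # feature flags, rate limits, replica load

def render_context(ctx: RequestContext) -> str:
    """Serialize the assembled context into a prompt section the model reads."""
    return "\n".join([
        "## Operational context",
        f"User: {ctx.user}",
        f"Relevant schemas: {ctx.relevant_schemas}",
        f"System state: {ctx.system_state}",
    ])

ctx = RequestContext(
    relevant_schemas={"uploads": ["id", "user_id", "tenant_id", "status"]},
    user={"role": "premium", "permissions": ["upload"]},
    system_state={"flags": {"priority_queue": True}},
)
```

The point of the structure is scoping: only the tables, permissions, and flags relevant to this request are serialized, not the whole system.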
Treating context as a detail is why so many AI integrations feel fragile. Treating it as part of the system architecture is how you make those integrations reliable.
How to manage context
A systematic approach to context means identifying what matters, designing how to deliver it, and keeping the information up to date.
First, you need to map the categories of information a model needs to execute a task correctly:
- User context: ID, permissions, preferences, and recent activity. Is the user an admin or a standard user? Are they on a free or enterprise plan? This often defines which business rules apply.
- Domain context: your business logic and constraints, like “Orders above $10,000 require a manual approval step”.
- Operational context: the real-time state of your system, like API rate limits or active feature flags. This is the most dynamic type of context.
- Interaction context: the state of the current session, like previous questions in a chat or the error message from the last failed test.
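One way to make this mapping concrete is a gatherer per category, so each can be sourced, cached, and refreshed independently. All function names and return values here are hypothetical stand-ins:

```python
# Hypothetical gatherers; in a real system each would hit a different source.
def gather_user_context(user_id):
    return {"id": user_id, "role": "admin", "plan": "enterprise"}

def gather_domain_context(task):
    return {"rules": ["orders above $10,000 require manual approval"]}

def gather_operational_context():
    return {"rate_limit_remaining": 120, "flags": {"new_checkout": False}}

def gather_interaction_context(session):
    return {"last_error": session.get("last_error")}

def build_context(user_id, task, session):
    """Merge the four categories into one payload for the model call."""
    return {
        "user": gather_user_context(user_id),
        "domain": gather_domain_context(task),
        "operational": gather_operational_context(),
        "interaction": gather_interaction_context(session),
    }

ctx = build_context(42, "refund order", {"last_error": "AssertionError in test_totals"})
```

Keeping the categories separate also makes it easy to log which ones were present when an output later turns out to be wrong.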
Once you know what you need, you have to get it to the model. There are a few patterns, each with different trade-offs in performance and complexity:
- Direct parameter passing: include short-lived context like a `session_id` directly in the API call. This is the simplest method.
- Context store: a store like Redis holds more persistent information such as user profiles or permission sets, which your application retrieves before calling the model.
- Retrieval-Augmented Generation (RAG): for large volumes of text like documentation or the codebase, retrieval finds the most relevant chunks to include.
- Inference from system events: some context can be derived automatically, like injecting information about a high-latency microservice.
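The context-store pattern might look like the sketch below. A plain dict stands in for Redis so the example is self-contained, and the key format and `call_model` placeholder are assumptions:

```python
import json

# In-memory stand-in for a Redis context store; keys are illustrative.
context_store = {
    "user:42:profile": json.dumps({"role": "premium", "plan": "enterprise"}),
}

def fetch_user_context(user_id: int) -> dict:
    """Retrieve the stored profile before the model call; empty dict if missing."""
    raw = context_store.get(f"user:{user_id}:profile")
    return json.loads(raw) if raw else {}

def call_model(prompt: str, context: dict) -> str:
    # Placeholder for the real LLM call; context is prepended to the prompt.
    return f"[context: {json.dumps(context)}] {prompt}"

response = call_model("Generate the upload handler", fetch_user_context(42))
```

With a real Redis client the lookup is one `GET` before the model call, which is the appeal of this pattern: persistent context at a fraction of the latency of retrieval.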
The main trade-off is between completeness and performance. A full RAG query can add 500ms of latency to a request. A larger context increases token cost and can introduce noise that worsens model performance. The goal is to provide the minimum sufficient context, not the maximum possible.
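One way to enforce “minimum sufficient” is a hard token budget: rank candidate context chunks by relevance and stop adding once the budget is spent. This is a sketch; the scores and the whitespace token counter are stand-ins for a real retriever and tokenizer:

```python
def trim_to_budget(chunks, budget_tokens, count_tokens=lambda s: len(s.split())):
    """Keep the highest-scored chunks that fit within the token budget."""
    selected, used = [], 0
    for score, text in sorted(chunks, reverse=True):  # highest relevance first
        cost = count_tokens(text)
        if used + cost <= budget_tokens:
            selected.append(text)
            used += cost
    return selected

chunks = [
    (0.9, "uploads table schema: id user_id tenant_id status"),
    (0.7, "priority_enqueue_job handles premium user roles"),
    (0.2, "full changelog of the billing service since 2019 and counting"),
]
picked = trim_to_budget(chunks, budget_tokens=15)
```

The low-relevance changelog chunk is dropped here not because it is wrong, but because including it would spend budget on noise.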
Outdated context is as bad as no context. A model that thinks a deprecated function is still in use will generate code that is already broken. Just like your API schemas, the structure of the context you provide also changes, so you should version your context objects. You need to monitor how stale the information is and define expiration limits. This should not be manual. Connect this to your CI/CD pipelines. When documentation is updated, a post-commit hook should trigger reindexing. When a new microservice is deployed, its OpenAPI spec should be automatically published to your context store.
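Versioning and staleness checks can be as simple as storing a schema version and a fetch timestamp with every context object and refusing entries past their TTL. The field names and the one-hour limit are illustrative:

```python
import time

CONTEXT_SCHEMA_VERSION = 2   # bump when the context object structure changes
MAX_AGE_SECONDS = 3600       # treat context older than one hour as stale

def is_usable(entry, now=None):
    """Reject context from an old schema version or past its expiration limit."""
    now = time.time() if now is None else now
    return (
        entry.get("version") == CONTEXT_SCHEMA_VERSION
        and now - entry.get("fetched_at", 0) <= MAX_AGE_SECONDS
    )

fresh = {"version": 2, "fetched_at": time.time(), "data": {"flags": {"priority_queue": True}}}
stale = {"version": 1, "fetched_at": 0.0, "data": {}}
```

A rejected entry should trigger a refetch from the source of truth, which is exactly the step the CI/CD hooks described above keep cheap.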
Putting it into practice
Building context-aware systems means designing services where context assembly is a primary responsibility. Instead of each feature calling the LLM directly, you can have a central service that gathers information from different sources and provides it to other services.
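Such a central service might let features register context providers once and then request an assembled payload. All names here are hypothetical:

```python
class ContextAssembler:
    """Central service: features register providers once, then request context."""

    def __init__(self):
        self._providers = {}

    def register(self, name, provider):
        """Provider is any callable taking the request and returning context."""
        self._providers[name] = provider

    def assemble(self, request):
        """Run every registered provider and merge the results by name."""
        return {name: provider(request) for name, provider in self._providers.items()}

assembler = ContextAssembler()
assembler.register("user", lambda req: {"id": req["user_id"], "role": "premium"})
assembler.register("flags", lambda req: {"priority_queue": True})
context = assembler.assemble({"user_id": 42})
```

The design choice is centralization: when a new context source appears, it is registered in one place instead of being wired into every feature that calls the model.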
This also means designing for failure. What happens when a context source is not available? The system should degrade in a controlled way. A code generation tool that cannot access the full codebase might refuse to perform a complex refactor and offer a simpler, safer alternative. It should signal that the response is based on incomplete information.
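Controlled degradation can mean wrapping each provider in an error guard and flagging the result as partial, so downstream code can choose the safer alternative. A sketch; the `_partial` flag name is an assumption:

```python
def assemble_with_fallback(providers, request):
    """Gather context from each source; mark the result partial if any fails."""
    context, partial = {}, False
    for name, provider in providers.items():
        try:
            context[name] = provider(request)
        except Exception:          # broad catch is acceptable for a sketch
            context[name] = None   # signal the gap instead of crashing
            partial = True
    context["_partial"] = partial
    return context

def broken_codebase_source(request):
    raise TimeoutError("codebase index unavailable")

providers = {
    "codebase": broken_codebase_source,
    "user": lambda request: {"role": "premium"},
}
ctx = assemble_with_fallback(providers, {})
```

A caller seeing `_partial` set can then refuse the complex refactor and offer the simpler alternative, rather than silently proceeding on incomplete information.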
You need to measure the impact. Track how often the model’s output is accepted without changes versus when it requires manual edits. Instrument your systems to log when the lack of a specific type of context leads to an error. This data will show where context matters most and where it is worth investing in improving your retrieval pipelines.
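Instrumentation can start as a simple tally of accepted versus edited outputs keyed by which context types were present. The metric shape here is a placeholder, not a standard:

```python
from collections import Counter

acceptance_log = Counter()

def record_outcome(context_types_present, accepted):
    """Tally accept/edit outcomes per combination of context provided."""
    key = (frozenset(context_types_present), "accepted" if accepted else "edited")
    acceptance_log[key] += 1

def acceptance_rate(context_types):
    """Fraction of outputs accepted unchanged for this context combination."""
    key = frozenset(context_types)
    accepted = acceptance_log[(key, "accepted")]
    edited = acceptance_log[(key, "edited")]
    total = accepted + edited
    return accepted / total if total else 0.0

record_outcome({"user", "operational"}, accepted=True)
record_outcome({"user"}, accepted=False)
```

Comparing rates across combinations is what tells you which retrieval pipeline is worth the next round of investment.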
Teams that treat AI as a prompt engineering problem quickly get stuck in a loop of tweaking instructions and manually fixing plausible but wrong outputs. They hit a reliability ceiling.
Teams that treat AI as a systems integration problem, a context problem, will build the infrastructure to provide models with the information they need to be actually useful. They will unlock a higher level of performance and reliability.