A Practical Guide to Prompt & Context Engineering

The key elements for structuring, testing, and scaling your conversations with AI.

If you’ve been following the Master LLMs series, you’ve seen the journey so far, from building intuition in “What Even Is an LLM?”, to understanding the mechanics in “How Do LLMs Actually Work?”, to learning key principles in “Learn How To Steer Your AI Outputs”.

Now, this post takes a hands-on turn, focusing on the practical craft of communicating effectively with AI when it’s time to build.

Here’s something most people aren’t aware of when it comes to working with LLMs:

The models are incredibly smart, but also incredibly literal.

I’ve spent numerous hours debugging and tuning prompts that should’ve worked, learning the hard way that talking to an LLM isn’t like talking to a person.

It’s both simpler and more complex than you’d expect.

Why This Matters Right Now

In the previous post in this series, I outlined the key concepts of prompt and context engineering, explained the main techniques behind them, and highlighted why mastering these skills is essential. To briefly recapitulate:

Prompt engineering isn’t about getting a good-enough answer. It’s about getting the most accurate answer possible, consistently, in the right format, every time. That’s crucial for building effective AI-powered systems and agents.

Achieving that is harder than it sounds:

  • Change one wrong word and your JSON parser breaks.
  • Move one line and the model forgets half your rules.

➡ Tiny edits, massive consequences.

How To Structure Your Prompt Effectively

After breaking things enough times and reviewing countless prompt engineering blogs, papers, and guides, I landed on a structure that just works (for most cases).

Always organize your prompt like this:

  1. System prompt “The Constitution” (Role, Goal, Guardrails, Structure)
  2. Few-shot examples (if you need them)
  3. Conversation history (what’s been discussed)
  4. Retrieved documents (your RAG context)
  5. User query (the specific task)

Why does this order matter so much? Because LLMs are sensitive to recency. Whatever comes last gets the most attention. If you bury your actual instruction in the middle, don’t be surprised when the model ignores it.
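
If you assemble prompts programmatically, this ordering maps directly onto the message list you send to the model. Here’s a minimal sketch using the OpenAI Python SDK; the model name, placeholder strings, and document contents are illustrative assumptions, not values from the examples in this post:

# Minimal sketch: the five components, assembled in order (OpenAI Python SDK).
from openai import OpenAI

client = OpenAI()

system_prompt = "You are FinBot, an expert financial analyst assistant..."  # 1. the constitution
few_shot = [                                                                 # 2. optional examples
    {"role": "user", "content": "Example input"},
    {"role": "assistant", "content": "Example output"},
]
history = []  # 3. prior user/assistant turns would go here
retrieved_docs = "<retrieved_documents>...</retrieved_documents>"           # 4. RAG context
user_query = "What was the company's Q2 revenue?"                           # 5. the actual task

messages = (
    [{"role": "system", "content": system_prompt}]
    + few_shot
    + history
    + [{"role": "user", "content": f"{retrieved_docs}\n\n{user_query}"}]
)

response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)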

1. The System Prompt

This is where your Role/Persona, Goal, Guardrails, and Answer Structure have to be defined. Not in the user query. Right here, in the first position of the prompt.

This is the model’s constitution, the persistent rules that apply to every single interaction, no matter what specific question comes in.

There are several other sections you can include in the system prompt, depending on your requirements. For example, when building an agent with GPT-5, here are a few other sections you can include:

  • <context_gathering>: Explain the steps of the context gathering loop.
  • <persistence>: Used to encourage model autonomy.
  • <tool_preambles>: Dictate the steps to follow before and after tool usage.
  • <code_editing_rules>: Rules to follow when writing code.

Here’s a simple example of what that looks like in practice:

## Role
You are FinBot, an expert financial analyst assistant.
Professional, precise, never gives financial advice.

## Goal  
Analyze data, identify trends, summarize reports, answer factual 
questions about markets.

## Guardrails
1. DO NOT EVER give financial advice.
2. If asked for advice, offer factual alternatives instead.
3. Only use information from provided context.

## Answer Structure [Markdown Format]
- Use markdown formatting
- Provide a title
- Provide a 3-line executive summary  
- Create a list of the top 3 keypoints

## Answer Structure [JSON Format] (alternative: better performance)
Always respond in valid JSON matching this format:
{
  "title": "string",
  "summary": "string",
  "keypoints": ["string", "string", "string"]
}

➡ Notice how explicit it is. No ambiguity, and every word counts.
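
If you go with the JSON answer structure, it’s worth validating the model’s reply before handing it to downstream code. Here’s a small, hypothetical check in Python; the function name is made up, and the key names simply mirror the schema above:

import json

REQUIRED_KEYS = {"title", "summary", "keypoints"}

def parse_finbot_reply(raw: str) -> dict:
    # Parse the model's reply and fail loudly if it drifts from the schema.
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Model reply is missing keys: {missing}")
    if len(data["keypoints"]) != 3:
        raise ValueError("Expected exactly 3 keypoints")
    return data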

Note: Each model has its own preferred system prompt structure and way of defining sections. For example, the Claude Sonnet 4.5 prompting guide recommends XML-style tags (e.g., <role>…</role>), while GPT-5 works best with Markdown-style formatting, as shown in the example above. Check the recommended syntax for your chosen model to ensure optimal performance, since each supports slightly different tagging conventions.

Tailoring system prompts to your specific needs is definitely a challenging task, and there is no ready-to-use template for every use case, but you can find some interesting examples here and there. For instance, the repository below collects the system prompts used by many of the well-known coding agents.

GitHub – x1xhlol/system-prompts-and-models-of-ai-tools (github.com)
Full system prompts for Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus Agent…

2. Few-Shot Examples

When you want the model to follow a specific reasoning process or task logic, include a few examples before the actual query.

Example 1:
Input: Explain why the sentence “The sky cried all night” is figurative.
Output: It personifies the sky. Describing rain as “crying” gives a human 
emotion to nature.

Example 2:
Input: Explain why the phrase “Time is a thief” is figurative.
Output: It’s a metaphor. Time can’t literally steal, but it “takes away” 
moments of our lives, like a thief.

Two examples are usually enough to teach the schema; going beyond three mostly makes the prompt unnecessarily long.
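
When calling a chat model through an API, few-shot examples are typically passed as alternating user/assistant turns between the system prompt and the real query. A rough sketch, reusing the two examples above (the system_prompt variable and the final query are assumptions):

# Sketch: few-shot examples as alternating user/assistant messages.
few_shot_messages = [
    {"role": "user",
     "content": 'Explain why the sentence "The sky cried all night" is figurative.'},
    {"role": "assistant",
     "content": 'It personifies the sky. Describing rain as "crying" gives a human emotion to nature.'},
    {"role": "user",
     "content": 'Explain why the phrase "Time is a thief" is figurative.'},
    {"role": "assistant",
     "content": "It's a metaphor. Time can't literally steal, but it \"takes away\" moments of our lives, like a thief."},
]

messages = (
    [{"role": "system", "content": system_prompt}]  # the constitution from section 1 (assumed defined)
    + few_shot_messages
    + [{"role": "user", "content": 'Explain why "Her smile was pure sunshine" is figurative.'}]
)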

3. Retrieval Augmented Generation

Retrieval-Augmented Generation sounds like a complicated techy buzzword, but it’s just giving the model the right documents at the right time so it can answer the user’s request correctly.

I won’t go into the details of how to set up RAG for an LLM or an agent (that’s an advanced topic for a future post), but I will show you how to properly feed the documents to your model. So, here’s a summary of the RAG pattern:


Basic RAG architecture
  1. Chunk and embed your documents
  2. Retrieve the most relevant pieces for the query
  3. Feed them to the model with explicit grounding instructions, as follows:
<retrieved_documents>
  <document_1>
  1) "On 2024-08-01, the company shipped 1,000 units..."
  </document_1>
  <document_2>
  2) "Q2 revenue was $3.4M..."
  </document_2>
  <document_3>
  3) "Q3 revenue was $2.8M..."
  </document_3>
</retrieved_documents>

<instruction>
Using ONLY the documents above, answer: What was the company's Q2 revenue?
</instruction>

This goes in position 4, after your conversation history, before the final user query. The model sees your persistent rules (system prompt), then the specific context it needs (retrieved documents), then the immediate question.
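
Here’s an illustrative helper that wraps retrieved chunks in the tagged format shown above. The function name and the sample chunks are assumptions, and the retrieval step itself (chunking, embedding, similarity search) is left out:

# Sketch: wrap retrieved chunks in explicit tags plus a grounding instruction.
def build_rag_block(chunks: list[str], question: str) -> str:
    docs = "\n".join(
        f"  <document_{i}>\n  {chunk}\n  </document_{i}>"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "<retrieved_documents>\n"
        f"{docs}\n"
        "</retrieved_documents>\n\n"
        "<instruction>\n"
        f"Using ONLY the documents above, answer: {question}\n"
        "</instruction>"
    )

chunks = ['"Q2 revenue was $3.4M..."', '"Q3 revenue was $2.8M..."']
print(build_rag_block(chunks, "What was the company's Q2 revenue?"))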

Other Relevant Prompting Techniques

Chain-of-Thought:
For reasoning tasks — such as math, logic, or multi-step analysis — we can ask the model to show its thought process in the output.

This technique was widely used before the advent of reasoning-focused models like OpenAI o1 and DeepSeek-R1. With these specialized models, explicit chain-of-thought prompting usually isn’t necessary, but for standard LLMs it remains a pretty useful approach.

Here’s a simple example:

# Task
Solve: A store sells apples for $2 and oranges for $3.
If I buy 3 apples and 2 oranges, what's the total?
Show your step-by-step calculations, then the final answer.

Response:

Step 1: Determine the cost of the apples.
3 apples × $2 per apple = $6

Step 2: Determine the cost of the oranges.
2 oranges × $3 per orange = $6

Step 3: Add the costs together to get the total.
$6 (apples) + $6 (oranges) = $12

Final Answer: $12

Tree-of-Thought (One Prompt):

Instead of forcing the model down one line of reasoning, you explicitly ask the model to generate multiple solution branches, explore different approaches, and then select the best one. It’s like asking someone to sketch out three different ways to solve a problem before committing to one.

It is better than CoT at preventing the model from getting trapped in the first plausible answer it generates. It encourages broader thinking, and often finds better solutions than a straight-line approach would.

Example from Hulbert (2023):

Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.

The question is:
Bob is in the living room.
He walks to the kitchen, carrying a cup.
He puts a ball in the cup and carries the cup to the bedroom.
He turns the cup upside down, then walks to the garden.
He puts the cup down in the garden, then walks to the garage.
Where is the ball?

Response:

Expert 1: The ball is in the living room.
Expert 2: The ball is in the kitchen.
Expert 3: The ball is in the bedroom.

Expert 1: Bob carries the cup to the bedroom, so the ball must be in the cup.
Expert 2: Oh, I see my mistake. Yes, the ball is in the cup.
Expert 3: Agreed, the ball is in the cup in the bedroom.

Expert 1: Next, Bob turns the cup upside down in the bedroom.
Expert 2: After that, Bob walks to the garden and puts the cup down.
Expert 3: Therefore, the ball must have fallen out of the cup when Bob turned it upside down in the bedroom. So, the ball is in the bedroom, not in the cup anymore.

Expert 1: Oh, you're right. I made a mistake. The ball is in the bedroom, not in the cup.
Expert 2: Agreed, the ball is in the bedroom.
Expert 3: Bob then walks to the garage, so the ball remains in the bedroom. It is not in the garden or the garage.

Expert 1: Absolutely, the ball is still in the bedroom.
Expert 2: Yes, the ball hasn't moved from the bedroom.
Expert 3: Therefore, the ball is in the bedroom, not in the garden or the garage.

All three experts agree that the ball is in the bedroom.

Sculpting:

This is a newer (introduced in Oct 2025), highly constrained variant of Chain-of-Thought designed to maximize reasoning reliability by preventing the model from relying on flawed common sense.

It combines step-by-step reasoning with strict, explicit rules: the model is given a formal persona (e.g., “You are a pure mathematical reasoning engine”), forbidden from using any outside knowledge, and required to use only the numbers and relationships stated in the problem.

Like CoT, it must also show every intermediate step and end with a clearly formatted Final Answer.

While this novel technique reportedly outperforms existing methods, its superiority has yet to be fully validated by the wider AI community.

Example:

You are a pure mathematical reasoning engine. You must solve the following
problem.

**Rules:**
1. You must use ONLY the numbers and relationships given in the problem.
2. You must NOT use any outside common sense or real-world knowledge that
isn’t explicitly provided.
3. You must break down your calculation step-by-step. Show all
intermediate arithmetic.
4. After your reasoning, state your final answer clearly prefixed 
with "Final Answer:".

**Problem:**
[Question Text]

What To Keep In Mind About Context Windows

Context windows are finite, so it comes down to precision and conciseness. Every part of the prompt should say exactly what it needs to, using as few tokens as possible. The clearer and more compact your instructions are, the more room you preserve for examples, retrieved context, output structure, and the user’s actual query.

In practice, effective prompt and context engineering is really about expressing everything with maximum clarity and minimal waste.

Here’s how I handle it, in order:

  1. Trim greetings or filler messages.
  2. Summarize examples into concise bullet points.
  3. Retrieve only the top-value chunks.
  4. Chain prompts for long tasks instead of cramming.

For long discussions, I usually keep the last few messages verbatim and summarize everything before that.
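
Here’s a rough sketch of that last point: keep the most recent turns verbatim and collapse everything older into a summary. The summarize callable is a placeholder for whatever condensation step you use (often a cheap LLM call):

# Sketch: keep the last N messages verbatim, summarize everything older.
KEEP_VERBATIM = 6  # tune to your context budget

def compact_history(history: list[dict], summarize) -> list[dict]:
    # history: the full chat log; summarize: any callable that condenses old turns.
    if len(history) <= KEEP_VERBATIM:
        return history
    older, recent = history[:-KEEP_VERBATIM], history[-KEEP_VERBATIM:]
    summary = summarize(older)  # e.g. a cheap LLM call that returns a short recap
    return [{"role": "system",
             "content": f"Summary of the earlier conversation: {summary}"}] + recent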

Final Thoughts

The techniques we’ve covered aren’t difficult to implement. They’re simple tools for eliminating ambiguity and shaping how models think.
System prompts that define Role and Guardrails. Few-shot examples that teach format. RAG that grounds outputs in facts. Reasoning strategies like Chain-of-Thought, Tree-of-Thought, and Sculpting that force deliberation over reflex.

Master these fundamentals to make LLMs less unpredictable.

And as models grow more capable and agents become more autonomous, they will require shorter prompts to complete tasks. That’s why developing the habit of writing short, clear prompts is essential.
➡ Clear thinking leads to clear instructions, and clear instructions lead to reliable systems.

In future posts, we’ll explore agent loops, advanced RAG architectures, and context strategies that scale. This post is part of the ongoing “Master LLMs: A Practical Guide from Fundamentals to Mastery” series, where we break down complex AI concepts into clear, practical lessons. If you’re interested, you can save the list or follow me to stay up to date with each new post I publish.
