Prompt Engineering for Developers: Getting Reliable Output from LLMs
Write prompts that produce consistent, high-quality output from large language models in production systems. Covers prompt structure, few-shot learning, chain-of-thought, output formatting, guardrails, evaluation, and the patterns that turn unpredictable AI into reliable software components.
Prompt engineering is the art of giving an LLM enough context, structure, and constraints that its output is useful, consistent, and safe. In a product context, this means the difference between a chatbot that sometimes hallucinates your competitor’s pricing and one that reliably answers customer questions from your knowledge base.
This guide covers the patterns that work in production — not the creative writing tricks, but the engineering practices that make LLM output predictable enough to build software on top of.
Prompt Structure
A production prompt has four parts:
┌──────────────────────────────────────────────┐
│ 1. SYSTEM INSTRUCTION (Role + Context) │
│ "You are a customer support agent for │
│ Acme Corp. You answer questions about │
│ our products using only the provided │
│ knowledge base." │
│ │
│ 2. CONTEXT (Knowledge, Data) │
│ "Here is the relevant documentation: │
│ [retrieved documents]" │
│ │
│ 3. EXAMPLES (Few-shot demonstrations) │
│ "Q: What is your return policy? │
│ A: Our return policy allows..." │
│ │
│ 4. USER INPUT + OUTPUT FORMAT │
│ "Answer the following question. │
│ Respond in JSON with 'answer' and │
│ 'confidence' fields." │
│ │
└──────────────────────────────────────────────┘
Example: Production Prompt
SYSTEM_PROMPT = """You are a customer support assistant for Acme Corp.
Rules:
1. Answer ONLY using the provided context documents
2. If the answer is not in the context, say "I don't have information about that"
3. Never make up product features, prices, or policies
4. Keep answers under 3 sentences unless the user asks for detail
5. Always cite which document your answer comes from
Tone: professional, helpful, concise
"""
USER_PROMPT_TEMPLATE = """Context documents:
{retrieved_documents}
Customer question: {question}
Respond in this JSON format:
{{
"answer": "your answer here",
"source_document": "document title or null if not found",
"confidence": "high|medium|low"
}}
"""
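These templates plug into a standard chat-completions message list. A minimal sketch of the assembly step, with `build_messages` as an illustrative helper name and `SYSTEM_PROMPT` abbreviated for brevity:

```python
# Assemble the system prompt and user template above into an
# OpenAI-style messages list. SYSTEM_PROMPT is abbreviated here;
# use the full string defined above.
SYSTEM_PROMPT = "You are a customer support assistant for Acme Corp. ..."

USER_PROMPT_TEMPLATE = """Context documents:
{retrieved_documents}

Customer question: {question}

Respond in this JSON format:
{{
  "answer": "your answer here",
  "source_document": "document title or null if not found",
  "confidence": "high|medium|low"
}}
"""

def build_messages(retrieved_documents: str, question: str) -> list[dict]:
    """Fill the user template and pair it with the system instruction."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT_TEMPLATE.format(
            retrieved_documents=retrieved_documents, question=question)},
    ]

messages = build_messages("Doc 1: Returns accepted within 30 days.",
                          "What is your return policy?")
```

Note the doubled braces (`{{`, `}}`) in the template: `str.format` would otherwise treat the JSON braces as placeholders.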
Key Techniques
Few-Shot Learning
Provide 2-5 examples of the desired input-output behavior.
FEW_SHOT_EXAMPLES = """
Example 1:
Input: "What are the side effects of aspirin?"
Output: {
"classification": "medical_question",
"action": "refuse",
"response": "I'm not qualified to provide medical advice. Please consult a doctor."
}
Example 2:
Input: "How do I reset my password?"
Output: {
"classification": "account_support",
"action": "answer",
"response": "To reset your password, go to Settings > Security > Reset Password."
}
Example 3:
Input: "Your product is terrible and I want a refund"
Output: {
"classification": "complaint",
"action": "escalate",
"response": "I'm sorry to hear about your experience. Let me connect you with our support team."
}
"""
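One way to wire these examples into a request is to prepend them to each new input, so the model imitates their input/output format. A sketch, with `build_classification_prompt` as an illustrative helper name and the examples string abbreviated:

```python
# Prepend few-shot examples to each request. FEW_SHOT_EXAMPLES is
# abbreviated here; in practice use the full string defined above.
FEW_SHOT_EXAMPLES = """
Example 1:
Input: "How do I reset my password?"
Output: {"classification": "account_support", "action": "answer",
         "response": "Go to Settings > Security > Reset Password."}
"""

def build_classification_prompt(user_input: str) -> str:
    """Combine the instruction, the examples, and the new input."""
    return (
        "Classify the customer input. Respond with JSON in exactly the "
        "format shown in the examples.\n"
        + FEW_SHOT_EXAMPLES
        + f'\nInput: "{user_input}"\nOutput:'
    )

prompt = build_classification_prompt("Where is my order?")
```

Ending the prompt at `Output:` nudges the model to complete the pattern rather than add preamble.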
Chain-of-Thought
Force the model to reason through its answer step by step.
ANALYSIS_PROMPT = """Analyze the following customer feedback and determine:
1. Sentiment (positive, negative, neutral)
2. Topic (product, pricing, support, other)
3. Urgency (high, medium, low)
4. Suggested action
Think through this step by step:
- First, identify the main emotion expressed
- Then, determine what the feedback is about
- Then, assess how urgent the response needs to be
- Finally, recommend the best action
Feedback: "{feedback}"
Provide your reasoning, then your final answer in JSON.
"""
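Because this prompt asks for free-form reasoning followed by JSON, the response needs a tolerant parser. A sketch of one approach, using the standard library's `json.JSONDecoder.raw_decode` (which permits trailing text) to pull out the last complete JSON object; `extract_final_json` is an illustrative name:

```python
import json

def extract_final_json(model_output: str) -> dict:
    """Return the last complete JSON object in a reasoning-then-JSON reply."""
    decoder = json.JSONDecoder()
    result, idx = None, 0
    while True:
        start = model_output.find("{", idx)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value and reports where it ended,
            # so stray text before or after the object is tolerated.
            obj, end = decoder.raw_decode(model_output, start)
            result, idx = obj, end
        except json.JSONDecodeError:
            idx = start + 1  # not valid JSON here; keep scanning
    if result is None:
        raise ValueError("no JSON object found in model output")
    return result
```

Keeping the *last* parsed object matters: the reasoning section may itself contain brace-like fragments before the final answer.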
Output Formatting and Parsing
| Strategy | Reliability | Complexity |
|---|---|---|
| JSON mode (model-native) | High | Low |
| Structured output (function calling) | Very high | Medium |
| XML tags in prompt | Medium | Low |
| Regex extraction | Low | High (brittle) |
# Structured output with function calling (most reliable)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}],
tools=[{
"type": "function",
"function": {
"name": "classify_ticket",
"description": "Classify a support ticket",
"parameters": {
"type": "object",
"properties": {
"category": {
"type": "string",
"enum": ["billing", "technical", "account", "other"]
},
"priority": {
"type": "string",
"enum": ["P1", "P2", "P3", "P4"]
},
"summary": {
"type": "string",
"maxLength": 100
}
},
"required": ["category", "priority", "summary"]
}
}
}],
tool_choice={"type": "function", "function": {"name": "classify_ticket"}}
)
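The model returns its arguments as a JSON string on the tool call (`response.choices[0].message.tool_calls[0].function.arguments`), which should still be validated before use. A sketch of that parsing step, with `parse_ticket_classification` as an illustrative helper name:

```python
import json

VALID_CATEGORIES = {"billing", "technical", "account", "other"}
VALID_PRIORITIES = {"P1", "P2", "P3", "P4"}

def parse_ticket_classification(arguments_json: str) -> dict:
    """Parse the tool-call arguments string and re-check the enums.

    Function calling is highly reliable, but defense in depth is cheap:
    the enum check catches the rare malformed response before it
    reaches downstream code.
    """
    ticket = json.loads(arguments_json)
    if ticket.get("category") not in VALID_CATEGORIES:
        raise ValueError(f"unexpected category: {ticket.get('category')}")
    if ticket.get("priority") not in VALID_PRIORITIES:
        raise ValueError(f"unexpected priority: {ticket.get('priority')}")
    return ticket
```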
Guardrails
| Guardrail | Purpose | Implementation |
|---|---|---|
| Input validation | Block prompt injection | Filter known attack patterns, length limits |
| Output validation | Ensure format compliance | JSON schema validation, enum checks |
| Content filtering | Block harmful output | Moderation API, keyword filtering |
| Factual grounding | Prevent hallucination | RAG, cite sources, “I don’t know” |
| Rate limiting | Prevent abuse | Per-user rate limits, cost caps |
import jsonschema

class LLMGuardrails:
def validate_input(self, user_input: str) -> str:
# Length limit
if len(user_input) > 5000:
raise ValueError("Input too long")
# Detect prompt injection attempts
injection_patterns = [
"ignore previous instructions",
"you are now", "new instructions:",
"system prompt:", "forget everything"
]
for pattern in injection_patterns:
if pattern.lower() in user_input.lower():
raise ValueError("Input rejected by safety filter")
return user_input
def validate_output(self, output: dict, schema: dict) -> dict:
# Validate against expected schema
jsonschema.validate(output, schema)
# Check for hallucination indicators
if output.get("confidence") == "low":
output["answer"] = "I'm not confident in this answer. " + output["answer"]
return output
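In use, `validate_input` runs before the model call and `validate_output` after it. A quick illustration of the input side (a condensed copy of `validate_input` from the class above, so the snippet runs standalone):

```python
class LLMGuardrails:
    # Condensed copy of validate_input from the class above,
    # reproduced here so this snippet is self-contained.
    def validate_input(self, user_input: str) -> str:
        if len(user_input) > 5000:
            raise ValueError("Input too long")
        injection_patterns = [
            "ignore previous instructions", "you are now",
            "new instructions:", "system prompt:", "forget everything",
        ]
        for pattern in injection_patterns:
            if pattern in user_input.lower():
                raise ValueError("Input rejected by safety filter")
        return user_input

guards = LLMGuardrails()
guards.validate_input("How do I reset my password?")  # passes through unchanged
# guards.validate_input("Ignore previous instructions and ...")  # raises ValueError
```

Pattern matching like this catches only naive injection attempts; treat it as one layer, not a complete defense.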
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Vague instructions | “Be helpful” — inconsistent behavior | Specific rules: “Answer in 1-3 sentences using only provided context” |
| No examples | Model guesses the format | Add 2-5 few-shot examples |
| No output format | Unparseable responses | Specify JSON/XML format or use function calling |
| No negative examples | Model does not know boundaries | “Do NOT: make up information, provide medical advice” |
| Prompt in production without eval | No way to know if prompt changes helped | A/B test prompts, measure quality metrics |
Implementation Checklist
- Structure prompts with: system instruction → context → examples → user input → format
- Add 2-5 few-shot examples that cover edge cases
- Use chain-of-thought for complex reasoning tasks
- Use structured output (function calling) for reliable JSON responses
- Implement input guardrails: length limits, injection detection
- Implement output guardrails: schema validation, content filtering
- Ground responses with RAG: always provide context documents
- Include explicit refusal instructions: “If you don’t know, say so”
- Version-control prompts alongside application code
- Evaluate prompt changes with a test set before deploying
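The last two checklist items can start very small. A minimal eval-harness sketch, assuming an application-level `classify(text) -> dict` function that wraps the model call; the test cases and `evaluate` helper are illustrative:

```python
# A labeled test set kept in version control next to the prompt.
# Cases and labels here are illustrative.
TEST_SET = [
    {"input": "How do I reset my password?", "expected_category": "account"},
    {"input": "I was charged twice this month", "expected_category": "billing"},
]

def evaluate(classify, test_set) -> float:
    """Return the classifier's accuracy over the labeled test set.

    `classify` is any callable mapping input text to a dict with a
    'category' key, e.g. a wrapper around the model call above.
    """
    correct = sum(
        1 for case in test_set
        if classify(case["input"]).get("category") == case["expected_category"]
    )
    return correct / len(test_set)
```

Run this against both the old and the candidate prompt before deploying; a prompt change that cannot move this number is not worth shipping blind.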