Prompt Engineering for Developers: Getting Reliable Output from LLMs
Write prompts that produce consistent, high-quality output from large language models in production systems. Covers prompt structure, few-shot learning, chain-of-thought, output formatting, guardrails, evaluation, and the patterns that turn unpredictable AI into reliable software components.
Prompt engineering is the art of giving an LLM enough context, structure, and constraints that its output is useful, consistent, and safe. In a product context, this means the difference between a chatbot that sometimes hallucinates your competitor’s pricing and one that reliably answers customer questions from your knowledge base.
This guide covers the patterns that work in production — not the creative writing tricks, but the engineering practices that make LLM output predictable enough to build software on top of.
Prompt Structure
A production prompt has four parts:
┌──────────────────────────────────────────────┐
│ 1. SYSTEM INSTRUCTION (Role + Context) │
│ "You are a customer support agent for │
│ Acme Corp. You answer questions about │
│ our products using only the provided │
│ knowledge base." │
│ │
│ 2. CONTEXT (Knowledge, Data) │
│ "Here is the relevant documentation: │
│ [retrieved documents]" │
│ │
│ 3. EXAMPLES (Few-shot demonstrations) │
│ "Q: What is your return policy? │
│ A: Our return policy allows..." │
│ │
│ 4. USER INPUT + OUTPUT FORMAT │
│ "Answer the following question. │
│ Respond in JSON with 'answer' and │
│ 'confidence' fields." │
│ │
└──────────────────────────────────────────────┘
Example: Production Prompt
SYSTEM_PROMPT = """You are a customer support assistant for Acme Corp.
Rules:
1. Answer ONLY using the provided context documents
2. If the answer is not in the context, say "I don't have information about that"
3. Never make up product features, prices, or policies
4. Keep answers under 3 sentences unless the user asks for detail
5. Always cite which document your answer comes from
Tone: professional, helpful, concise
"""
USER_PROMPT_TEMPLATE = """Context documents:
{retrieved_documents}
Customer question: {question}
Respond in this JSON format:
{{
"answer": "your answer here",
"source_document": "document title or null if not found",
"confidence": "high|medium|low"
}}
"""
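These templates plug into a standard chat-completions message list. A minimal sketch of the assembly step, with `build_messages` as an illustrative helper name and `SYSTEM_PROMPT` abbreviated for brevity:

```python
# Assemble the system prompt and user template above into an
# OpenAI-style messages list. SYSTEM_PROMPT is abbreviated here;
# use the full string defined above.
SYSTEM_PROMPT = "You are a customer support assistant for Acme Corp. ..."

USER_PROMPT_TEMPLATE = """Context documents:
{retrieved_documents}

Customer question: {question}

Respond in this JSON format:
{{
  "answer": "your answer here",
  "source_document": "document title or null if not found",
  "confidence": "high|medium|low"
}}
"""

def build_messages(retrieved_documents: str, question: str) -> list[dict]:
    """Fill the user template and pair it with the system instruction."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT_TEMPLATE.format(
            retrieved_documents=retrieved_documents, question=question)},
    ]

messages = build_messages("Doc 1: Returns accepted within 30 days.",
                          "What is your return policy?")
```

Note the doubled braces (`{{`, `}}`) in the template: `str.format` would otherwise treat the JSON braces as placeholders.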
Key Techniques
Few-Shot Learning
Provide 2-5 examples of the desired input-output behavior.
FEW_SHOT_EXAMPLES = """
Example 1:
Input: "What are the side effects of aspirin?"
Output: {
"classification": "medical_question",
"action": "refuse",
"response": "I'm not qualified to provide medical advice. Please consult a doctor."
}
Example 2:
Input: "How do I reset my password?"
Output: {
"classification": "account_support",
"action": "answer",
"response": "To reset your password, go to Settings > Security > Reset Password."
}
Example 3:
Input: "Your product is terrible and I want a refund"
Output: {
"classification": "complaint",
"action": "escalate",
"response": "I'm sorry to hear about your experience. Let me connect you with our support team."
}
"""
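One way to wire these examples into a request is to prepend them to each new input, so the model imitates their input/output format. A sketch, with `build_classification_prompt` as an illustrative helper name and the examples string abbreviated:

```python
# Prepend few-shot examples to each request. FEW_SHOT_EXAMPLES is
# abbreviated here; in practice use the full string defined above.
FEW_SHOT_EXAMPLES = """
Example 1:
Input: "How do I reset my password?"
Output: {"classification": "account_support", "action": "answer",
         "response": "Go to Settings > Security > Reset Password."}
"""

def build_classification_prompt(user_input: str) -> str:
    """Combine the instruction, the examples, and the new input."""
    return (
        "Classify the customer input. Respond with JSON in exactly the "
        "format shown in the examples.\n"
        + FEW_SHOT_EXAMPLES
        + f'\nInput: "{user_input}"\nOutput:'
    )

prompt = build_classification_prompt("Where is my order?")
```

Ending the prompt at `Output:` nudges the model to complete the pattern rather than add preamble.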
Chain-of-Thought
Force the model to reason through its answer step by step.
ANALYSIS_PROMPT = """Analyze the following customer feedback and determine:
1. Sentiment (positive, negative, neutral)
2. Topic (product, pricing, support, other)
3. Urgency (high, medium, low)
4. Suggested action
Think through this step by step:
- First, identify the main emotion expressed
- Then, determine what the feedback is about
- Then, assess how urgent the response needs to be
- Finally, recommend the best action
Feedback: "{feedback}"
Provide your reasoning, then your final answer in JSON.
"""
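Because this prompt asks for free-form reasoning followed by JSON, the response needs a tolerant parser. A sketch of one approach, using the standard library's `json.JSONDecoder.raw_decode` (which permits trailing text) to pull out the last complete JSON object; `extract_final_json` is an illustrative name:

```python
import json

def extract_final_json(model_output: str) -> dict:
    """Return the last complete JSON object in a reasoning-then-JSON reply."""
    decoder = json.JSONDecoder()
    result, idx = None, 0
    while True:
        start = model_output.find("{", idx)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value and reports where it ended,
            # so stray text before or after the object is tolerated.
            obj, end = decoder.raw_decode(model_output, start)
            result, idx = obj, end
        except json.JSONDecodeError:
            idx = start + 1  # not valid JSON here; keep scanning
    if result is None:
        raise ValueError("no JSON object found in model output")
    return result
```

Keeping the *last* parsed object matters: the reasoning section may itself contain brace-like fragments before the final answer.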
Output Formatting and Parsing
| Strategy | Reliability | Complexity |
|---|---|---|
| JSON mode (model-native) | High | Low |
| Structured output (function calling) | Very high | Medium |
| XML tags in prompt | Medium | Low |
| Regex extraction | Low | High (brittle) |
# Structured output with function calling (most reliable)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}],
tools=[{
"type": "function",
"function": {
"name": "classify_ticket",
"description": "Classify a support ticket",
"parameters": {
"type": "object",
"properties": {
"category": {
"type": "string",
"enum": ["billing", "technical", "account", "other"]
},
"priority": {
"type": "string",
"enum": ["P1", "P2", "P3", "P4"]
},
"summary": {
"type": "string",
"maxLength": 100
}
},
"required": ["category", "priority", "summary"]
}
}
}],
tool_choice={"type": "function", "function": {"name": "classify_ticket"}}
)
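The model returns its arguments as a JSON string on the tool call (`response.choices[0].message.tool_calls[0].function.arguments`), which should still be validated before use. A sketch of that parsing step, with `parse_ticket_classification` as an illustrative helper name:

```python
import json

VALID_CATEGORIES = {"billing", "technical", "account", "other"}
VALID_PRIORITIES = {"P1", "P2", "P3", "P4"}

def parse_ticket_classification(arguments_json: str) -> dict:
    """Parse the tool-call arguments string and re-check the enums.

    Function calling is highly reliable, but defense in depth is cheap:
    the enum check catches the rare malformed response before it
    reaches downstream code.
    """
    ticket = json.loads(arguments_json)
    if ticket.get("category") not in VALID_CATEGORIES:
        raise ValueError(f"unexpected category: {ticket.get('category')}")
    if ticket.get("priority") not in VALID_PRIORITIES:
        raise ValueError(f"unexpected priority: {ticket.get('priority')}")
    return ticket
```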
Guardrails
| Guardrail | Purpose | Implementation |
|---|---|---|
| Input validation | Block prompt injection | Filter known attack patterns, length limits |
| Output validation | Ensure format compliance | JSON schema validation, enum checks |
| Content filtering | Block harmful output | Moderation API, keyword filtering |
| Factual grounding | Prevent hallucination | RAG, cite sources, “I don’t know” |
| Rate limiting | Prevent abuse | Per-user rate limits, cost caps |
import jsonschema

class LLMGuardrails:
def validate_input(self, user_input: str) -> str:
# Length limit
if len(user_input) > 5000:
raise ValueError("Input too long")
# Detect prompt injection attempts
injection_patterns = [
"ignore previous instructions",
"you are now", "new instructions:",
"system prompt:", "forget everything"
]
for pattern in injection_patterns:
if pattern.lower() in user_input.lower():
raise ValueError("Input rejected by safety filter")
return user_input
def validate_output(self, output: dict, schema: dict) -> dict:
# Validate against expected schema
jsonschema.validate(output, schema)
# Check for hallucination indicators
if output.get("confidence") == "low":
output["answer"] = "I'm not confident in this answer. " + output["answer"]
return output
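In use, `validate_input` runs before the model call and `validate_output` after it. A quick illustration of the input side (a condensed copy of `validate_input` from the class above, so the snippet runs standalone):

```python
class LLMGuardrails:
    # Condensed copy of validate_input from the class above,
    # reproduced here so this snippet is self-contained.
    def validate_input(self, user_input: str) -> str:
        if len(user_input) > 5000:
            raise ValueError("Input too long")
        injection_patterns = [
            "ignore previous instructions", "you are now",
            "new instructions:", "system prompt:", "forget everything",
        ]
        for pattern in injection_patterns:
            if pattern in user_input.lower():
                raise ValueError("Input rejected by safety filter")
        return user_input

guards = LLMGuardrails()
guards.validate_input("How do I reset my password?")  # passes through unchanged
# guards.validate_input("Ignore previous instructions and ...")  # raises ValueError
```

Pattern matching like this catches only naive injection attempts; treat it as one layer, not a complete defense.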
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Vague instructions | “Be helpful” — inconsistent behavior | Specific rules: “Answer in 1-3 sentences using only provided context” |
| No examples | Model guesses the format | Add 2-5 few-shot examples |
| No output format | Unparseable responses | Specify JSON/XML format or use function calling |
| No negative examples | Model does not know boundaries | “Do NOT: make up information, provide medical advice” |
| Prompt in production without eval | No way to know if prompt changes helped | A/B test prompts, measure quality metrics |
Implementation Checklist
- Structure prompts with: system instruction → context → examples → user input → format
- Add 2-5 few-shot examples that cover edge cases
- Use chain-of-thought for complex reasoning tasks
- Use structured output (function calling) for reliable JSON responses
- Implement input guardrails: length limits, injection detection
- Implement output guardrails: schema validation, content filtering
- Ground responses with RAG: always provide context documents
- Include explicit refusal instructions: “If you don’t know, say so”
- Version-control prompts alongside application code
- Evaluate prompt changes with a test set before deploying
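The last two checklist items can start very small. A minimal eval-harness sketch, assuming an application-level `classify(text) -> dict` function that wraps the model call; the test cases and `evaluate` helper are illustrative:

```python
# A labeled test set kept in version control next to the prompt.
# Cases and labels here are illustrative.
TEST_SET = [
    {"input": "How do I reset my password?", "expected_category": "account"},
    {"input": "I was charged twice this month", "expected_category": "billing"},
]

def evaluate(classify, test_set) -> float:
    """Return the classifier's accuracy over the labeled test set.

    `classify` is any callable mapping input text to a dict with a
    'category' key, e.g. a wrapper around the model call above.
    """
    correct = sum(
        1 for case in test_set
        if classify(case["input"]).get("category") == case["expected_category"]
    )
    return correct / len(test_set)
```

Run this against both the old and the candidate prompt before deploying; a prompt change that cannot move this number is not worth shipping blind.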