
Prompt Engineering for Developers: Getting Reliable Output from LLMs

Write prompts that produce consistent, high-quality output from large language models in production systems. Covers prompt structure, few-shot learning, chain-of-thought, output formatting, guardrails, evaluation, and the patterns that turn unpredictable AI into reliable software components.

Prompt engineering is the art of giving an LLM enough context, structure, and constraints that its output is useful, consistent, and safe. In a product context, this means the difference between a chatbot that sometimes hallucinates your competitor’s pricing and one that reliably answers customer questions from your knowledge base.

This guide covers the patterns that work in production — not the creative writing tricks, but the engineering practices that make LLM output predictable enough to build software on top of.


Prompt Structure

A production prompt has four parts:

┌──────────────────────────────────────────────┐
│  1. SYSTEM INSTRUCTION (Role + Context)      │
│     "You are a customer support agent for    │
│      Acme Corp. You answer questions about   │
│      our products using only the provided    │
│      knowledge base."                        │
│                                              │
│  2. CONTEXT (Knowledge, Data)                │
│     "Here is the relevant documentation:     │
│      [retrieved documents]"                  │
│                                              │
│  3. EXAMPLES (Few-shot demonstrations)       │
│     "Q: What is your return policy?          │
│      A: Our return policy allows..."         │
│                                              │
│  4. USER INPUT + OUTPUT FORMAT               │
│     "Answer the following question.          │
│      Respond in JSON with 'answer' and       │
│      'confidence' fields."                   │
│                                              │
└──────────────────────────────────────────────┘

Example: Production Prompt

SYSTEM_PROMPT = """You are a customer support assistant for Acme Corp.

Rules:
1. Answer ONLY using the provided context documents
2. If the answer is not in the context, say "I don't have information about that"
3. Never make up product features, prices, or policies
4. Keep answers under 3 sentences unless the user asks for detail
5. Always cite which document your answer comes from

Tone: professional, helpful, concise
"""

USER_PROMPT_TEMPLATE = """Context documents:
{retrieved_documents}

Customer question: {question}

Respond in this JSON format:
{{
  "answer": "your answer here",
  "source_document": "document title or null if not found",
  "confidence": "high|medium|low"
}}
"""
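The two templates come together in a chat request as a list of role-tagged messages: system instruction first, then the filled-in user template. A minimal sketch, assuming a chat-style API; `build_messages` is a hypothetical helper, not part of any SDK:

```python
def build_messages(system_prompt: str, user_template: str,
                   question: str, retrieved_documents: str) -> list:
    """Assemble the chat payload: system instruction first, then the filled template."""
    user_prompt = user_template.format(
        retrieved_documents=retrieved_documents,
        question=question,
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
```

Keeping the rules in the system message and the per-request data in the user message makes the rules harder for user input to override, and lets you cache or version the system prompt independently.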

Key Techniques

Few-Shot Learning

Provide 2-5 examples of the desired input-output behavior.

FEW_SHOT_EXAMPLES = """
Example 1:
Input: "What are the side effects of aspirin?"
Output: {
  "classification": "medical_question",
  "action": "refuse",
  "response": "I'm not qualified to provide medical advice. Please consult a doctor."
}

Example 2:
Input: "How do I reset my password?"
Output: {
  "classification": "account_support",
  "action": "answer",
  "response": "To reset your password, go to Settings > Security > Reset Password."
}

Example 3:
Input: "Your product is terrible and I want a refund"
Output: {
  "classification": "complaint",
  "action": "escalate",
  "response": "I'm sorry to hear about your experience. Let me connect you with our support team."
}
"""
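Besides pasting examples into one text blob as above, a common alternative is to send them as alternating user/assistant turns, so the model imitates the assistant messages directly. A sketch of that approach; the example pairs and field names mirror the classifier above and are illustrative only:

```python
import json

# Few-shot examples as (input, expected output) pairs
FEW_SHOT = [
    ("How do I reset my password?",
     {"classification": "account_support", "action": "answer"}),
    ("Your product is terrible and I want a refund",
     {"classification": "complaint", "action": "escalate"}),
]

def few_shot_messages(system_prompt: str, user_input: str) -> list:
    """Build a message list with few-shot examples as prior conversation turns."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_input, example_output in FEW_SHOT:
        messages.append({"role": "user", "content": example_input})
        # The assistant turn shows the exact output format to imitate
        messages.append({"role": "assistant", "content": json.dumps(example_output)})
    messages.append({"role": "user", "content": user_input})
    return messages
```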

Chain-of-Thought

Force the model to reason through its answer step by step.

ANALYSIS_PROMPT = """Analyze the following customer feedback and determine:
1. Sentiment (positive, negative, neutral)
2. Topic (product, pricing, support, other)
3. Urgency (high, medium, low)
4. Suggested action

Think through this step by step:
- First, identify the main emotion expressed
- Then, determine what the feedback is about
- Then, assess how urgent the response needs to be
- Finally, recommend the best action

Feedback: "{feedback}"

Provide your reasoning, then your final answer in JSON.
"""
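Because this response mixes free-text reasoning with a final JSON object, the parser has to separate the two. A minimal sketch, assuming the JSON comes last in the output; `extract_final_json` is a hypothetical helper, and production code should still schema-validate the parsed dict:

```python
import json

def extract_final_json(model_output: str) -> dict:
    """Pull the trailing JSON object out of a reasoning-then-answer response."""
    end = model_output.rfind("}")
    if end == -1:
        raise ValueError("No JSON object found in model output")
    # Try successive opening braces until one parses cleanly
    start = model_output.find("{")
    while 0 <= start < end:
        try:
            return json.loads(model_output[start:end + 1])
        except json.JSONDecodeError:
            start = model_output.find("{", start + 1)
    raise ValueError("No JSON object found in model output")
```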

Output Formatting and Parsing

Strategy                               Reliability   Complexity
JSON mode (model-native)               High          Low
Structured output (function calling)   Very high     Medium
XML tags in prompt                     Medium        Low
Regex extraction                       Low           High (brittle)

# Structured output with function calling (most reliable)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    tools=[{
        "type": "function",
        "function": {
            "name": "classify_ticket",
            "description": "Classify a support ticket",
            "parameters": {
                "type": "object",
                "properties": {
                    "category": {
                        "type": "string",
                        "enum": ["billing", "technical", "account", "other"]
                    },
                    "priority": {
                        "type": "string",
                        "enum": ["P1", "P2", "P3", "P4"]
                    },
                    "summary": {
                        "type": "string",
                        "maxLength": 100
                    }
                },
                "required": ["category", "priority", "summary"]
            }
        }
    }],
    tool_choice={"type": "function", "function": {"name": "classify_ticket"}}
)
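Reading the structured result back out still requires a parsing step. A sketch; the attribute path (`choices[0].message.tool_calls`) follows the OpenAI Python SDK's chat-completions shape, and other providers differ:

```python
import json

def parse_tool_call(response) -> dict:
    """Extract and sanity-check the arguments of a forced function call."""
    tool_call = response.choices[0].message.tool_calls[0]
    if tool_call.function.name != "classify_ticket":
        raise ValueError(f"Unexpected tool: {tool_call.function.name}")
    # The API returns arguments as a JSON string, not a parsed object
    args = json.loads(tool_call.function.arguments)
    # Defensive re-check of the enum, even though the schema should enforce it
    if args["category"] not in {"billing", "technical", "account", "other"}:
        raise ValueError(f"Unexpected category: {args['category']}")
    return args
```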

Guardrails

Guardrail           Purpose                    Implementation
Input validation    Block prompt injection     Filter known attack patterns, length limits
Output validation   Ensure format compliance   JSON schema validation, enum checks
Content filtering   Block harmful output       Moderation API, keyword filtering
Factual grounding   Prevent hallucination      RAG, cite sources, “I don’t know”
Rate limiting       Prevent abuse              Per-user rate limits, cost caps

import jsonschema  # third-party: pip install jsonschema

class LLMGuardrails:
    def validate_input(self, user_input: str) -> str:
        # Length limit
        if len(user_input) > 5000:
            raise ValueError("Input too long")

        # Detect prompt injection attempts
        injection_patterns = [
            "ignore previous instructions",
            "you are now", "new instructions:",
            "system prompt:", "forget everything"
        ]
        for pattern in injection_patterns:
            if pattern.lower() in user_input.lower():
                raise ValueError("Input rejected by safety filter")

        return user_input

    def validate_output(self, output: dict, schema: dict) -> dict:
        # Validate against expected schema
        jsonschema.validate(output, schema)

        # Check for hallucination indicators
        if output.get("confidence") == "low":
            output["answer"] = "I'm not confident in this answer. " + output["answer"]

        return output
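To show the flow end to end, here is a self-contained, dependency-free sketch of the same input and output checks; `call_model` is a hypothetical stand-in for the LLM call, and the field names mirror the JSON format from the production prompt:

```python
import json

ALLOWED_CONFIDENCE = {"high", "medium", "low"}

def validate_answer(output: dict) -> dict:
    """Lightweight schema check: required keys present, enum values legal."""
    for key in ("answer", "confidence"):
        if key not in output:
            raise ValueError(f"Missing required field: {key}")
    if output["confidence"] not in ALLOWED_CONFIDENCE:
        raise ValueError(f"Bad confidence value: {output['confidence']}")
    if output["confidence"] == "low":
        output["answer"] = "I'm not confident in this answer. " + output["answer"]
    return output

def guarded_answer(call_model, user_input: str) -> dict:
    """Run input checks, call the model, then validate the parsed output."""
    if len(user_input) > 5000:
        raise ValueError("Input too long")
    raw = call_model(user_input)  # returns the model's JSON string
    return validate_answer(json.loads(raw))
```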

Anti-Patterns

Anti-Pattern                        Problem                                   Fix
Vague instructions                  “Be helpful” invites inconsistency        Specific rules: “Answer in 1-3 sentences using only provided context”
No examples                         Model guesses the format                  Add 2-5 few-shot examples
No output format                    Unparseable responses                     Specify JSON/XML format or use function calling
No negative examples                Model does not know its boundaries        “Do NOT: make up information, provide medical advice”
Prompt in production without eval   No way to know if prompt changes helped   A/B test prompts, measure quality metrics
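The eval point deserves code: before shipping a prompt change, score it against a small labelled test set. A minimal sketch, assuming a `run_prompt` callable that wraps the model call and an exact-match accuracy metric; both are stand-ins for whatever your pipeline uses:

```python
def evaluate_prompt(run_prompt, test_set: list) -> float:
    """Return exact-match accuracy of run_prompt over labelled test cases."""
    correct = 0
    for case in test_set:
        prediction = run_prompt(case["input"])
        if prediction == case["expected"]:
            correct += 1
    return correct / len(test_set)
```

Run this for both the current and candidate prompt on the same test set; promote the candidate only if the score does not regress.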

Implementation Checklist

  • Structure prompts with: system instruction → context → examples → user input → format
  • Add 2-5 few-shot examples that cover edge cases
  • Use chain-of-thought for complex reasoning tasks
  • Use structured output (function calling) for reliable JSON responses
  • Implement input guardrails: length limits, injection detection
  • Implement output guardrails: schema validation, content filtering
  • Ground responses with RAG: always provide context documents
  • Include explicit refusal instructions: “If you don’t know, say so”
  • Version-control prompts alongside application code
  • Evaluate prompt changes with a test set before deploying
Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
