
Ansible for Infrastructure Automation: Playbooks That Do Not Break at 3 AM

Write Ansible playbooks that are idempotent, testable, and maintainable across hundreds of servers. Covers inventory management, role design, vault secrets, testing with Molecule, error handling, and the patterns that prevent configuration drift.

Ansible fills the gap between “I need to configure 200 servers” and “I do not want to manage a centralized agent on 200 servers.” It is agentless, uses SSH, and speaks YAML — which makes it approachable for anyone who can write a shell script but dangerous for anyone who writes Ansible like a shell script.

The difference between good Ansible and bad Ansible is idempotency: good playbooks can run 100 times and produce the same result. Bad playbooks work once, break on the second run, and leave servers in an inconsistent state.


Playbook Structure

project/
├── ansible.cfg              # Ansible configuration
├── inventory/
│   ├── production/
│   │   ├── hosts.yml        # Production servers
│   │   └── group_vars/
│   │       ├── all.yml      # Variables for all production hosts
│   │       ├── webservers.yml
│   │       └── databases.yml
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
│           └── all.yml
├── playbooks/
│   ├── site.yml             # Main playbook (includes roles)
│   ├── deploy.yml           # Application deployment
│   └── security.yml         # Security hardening
├── roles/
│   ├── common/              # Base configuration for all servers
│   ├── nginx/               # Web server setup
│   ├── postgresql/          # Database setup
│   └── app/                 # Application deployment
└── requirements.yml         # External role dependencies
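As an illustration of how the main playbook can tie this layout together, here is a minimal sketch of playbooks/site.yml. It assumes that ansible.cfg points roles_path at the top-level roles/ directory (roles are not found automatically when playbooks live in a sibling folder), and the mapping of roles to groups is an assumption rather than something the layout dictates.

```yaml
# playbooks/site.yml (sketch; role-to-group mapping is an assumption)
- name: Apply base configuration to every host
  hosts: all
  become: true
  roles:
    - common

- name: Configure web servers
  hosts: webservers
  become: true
  roles:
    - nginx
    - app

- name: Configure database servers
  hosts: databases
  become: true
  roles:
    - postgresql
```

A run against one environment would then look like `ansible-playbook -i inventory/production playbooks/site.yml`, with staging selected by swapping the inventory path.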

Inventory

# inventory/production/hosts.yml
all:
  children:
    webservers:
      hosts:
        web-1.example.com:
        web-2.example.com:
        web-3.example.com:
      vars:
        nginx_worker_processes: auto
        nginx_worker_connections: 4096

    databases:
      hosts:
        db-primary.example.com:
          postgresql_role: primary
        db-replica.example.com:
          postgresql_role: replica
      vars:
        postgresql_version: 16
        postgresql_max_connections: 200
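The group-level vars shown inline can also live in the group_vars files from the project layout, which keeps hosts.yml limited to hosts and group membership. A possible split, reusing the same values (Ansible loads group_vars/ directories that sit next to the inventory file):

```yaml
# inventory/production/group_vars/webservers.yml
nginx_worker_processes: auto
nginx_worker_connections: 4096

# inventory/production/group_vars/databases.yml
postgresql_version: 16
postgresql_max_connections: 200
```

Host-specific values such as postgresql_role can stay on the host entries, since they differ within the group.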

Role Design

Role Structure

roles/nginx/
├── defaults/
│   └── main.yml       # Default variable values (lowest priority)
├── vars/
│   └── main.yml       # Role variables (high priority)
├── tasks/
│   └── main.yml       # Task definitions
├── handlers/
│   └── main.yml       # Event-triggered actions (restart services)
├── templates/
│   └── nginx.conf.j2  # Jinja2 templates
├── files/
│   └── ssl/           # Static files to copy
├── meta/
│   └── main.yml       # Role metadata and dependencies
└── molecule/
    └── default/       # Test scenarios
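To make the defaults/ entry concrete, here is a sketch of what roles/nginx/defaults/main.yml might contain. The values are illustrative, and nginx_listen_port is a hypothetical variable not used elsewhere in this guide. Because defaults have the lowest priority, the inventory value of 4096 for nginx_worker_connections would override the value set here.

```yaml
# roles/nginx/defaults/main.yml (illustrative values)
nginx_worker_processes: auto
nginx_worker_connections: 1024   # overridden by inventory or group_vars when set there
nginx_listen_port: 80            # hypothetical variable, shown only as an example
```

Anything a user of the role should be able to tune belongs in defaults/; values in vars/ are much harder to override from inventory.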

Idempotent Tasks

# roles/nginx/tasks/main.yml

# ✅ Idempotent: package module checks if already installed
- name: Install nginx
  ansible.builtin.package:
    name: nginx
    state: present

# ✅ Idempotent: template only changes if content differs
- name: Configure nginx
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    validate: nginx -t -c %s    # Validate before deploying
  notify: Reload nginx           # Only triggers if changed

# ✅ Idempotent: service module ensures desired state
- name: Ensure nginx is running
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true

# ❌ NOT idempotent: shell runs every time
- name: Configure something
  ansible.builtin.shell: "echo 'config=true' >> /etc/app/config"
  # This APPENDS every run, creating duplicates!

# ✅ Fixed: use lineinfile for idempotent file editing
- name: Configure something
  ansible.builtin.lineinfile:
    path: /etc/app/config
    line: "config=true"
    state: present
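The notify: Reload nginx line in the template task only works if a handler with exactly that name exists. A minimal sketch of the matching handler file:

```yaml
# roles/nginx/handlers/main.yml
- name: Reload nginx             # must match the string used in notify
  ansible.builtin.service:
    name: nginx
    state: reloaded
```

Handlers run once at the end of the play, however many tasks notified them, so several configuration changes result in a single reload.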

Secrets Management with Vault

# Encrypt a file
ansible-vault encrypt vars/secrets.yml

# Edit encrypted file
ansible-vault edit vars/secrets.yml

# Run playbook with vault password
ansible-playbook site.yml --ask-vault-pass

# Or use a password file (for CI/CD)
ansible-playbook site.yml --vault-password-file ~/.vault_pass

# Plaintext vars file: maps readable names onto vault_-prefixed
# values that are kept in a Vault-encrypted file
database_password: "{{ vault_database_password }}"
api_key: "{{ vault_api_key }}"
ssl_private_key: "{{ vault_ssl_private_key }}"

# Reference in tasks
- name: Configure database connection
  ansible.builtin.template:
    src: database.yml.j2
    dest: /etc/app/database.yml
    mode: '0600'    # Restrict permissions on files with secrets
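The vault_-prefixed names referenced above have to be defined in a file that is actually encrypted. A sketch of what such a file might hold before ansible-vault encrypt is run on it; every value here is a placeholder, not a real credential:

```yaml
# Encrypted vars file, shown before `ansible-vault encrypt` (placeholder values)
vault_database_password: "replace-me"
vault_api_key: "replace-me"
vault_ssl_private_key: "replace-me"
```

Keeping the indirection (plain name pointing at a vault_ name) means the variable names remain searchable with grep even though the values are encrypted. Adding `no_log: true` to tasks that handle these values keeps them out of playbook output.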

Error Handling

# Block/rescue/always for error handling
- name: Deploy application
  block:
    - name: Pull latest code
      ansible.builtin.git:
        repo: "{{ app_repo }}"
        dest: "{{ app_dir }}"
        version: "{{ app_version }}"

    - name: Install dependencies
      ansible.builtin.pip:
        requirements: "{{ app_dir }}/requirements.txt"
        virtualenv: "{{ venv_dir }}"

    - name: Run database migrations
      ansible.builtin.command:
        cmd: "{{ venv_dir }}/bin/python manage.py migrate"
        chdir: "{{ app_dir }}"

  rescue:
    - name: Rollback to previous version
      ansible.builtin.git:
        repo: "{{ app_repo }}"
        dest: "{{ app_dir }}"
        version: "{{ previous_version }}"

    - name: Notify team of failed deployment
      ansible.builtin.uri:
        url: "{{ slack_webhook }}"
        method: POST
        body_format: json
        body:
          text: "⚠️ Deployment of {{ app_version }} failed on {{ inventory_hostname }}. Rolled back to {{ previous_version }}."

  always:
    - name: Ensure application is running
      ansible.builtin.service:
        name: "{{ app_service }}"
        state: started
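The rescue section depends on previous_version, which has to be set before the block runs. One possible way to capture it, sketched under the assumption that app_dir already contains a Git checkout from an earlier deployment (a first-time deploy would need a different fallback); the name deployed_commit is hypothetical:

```yaml
# Sketch: place these tasks before the "Deploy application" block
- name: Record the currently deployed commit
  ansible.builtin.command:
    cmd: git rev-parse HEAD
    chdir: "{{ app_dir }}"
  register: deployed_commit      # hypothetical variable name
  changed_when: false            # read-only query, never reports a change

- name: Remember it as the rollback target
  ansible.builtin.set_fact:
    previous_version: "{{ deployed_commit.stdout }}"
```

Recording the commit hash rather than a branch name gives the rollback a fixed target even if the branch moves during the deployment.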

Testing with Molecule

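# molecule/default/molecule.yml
dependency:
  name: galaxy
driver:
  name: docker            # On recent Molecule releases, provided by molecule-plugins[docker]
platforms:
  - name: instance
    image: ubuntu:22.04   # Stock image has no Python or systemd; use an image that includes both
    pre_build_image: true # Use the image as-is instead of building one from a Dockerfile
provisioner:
  name: ansible
verifier:
  name: ansible

# molecule/default/verify.yml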
- name: Verify nginx role
  hosts: all
  tasks:
    - name: Gather package facts
      ansible.builtin.package_facts:

    - name: Assert nginx is installed
      ansible.builtin.assert:
        that: "'nginx' in ansible_facts.packages"

    - name: Gather service facts
      ansible.builtin.service_facts:

    - name: Assert nginx is running
      ansible.builtin.assert:
        that: "ansible_facts.services['nginx.service'].state == 'running'"

    - name: Check nginx responds
      ansible.builtin.uri:
        url: http://localhost:80
        status_code: 200

Implementation Checklist

  • Organize playbooks with inventory per environment (staging, production)
  • Use roles for reusable configuration (not monolithic playbooks)
  • Write idempotent tasks: use modules (template, lineinfile, service) instead of shell
  • Encrypt all secrets with Ansible Vault — never plaintext in Git
  • Add validate directives to configuration templates (nginx -t, apachectl configtest)
  • Use handlers for service restarts — only restart when configuration changes
  • Implement block/rescue for deployment rollback on failure
  • Test roles with Molecule before deploying to production
  • Tag tasks for selective execution (--tags deploy, --tags security)
  • Run playbooks in check mode (--check) before applying to production
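
The tagging and check-mode items can be sketched with a single task. The SSH setting here is only an example of something a security role might manage, and the task is assumed to run with become enabled:

```yaml
# Sketch: a tagged task that can be selected with --tags security
- name: Disable root login over SSH
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?PermitRootLogin'
    line: 'PermitRootLogin no'
    validate: '/usr/sbin/sshd -t -f %s'   # refuse to write a config sshd cannot parse
  tags:
    - security
```

Running the playbook with `--tags security --check --diff` would preview just this change without applying it. A real role would also notify a handler to reload sshd.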
Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
