Ansible for Infrastructure Automation: Playbooks That Do Not Break at 3 AM
Write Ansible playbooks that are idempotent, testable, and maintainable across hundreds of servers. Covers inventory management, role design, vault secrets, testing with Molecule, error handling, and the patterns that prevent configuration drift.
Ansible fills the gap between “I need to configure 200 servers” and “I do not want to install and manage an agent on each of those 200 servers.” It is agentless, connects over SSH, and describes configuration in YAML. That makes it approachable for anyone who can write a shell script, but dangerous for anyone who writes Ansible like a shell script.
The difference between good Ansible and bad Ansible is idempotency: good playbooks can run 100 times and produce the same result. Bad playbooks work once, break on the second run, and leave servers in an inconsistent state.
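Sometimes no module fits and you have to fall back to `command` or `shell`. You can still keep those tasks idempotent by telling Ansible how to detect that the work was already done. The sketch below is a minimal example that assumes a hypothetical one-time script at `/opt/app/bin/init-db`, which writes a marker file when it finishes. The `creates` and `changed_when` options are standard Ansible.

```yaml
# Hypothetical one-time setup step: the script path and marker file are
# illustrative assumptions. The script itself must create the marker file.
- name: Initialize application database (runs only once)
  ansible.builtin.command:
    cmd: /opt/app/bin/init-db
    creates: /var/lib/app/.db-initialized  # task is skipped if this file exists

# Read-only commands should never report "changed"
- name: Read current application version
  ansible.builtin.command:
    cmd: /opt/app/bin/app --version  # hypothetical binary
  register: app_version_output
  changed_when: false
```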
Playbook Structure
```text
project/
├── ansible.cfg               # Ansible configuration
├── inventory/
│   ├── production/
│   │   ├── hosts.yml         # Production servers
│   │   └── group_vars/
│   │       ├── all.yml       # Variables for all production hosts
│   │       ├── webservers.yml
│   │       └── databases.yml
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
│           └── all.yml
├── playbooks/
│   ├── site.yml              # Main playbook (includes roles)
│   ├── deploy.yml            # Application deployment
│   └── security.yml          # Security hardening
├── roles/
│   ├── common/               # Base configuration for all servers
│   ├── nginx/                # Web server setup
│   ├── postgresql/           # Database setup
│   └── app/                  # Application deployment
└── requirements.yml          # External role dependencies
```
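The tree lists `site.yml` as the main playbook that pulls in roles. Below is a sketch of what it might contain, using the role and group names shown in this article. Which roles go on which groups is an illustrative assumption. Because the playbooks live in `playbooks/` rather than next to `roles/`, this also assumes `ansible.cfg` sets `roles_path` to the project's `roles/` directory.

```yaml
# playbooks/site.yml (sketch): map roles onto inventory groups
- name: Apply base configuration to every host
  hosts: all
  become: true
  roles:
    - common

- name: Configure web servers
  hosts: webservers
  become: true
  roles:
    - nginx
    - app

- name: Configure database servers
  hosts: databases
  become: true
  roles:
    - postgresql
```

Run it against one environment at a time by pointing `-i` at that environment's inventory directory, for example `ansible-playbook -i inventory/staging playbooks/site.yml`.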
Inventory
```yaml
# inventory/production/hosts.yml
all:
  children:
    webservers:
      hosts:
        web-1.example.com:
        web-2.example.com:
        web-3.example.com:
      vars:
        nginx_worker_processes: auto
        nginx_worker_connections: 4096
    databases:
      hosts:
        db-primary.example.com:
          postgresql_role: primary
        db-replica.example.com:
          postgresql_role: replica
      vars:
        postgresql_version: 16
        postgresql_max_connections: 200
```
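The project layout also lists per-group files under `group_vars/`. Ansible loads `group_vars/<group>.yml` for every host in that group, so the inline `vars:` blocks can live there instead. The sketch below moves the webserver settings out of `hosts.yml`. It assumes you prefer to keep `hosts.yml` as a bare host list.

```yaml
# inventory/production/group_vars/webservers.yml
# Same values as the inline vars block in hosts.yml. Define each value in
# only one of the two places so they cannot drift apart.
nginx_worker_processes: auto
nginx_worker_connections: 4096
```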
Role Design
Role Structure
```text
roles/nginx/
├── defaults/
│   └── main.yml        # Default variable values (lowest priority)
├── vars/
│   └── main.yml        # Role variables (high priority)
├── tasks/
│   └── main.yml        # Task definitions
├── handlers/
│   └── main.yml        # Event-triggered actions (restart services)
├── templates/
│   └── nginx.conf.j2   # Jinja2 templates
├── files/
│   └── ssl/            # Static files to copy
├── meta/
│   └── main.yml        # Role metadata and dependencies
└── molecule/
    └── default/        # Test scenarios
```
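`defaults/main.yml` is where a role declares the tunables it reads, with fallbacks that inventory and `group_vars` can override. Below is a sketch for this nginx role. It assumes the template reads the two worker variables set in the inventory, and the fallback values are illustrative.

```yaml
# roles/nginx/defaults/main.yml
# Lowest-precedence values: inventory, group_vars, and --extra-vars all win.
nginx_worker_processes: auto
nginx_worker_connections: 1024
```

Inside `nginx.conf.j2`, these are referenced as `{{ nginx_worker_processes }}` and `{{ nginx_worker_connections }}`. Putting the same names in `vars/main.yml` instead would override inventory values. That is why tunables belong in `defaults/`.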
Idempotent Tasks
```yaml
# roles/nginx/tasks/main.yml

# ✅ Idempotent: package module checks if already installed
- name: Install nginx
  ansible.builtin.package:
    name: nginx
    state: present

# ✅ Idempotent: template only changes if content differs
- name: Configure nginx
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    validate: nginx -t -c %s  # Validate before deploying
  notify: Reload nginx  # Only triggers if changed

# ✅ Idempotent: service module ensures desired state
- name: Ensure nginx is running
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true

# ❌ NOT idempotent: shell runs every time
- name: Configure something
  ansible.builtin.shell: "echo 'config=true' >> /etc/app/config"
  # This APPENDS every run, creating duplicates!

# ✅ Fixed: use lineinfile for idempotent file editing
- name: Configure something
  ansible.builtin.lineinfile:
    path: /etc/app/config
    line: "config=true"
    state: present
```
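The `Configure nginx` task notifies a handler named `Reload nginx`, so that handler must exist in the role's handlers file. Below is a minimal sketch. The handler name has to match the `notify` string exactly.

```yaml
# roles/nginx/handlers/main.yml
# Runs after the play's tasks finish (or earlier, at a meta: flush_handlers
# task), and only if at least one notifying task reported "changed".
# Multiple notifications still trigger a single reload.
- name: Reload nginx
  ansible.builtin.service:
    name: nginx
    state: reloaded
```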
Secrets Management with Vault
```shell
# Encrypt a file
ansible-vault encrypt vars/secrets.yml

# Edit encrypted file
ansible-vault edit vars/secrets.yml

# Run playbook with vault password
ansible-playbook site.yml --ask-vault-pass

# Or use a password file (for CI/CD)
ansible-playbook site.yml --vault-password-file ~/.vault_pass
```
```yaml
# Plaintext vars file: each readable name points at a vault_-prefixed
# value defined in the encrypted vars/secrets.yml
database_password: "{{ vault_database_password }}"
api_key: "{{ vault_api_key }}"
ssl_private_key: "{{ vault_ssl_private_key }}"
```

```yaml
# Reference in tasks
- name: Configure database connection
  ansible.builtin.template:
    src: database.yml.j2
    dest: /etc/app/database.yml
    mode: '0600'  # Restrict permissions on files with secrets
```
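In the `vault_` indirection pattern, the encrypted file holds the real values under the `vault_` prefix. A plaintext vars file maps readable names onto them, so grepping the repository still shows where each secret is used. Below is a sketch of the encrypted file's contents before encryption, with placeholder values.

```yaml
# Contents of the vault-encrypted file, shown before `ansible-vault encrypt`
# (placeholder values only)
vault_database_password: "change-me"
vault_api_key: "change-me"
vault_ssl_private_key: |
  -----BEGIN PRIVATE KEY-----
  (placeholder)
  -----END PRIVATE KEY-----
```

Some tasks pass a secret directly into a module's arguments. Those tasks can also set `no_log: true` so the value is not printed in task output or logs.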
Error Handling
```yaml
# Block/rescue/always for error handling
- name: Deploy application
  block:
    - name: Pull latest code
      ansible.builtin.git:
        repo: "{{ app_repo }}"
        dest: "{{ app_dir }}"
        version: "{{ app_version }}"

    - name: Install dependencies
      ansible.builtin.pip:
        requirements: "{{ app_dir }}/requirements.txt"
        virtualenv: "{{ venv_dir }}"

    - name: Run database migrations
      ansible.builtin.command:
        cmd: "{{ venv_dir }}/bin/python manage.py migrate"
        chdir: "{{ app_dir }}"

  rescue:
    - name: Rollback to previous version
      ansible.builtin.git:
        repo: "{{ app_repo }}"
        dest: "{{ app_dir }}"
        version: "{{ previous_version }}"

    - name: Notify team of failed deployment
      ansible.builtin.uri:
        url: "{{ slack_webhook }}"
        method: POST
        body_format: json
        body:
          text: "⚠️ Deployment of {{ app_version }} failed on {{ inventory_hostname }}. Rolled back to {{ previous_version }}."

  always:
    - name: Ensure application is running
      ansible.builtin.service:
        name: "{{ app_service }}"
        state: started
```
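The rescue section assumes `previous_version` already holds the commit that was deployed before this run. Below is one way to capture it. The sketch assumes `app_dir` is an existing git checkout on the host, so a first-ever deployment would need a fallback.

```yaml
# Place these tasks before the "Deploy application" block.
- name: Record the currently deployed commit
  ansible.builtin.command:
    cmd: git rev-parse HEAD
    chdir: "{{ app_dir }}"
  register: current_commit
  changed_when: false  # read-only; never report a change

- name: Remember it as the rollback target
  ansible.builtin.set_fact:
    previous_version: "{{ current_commit.stdout }}"
```

If every rescue task succeeds, Ansible treats the block as recovered. The run then finishes without a failure, and the recap counts the host as `rescued`. If CI should still report a failed build, end the rescue section with an `ansible.builtin.fail` task.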
Testing with Molecule
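```yaml
# molecule/default/molecule.yml
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: instance
    # The stock ubuntu:22.04 image ships without Python and without systemd
    # running as PID 1. The package and service checks in verify.yml need an
    # image that provides both.
    image: ubuntu:22.04
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible
```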
```yaml
# molecule/default/verify.yml
- name: Verify nginx role
  hosts: all
  tasks:
    - name: Gather installed packages
      ansible.builtin.package_facts:  # populates ansible_facts.packages

    - name: Assert nginx is installed
      ansible.builtin.assert:
        that: "'nginx' in ansible_facts.packages"

    - name: Gather service states
      ansible.builtin.service_facts:  # populates ansible_facts.services

    - name: Assert nginx is running
      ansible.builtin.assert:
        that: "ansible_facts.services['nginx.service'].state == 'running'"

    - name: Check nginx responds
      ansible.builtin.uri:
        url: http://localhost:80
        status_code: 200
```
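Molecule applies the role through the scenario's `converge.yml` playbook. During `molecule test`, its idempotence step runs that playbook a second time and fails if any task reports `changed` on the second pass. That automates the "run it 100 times" property from the introduction. Below is a minimal `converge.yml`, assuming the role directory is named `nginx` and is on Molecule's roles path.

```yaml
# molecule/default/converge.yml
# The Docker container already runs as root, so no become is needed here.
- name: Converge
  hosts: all
  tasks:
    - name: Apply the nginx role
      ansible.builtin.include_role:
        name: nginx
```

`molecule test` runs the full sequence, including create, converge, idempotence, verify, and destroy. While iterating, `molecule converge` followed by `molecule verify` is quicker.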
Implementation Checklist
- Organize playbooks with inventory per environment (staging, production)
- Use roles for reusable configuration (not monolithic playbooks)
- Write idempotent tasks: use modules (template, lineinfile, service) instead of shell
- Encrypt all secrets with Ansible Vault — never plaintext in Git
- Add validate directives to configuration templates (nginx -t, apache configtest)
- Use handlers for service restarts — only restart when configuration changes
- Implement block/rescue for deployment rollback on failure
- Test roles with Molecule before deploying to production
- Tag tasks for selective execution (--tags deploy, --tags security)
- Run playbooks in check mode (--check) before applying to production
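Tags can be set on tasks, blocks, or roles and then selected at run time. Below is a sketch that uses the `deploy` tag from the checklist. The role-to-tag mapping is illustrative.

```yaml
# Tags on a role entry are inherited by every task inside that role
- name: Configure web servers
  hosts: webservers
  become: true
  roles:
    - role: nginx
      tags: [nginx]
    - role: app
      tags: [deploy]
```

With this in place, `ansible-playbook -i inventory/production playbooks/site.yml --tags deploy --check` reports what the deployment tasks would change without applying anything. Tasks whose modules do not support check mode are skipped. Adding `--diff` also shows file-level differences.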