This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience.
Running actions on a server is nice, but a network round trip for each action is not very efficient. If I need to run a linear sequence of actions, I can stream them all to the server, and then read replies streamed from the server as they get executed.
This technique is called pipelining, and one can see it used, for example, in Redis or Mitogen.
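As a toy sketch of the idea (this is not Transilience's actual protocol, and the server here is an imaginary line-based service), pipelining means writing all the requests before reading any of the replies:

import socket
from typing import List


def run_pipelined(host: str, port: int, requests: List[bytes]) -> List[bytes]:
    # Toy illustration of pipelining: stream every request without waiting
    # for individual replies, then read one reply per request, in order.
    with socket.create_connection((host, port)) as sock:
        for request in requests:
            sock.sendall(request + b"\n")
        with sock.makefile("rb") as reader:
            return [reader.readline().rstrip(b"\n") for _ in requests]

Each request still takes time to execute on the server, but the client no longer pays a full network round trip per request.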
Roles
Ansible has the concept of "Roles" as a series of related tasks: I'll play with that. Here's an example role to install and set up fail2ban:
class Role(role.Role):
    def main(self):
        self.add(builtin.apt(
            name=["fail2ban"],
            state="present",
        ))

        self.add(builtin.copy(
            content=inline("""
[postfix]
enabled = true
[dovecot]
enabled = true
"""),
            dest="/etc/fail2ban/jail.local",
            owner="root",
            group="root",
            mode=0o644,
        ), name="configure fail2ban")
I prototyped roles as classes, with methods that push actions down the pipeline. If an action fails, all further actions for the same role won't be executed, and will be marked as skipped.
Since skipping is applied per role, I can blissfully stream actions for multiple roles to the server down the same pipe, and errors in one role will stop that role without affecting the others. Potentially I can get multiple roles going with a single network round trip:
#!/usr/bin/python3

import sys

from transilience.system import Mitogen
from transilience.runner import Runner


@Runner.cli
def main():
    system = Mitogen("my server", "ssh", hostname="server.example.org", username="root")
    runner = Runner(system)

    # Send roles to the server
    runner.add_role("general")
    runner.add_role("fail2ban")
    runner.add_role("prosody")

    # Run until all roles are done
    runner.main()


if __name__ == "__main__":
    sys.exit(main())
That looks like a playbook, using Python as glue rather than YAML.
Decision making in roles
Besides filing a series of actions, a role may need to take decisions based on the results of previous actions, or on facts discovered from the server. In that case, we need to wait until the results we need come back from the server, and then decide if we're done or if we want to send more actions down the pipe.
Here's an example role that installs and configures Prosody:
from transilience import actions, role
from transilience.actions import builtin
from .handlers import RestartProsody


class Role(role.Role):
    """
    Set up prosody XMPP server
    """
    def main(self):
        self.add(actions.facts.Platform(), then=self.have_facts)

        self.add(builtin.apt(
            name=["certbot", "python-certbot-apache"],
            state="present",
        ), name="install support packages")

        self.add(builtin.apt(
            name=["prosody", "prosody-modules", "lua-sec", "lua-event", "lua-dbi-sqlite3"],
            state="present",
        ), name="install prosody packages")

    def have_facts(self, facts):
        facts = facts.facts  # Malkovich Malkovich Malkovich!

        domain = facts["domain"]
        ctx = {
            "ansible_domain": domain
        }

        self.add(builtin.command(
            argv=["certbot", "certonly", "-d", f"chat.{domain}", "-n", "--apache"],
            creates=f"/etc/letsencrypt/live/chat.{domain}/fullchain.pem"
        ), name="obtain chat certificate")

        with self.notify(RestartProsody):
            self.add(builtin.copy(
                content=self.template_engine.render_file("roles/prosody/templates/prosody.cfg.lua", ctx),
                dest="/etc/prosody/prosody.cfg.lua",
            ), name="write prosody configuration")

            self.add(builtin.copy(
                src="roles/prosody/templates/firewall-ruleset.pfw",
                dest="/etc/prosody/firewall-ruleset.pfw",
            ), name="write prosody firewall")

    # ...
This files some general actions down the pipe, with a hook that says: when the results of this action come back, run self.have_facts().
At that point, the role can use the results to build certbot command lines, render prosody's configuration from Jinja2 templates, and file further actions down the pipe.
Note that this way, while the server is potentially still busy installing prosody, we're already streaming prosody's configuration to it.
If anything goes wrong with the installation of prosody's packages, the role will be marked as failed, and all further actions of the same role, even those filed by have_facts(), will be skipped.
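To make that mechanism a bit more concrete, here is a minimal sketch, with hypothetical names rather than Transilience's actual internals, of how a runner could dispatch results coming back from the pipeline, invoke then= callbacks, and skip the rest of a failed role:

from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class PendingAction:
    role_name: str
    # Callback such as Role.have_facts, run when this action's result arrives
    then: Optional[Callable[[object], None]] = None


@dataclass
class ResultDispatcher:
    # Actions sent down the pipe and still awaiting a result, by action id
    pending: Dict[str, PendingAction] = field(default_factory=dict)
    failed_roles: Set[str] = field(default_factory=set)

    def on_result(self, action_id: str, result: object, failed: bool) -> None:
        pa = self.pending.pop(action_id)
        if pa.role_name in self.failed_roles:
            return  # the rest of this role is already being skipped
        if failed:
            # Mark the role as failed: its remaining actions get skipped
            self.failed_roles.add(pa.role_name)
        elif pa.then is not None:
            # e.g. have_facts(), which may file more actions down the pipe
            pa.then(result)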
Notify and handlers
In the previous example, self.notify() also appears: that's my attempt to model the equivalent of Ansible's handlers. If any of the actions inside the with block produce changes, then the RestartProsody role will be executed, potentially filing more actions at the end of the playbook.
The runner will take care of collecting all the triggered role classes in a set, which discards duplicates, and then running the main() method of each resulting role, which will cause more actions to be filed down the pipe.
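For illustration, a handler is just another role. A minimal sketch of what RestartProsody might look like, assuming a builtin.systemd action similar to Ansible's systemd module has been ported:

from transilience import role
from transilience.actions import builtin


class RestartProsody(role.Role):
    """
    Handler role: restart prosody after its configuration changed.
    """
    def main(self):
        # Assumes a systemd action exists; this role is only triggered
        # when a notifying action reported a change
        self.add(builtin.systemd(
            unit="prosody",
            state="restarted",
        ), name="restart prosody")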
Action conditions
Sometimes some actions are only meaningful as consequences of other actions. Let's take, for example, enabling buster-backports as an extra apt source:
a = self.add(builtin.copy(
    owner="root",
    group="root",
    mode=0o644,
    dest="/etc/apt/sources.list.d/debian-buster-backports.list",
    content="deb [arch=amd64] https://mirrors.gandi.net/debian/ buster-backports main contrib",
), name="enable backports")

self.add(builtin.apt(
    update_cache=True
), name="update after enabling backports",
    # Run only if the previous copy changed anything
    when={a: ResultState.CHANGED},
)
Here we want to update Apt's cache, which is a slow operation, only after we actually write /etc/apt/sources.list.d/debian-buster-backports.list. If the file was already there from a previous run, we can skip downloading the new package lists.
The when= attribute adds an annotation to the action that is sent down the pipeline, saying that it should only be run if the state of a previous action matches the given one.
In this case, when it is the turn of "update after enabling backports" on the remote, it gets skipped unless the state of the previous "enable backports" action is CHANGED.
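For illustration, this is roughly the check the remote end could make before running an annotated action. The names and ResultState values here are stand-ins, and in the example above the when= key is whatever self.add() returned rather than a plain string:

import enum
from typing import Dict


class ResultState(enum.Enum):
    # Illustrative states; the real enum may differ
    NOOP = "noop"
    CHANGED = "changed"
    FAILED = "failed"
    SKIPPED = "skipped"


def should_run(when: Dict[str, ResultState], results: Dict[str, ResultState]) -> bool:
    # Run the action only if every referenced action ended in the expected state
    return all(results.get(action_id) == state for action_id, state in when.items())


# "update after enabling backports" gets skipped when "enable backports"
# did not change anything on this run
results = {"enable backports": ResultState.NOOP}
assert should_run({"enable backports": ResultState.CHANGED}, results) is False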
Effects of pipelining
I ported enough of Ansible's modules to be able to run the provisioning scripts of my VPS entirely via Transilience.
This is the playbook run as plain Ansible:
$ time ansible-playbook vps.yaml
[...]
servername : ok=55 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
real 2m10.072s
user 0m33.149s
sys 0m10.379s
This is the same playbook run with Ansible sped up via the Mitogen backend, which makes Ansible more bearable:
$ export ANSIBLE_STRATEGY=mitogen_linear
$ time ansible-playbook vps.yaml
[...]
servername : ok=55 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
real 0m24.428s
user 0m8.479s
sys 0m1.894s
This is the same playbook ported to Transilience:
$ time ./provision
[...]
real 0m2.585s
user 0m0.659s
sys 0m0.034s
Doing nothing went from 2 minutes down to 3 seconds!
That's the kind of running time that finally makes me comfortable with maintaining my VPS by editing the playbook only, and never logging in to mess with the system configuration by hand!
Next steps
I'm quite happy with what I have: I can now maintain my VPS with a simple script with quick iterative cycles.
I might use it to develop new playbooks, and port them to Ansible only when they're tested and need to be shared with infrastructure that has to rely on something more solid and battle-tested than a prototype provisioning system.
I might also keep working on it as I have more interesting ideas that I'd like to try. I feel like Ansible has reached some architectural limits that are hard to overcome without a major redesign, and that are in many ways hardcoded in its playbook configuration. It's nice to be able to try out new designs without that baggage.
I'd love it if even just the library of Transilience actions could grow and gain widespread use. Ansible modules standardized a set of management operations that, I think, became the way people think about system management, and that should really be broadly available outside of Ansible.
If you are interested in playing with Transilience, for example by:
- polishing the packaging, adding a setup.py, publishing to PyPI, packaging it in Debian
- adding example playbooks
- porting more Ansible modules to Transilience actions
- improving the command line interface
- testing other ways to feed actions to pipelines
- testing other pipeline primitives
- adding backends besides Local and Mitogen
- prototyping a parser to turn a subset of YAML playbook syntax into Transilience actions
- adopting it into your multinational organization's infrastructure to speed up provisioning times by orders of magnitude, at the cost of the development time it takes to turn this prototype into something solid and road-tested
- creating a startup and getting millions in venture capital to disrupt the provisioning ecosystem
do get in touch or send a pull request! :)
Next step: Reimagining Ansible variables.