A picture is worth a (few) thousand bytes

(Context alert: Know Chef. If you don’t, it’s seriously worth looking into for any level of infrastructure management.)

TL;DR: I wrote a Knife plugin to visualize Chef Role dependencies. It’s here.

Recently, I needed to sort out a large amount of roles and their dependencies, in order to simplify the lives of everyone using them.

It wasn’t easy to determine that changing one would affect many others, since it had become common practice to embed roles within other roles’ run_list, resulting in a tree of cross-dependency hell.
A node’s run_list would typically contain a single role-specific item, embedding the lower-level dependencies.

A sample may look like this:

node[web1] => run_list = role[webserver] => run_list = role[base], recipe[apache2], ...
node[db1] =>  run_list = role[database]  => run_list = role[base], recipe[mongodb], ...

Many of these roles had a fair amount of code duplication, and most were setting the same base role, as well as any role-specific recipes. Others were referencing the same recipes, so figuring out what to refactor and where, without breaking everything else, was more than challenging.

The approach I wanted to implement was to have a very generalized base role, apply that to every instance, then add any specific roles should be applied as well to a given node.

After refactoring node’s run list would typically look like:

node[web1] => run_list = role[base], role[webserver]
node[db1] =>  run_list = role[base], role[database]

A bit simpler, right?

This removes the embedded dependency on role[base], since the assumption is that every node with have role[base] applied to it, unless I don’t want to for some reason (some development environment for instance).

Trying to refactor this was pretty tricky, so I wrote a visualizer to collect all the roles from a Chef repository’s role_path, parse them out, and create an image.

I’ve used Graphviz for a number of years now, and it’s pretty general-purpose when it comes to creating graphs of things (nodes), connecting them (edges), and rendering an output. So this was my go-to for this project.

Selling you on the power of visualizing data is beyond the scope of this post (and probably the author), but suffice to say there’s industries built around putting data into visual format for a variety of reasons, such as relative comparison, trending, etc.
In fact some buddies of mine have built an awesome product that does just that – visualizes data and events over time. Check them out at Datadog. (I’ve written other stuff for their platform before, it’s totally awesome.)

In my case, I wanted the story told by the image to:

  1. Demonstrate the complexity of the connections between roles/recipes (aka spaghetti)
  2. Point out if I have any cyclic dependencies (it’s possible!)
  3. Let me focus on what to do next: untangle

Items 1 & 2 were pretty cool – my plugin spat out an increasingly complex graph, showing relationships that made sense for things to work, but also contained some items with 5-6 levels of inheritance that are easily muddled. I didn’t have any cyclic dependencies, so I created a sample one to see what it would look like. It looked like a circle.

Item 3 was harder, as this meant that human intervention needed to take place. It was almost like deciding on which area of a StarCraft map you want to go after first. There’s plenty of mining to do, but which will pay off fastest? (geeky references, are you surprised?)

I decided on some of the smaller clusterings, and made some progress, changing where certain role statements lived and the node <=> role assignment to refactor a lot out.

My process of writing a plugin developed pretty much like this:

  1. Have an idea of how I want to do this
  2. Write some code that when executed manually, does what I want
  3. Transform that code into a knife plugin, so it lives inside the Chef Ecosystem
  4. Package said plugin as RubyGem, to make distribution easy for others
  5. Test, test, test (more on this in a moment)
  6. Document (readme only for now)
  7. Add some features, rethink of how certain things are done, refactor.
  8. Test some more

Writing code, packaging and documentation are pretty standard practices (more or less), so I won’t go into those.

The more interesting part was figuring out how to plug into the Chef/Knife plugins architecture, and testing.

Thanks to Opscode, writing a plugin isn’t too hard, there’s a good wiki, and other plugins you can look at to get some ideas.

A couple of noteworthy items:

  1. Figuring out how to provide command-line arguments to OptionParser was not easy, since there was no real intuitive way to do it. I spent about 2 hours researching why that wasn’t doing what I wanted, and finally figured out that "--flag" and "--flag " behave completely different.

  2. During my initial cut of the code, I used many statements to print output back to the user (puts "some message"). In the knife plugin world, one should use the ui.info or ui.error and the like, as this makes it much cleaner and consistent with other knife commands.

Testing:

Since this is a command-line application plugin, it made sense to use a framework that can handle inputs and outputs, as that’s my primary concern.
With a background in systems administration and engineering, software testing has never been on the top of my to-learn list, so when the opportunity arose to write tests for another project I wrote, I turned to Cucumber, and the CLI extension Aruba.

Say what you will about unit tests vs integration tests vs functional tests – I got going relatively quickly writing tests in quasi-English.
I won’t say that it’s easy, but it definitely made me think about how the plugin will be used, how users may input commands differently, and what they can expect to happen when they run it.

Cucumber/Aruba also allowed me to split my tests in a way that I can grok, such as all the CLI-related commands, flags, options exist in one test ‘feature’ file, whereas another feature file contains all the tests of reading the roles and graphing them in different formats.

Writing tests early on allowed me to continue to capture how I thought the plugin will be used, write that down in English, and think about it for awhile.
Some things changed after I had written them down, and even then, after I figured out the tests, I decided that the behavior didn’t match what I thought would be most common.

Refactoring the code, running tests in between to ensure that the behavior that I wanted remained consistent was very valuable. This isn’t news for any software engineers out there, but it might be useful to more system people to learn more about testing.

Another test I use is a style-checker called tailor – it measures up my code, and reports on things that may be malformed. This is the first test I run, as if the code is invalid (i.e. missing a end somewhere), it won’t pass this test.

Putting these into a test framework like Travis-CI is so very easy, especially since it’s a RubyGem, and I have set up environment variables to test against specific versions of Chef.
This provides the fast-feedback loop that tests my code against a matrix of Ruby & Chef versions.

So there you have it. A long explanation of why I wrote something. I had looked around, and there’s a knife crawl that is meant to walk a given role’s dependency tree and provide that, but that only worked for a single role, and wasn’t focused on visualizing.

So I wrote my own. Hope you like it, and happy to take pull requests that make sense, and bug reports for things that don’t.

You can find the gem on RubyGems.org – via gem install knife-role-spaghetti or on my GitHub account.

I’m very curious to know what other people’s role spaghetti looks like, so drop me a line, tweet, comment or such with your pictures!

Quick edit: A couple of examples, showing what this does.

Sample Roles

(full resolution here)

Running through the neato renderer (with the -N switch) produces this image:

Sample Roles Neato

(full resolution here