Tracking application performance on Heroku with Datadog

I thought about using a clickbait title – “You’ll never believe how this guy captures metrics!” – but decided that 99% of those are not worth the time invested in coming up with a catchy title.

So instead, I’ll simply talk about what I wanted to, and you be the judge of my title.

Application Performance Monitoring, or APM, is a crazily complex landscape, with an enormous amount of tooling, terminology, and providers looking to get some piece of the action.
There are many vendors, and all have their advantages, as well as disadvantages.

The vendor that I am pretty happy with (and I now work there) is Datadog.

One solution that has caught on quite well for surgical application monitoring is the use of the statsd protocol to send metrics from inside your application to a listener which can then store these metrics for querying later on. This is achieved by placing strategic “emitter” callouts in your code so that they can report metrics during runtime.
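To make that concrete, here’s a minimal sketch of an “emitter” callout using the dogstatsd-ruby client (current API; the metric names and the work being timed are made up for illustration):

require 'datadog/statsd'

# Point the client at the local dogstatsd listener (UDP port 8125 by default).
statsd = Datadog::Statsd.new('localhost', 8125)

# Count each request as it happens...
statsd.increment('web.requests')

# ...and time how long the interesting part takes.
statsd.time('web.render_time') do
  sleep(0.05) # stand-in for real application work
end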

Flickr, and then Etsy, started these projects; they have since been refined, ported to most languages, and are seeing adoption in companies where a focus on measuring is an important goal.
A blog post on Datadog’s implementation and extension of Statsd was written last year and goes into deeper detail.

One common question has always been “How do I collect metrics from an application running on Heroku with Datadog?”.

And I think we finally have one answer.

The Heroku Dyno container is pretty simple – you wanna run a process? Describe it in a Procfile.
You wanna scale? You tell Heroku to launch more Dynos with the process name, as specified in the Procfile.
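For reference, a Procfile is nothing more than process names mapped to the commands that run them – something along these lines (the commands are just examples):

web: bundle exec unicorn -c config/unicorn.rb -p $PORT
worker: bundle exec rake jobs:work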

However, the actual Dyno is a fairly limited environment by design – the root filesystem is read-only, and the only writable area is the application’s root directory, which disappears when the Dyno is terminated. There’s no sysvinit, upstart or systemd for people to bicker about. Use a Procfile, which is also really simple.

So the challenge became: “how do I install a Datadog Agent package that runs a dogstatsd listener as a second process, inside an environment that is pretty locked down?”

First, we have to install the package. Heroku has a concept of [buildpacks](https://devcenter.heroku.com/articles/buildpacks) that can be used to run compilation steps before adding your application code and launching it. Multiple buildpacks can also be chained together to achieve the desired outcome.

I read through the heroku-buildpack-apt code, found a bunch of good ideas, and came up with a Datadog-Agent-specific installer buildpack that drops off the package, as well as the environment needed at runtime.

Now how do I run the listener process alongside my application?

Enter foreman. Foreman, not to be confused with “theforeman”, has long been a great way for application developers writing Heroku-targeted applications to run them locally in a manner similar to how they will be run on the remote platform.

Foreman reads the Procfile, and runs the processes based on the directives contained inside.

This feature is the one that we leverage to run multiple processes on a single Dyno.

By using foreman inside the Dyno, we are able to tell foreman to run more than one process type at a time, with another Procfile that specifies the startup process for the actual application as well as the dogstatsd listener.

When deploying any code revision, Heroku will read the base Procfile, and run a foreman process inside the Dyno, which will, in turn, start up the app and dogstatsd.
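As a rough sketch of the layout (file names and commands here are illustrative, not the buildpack’s exact contents), the pair of Procfiles might look like this:

# Procfile – what Heroku runs: a single foreman process per Dyno
web: bundle exec foreman start -f Procfile.web -p $PORT

# Procfile.web – what foreman runs inside the Dyno
web: bundle exec unicorn -c config/unicorn.rb -p $PORT
dogstatsd: run-dogstatsd

The dogstatsd entry is a placeholder – the actual start command is whatever the Agent package installed by the buildpack provides.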

And while foreman is a Ruby gem, your project may be in Python (use honcho) or Go (use forego or goreman), and I’m sure there are others out there. I haven’t found or tested all of them, so tell me if they work out for you.

I did, however, take the time to write up a README with the procedure to follow, as well as a commit-by-commit example application.

Here’s the buildpack code: http://miketheman.github.io/heroku-buildpack-datadog/

Here’s the example application: https://github.com/miketheman/buildpack-example-ruby

Here’s an image of the stats collected by the example application in Datadog, with increasing web load:
Heroku App Load

Here’s a random dog:

Hope this helps you find deeper insight into how you monitor your applications!

Update (2014-12-15)

A quick addition on this topic.

A couple of days after this was published, I had a short Twitter exchange with Bo Jeanes, after which he submitted a Pull Request to the buildpack (as well as an update to the example app).
This simplifies the end user’s deployment of the Agent package: there’s no longer any need for Procfile-in-Procfile solutions, or for foreman and the like inside the container. Instead, the dogstatsd process is started via the profile.d mechanism, which runs on Dyno startup.

This makes the solution even more elegant, so thanks a ton, Bo!

A Quick Drop Into Data Structures For A Minute

So here’s the story, from A to Z…

Well, I’m not going to all the way to Z, but let me lay some details on you.

At Datadog, we provide a nice interface for configuring the Datadog Agent – it’s usually pretty simple to drop some YAML configuration into a file at a specific location, restart the Agent main process, and voilà, you’ve got monitoring.

This gets more complicated when you want to generate a valid YAML file from another system, typically something like Configuration Management, where the notion of “things I know about this particular system” should then trigger “monitor this system with the things I know about it”.

In the popular open source config management system Chef, it is common practice to create a template of the file you wish to place on a given system, extract particular variables to pass to a template ‘resource’, and use those as dynamic values. This makes the template reusable across systems and projects, since the template itself can be populated by inputs not included in the initial template design.

Another concept in Chef is the ability to set node ‘attributes’ to control the behavior of recipes, templates and any amount of resources. This has pros and cons, neither of which I will attempt to cover here, but suffice it to say that the pattern is well-established that if you want to share your resources with others, having a mechanism of “tweaking the knobs” of your resources with attributes is a common way of doing it.

In the datadog cookbook for Chef, we provide an interface just like this. An end user can build up a list of structured data entries made up of hash objects (or maps or dicts, depending on your favorite language), set that on the node object, and expect that these details will be rendered into a configuration file template (and restart the service, etc.).
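As a rough illustration (attribute names and paths here are simplified, not the cookbook’s actual interface), the flow from node attributes to a rendered config looks something like this:

# Role or wrapper cookbook: describe what to monitor as plain data.
node.default['my_service']['instances'] = [
  { 'host' => 'localhost', 'port' => 9999 }
]

# Recipe: hand that data to a template resource, which renders it into
# the config file the Agent reads, then bounce the Agent service.
template '/etc/dd-agent/conf.d/my_service.yaml' do
  source 'my_service.yaml.erb'
  variables(:instances => node['my_service']['instances'])
  notifies :restart, 'service[datadog-agent]'
end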

This allows the end user to take the code, not modify it at all, and provide inputs to it to receive the desired state.

Jumping further into Chef’s handling of node attributes now.

== Attribute
Attribute implements a nested key-value (Hash) and flat collection
(Array) data structure supporting multiple levels of precedence, such
that a given key may have multiple values internally, but will only
return the highest precedence value when reading.

Attributes are subclasses of the Mash object type – which has some cool features, like deep-merging lower data structures – and then attributes are compiled together to make collections of these node attribute objects, which are then “frozen” into another class type named Chef::Node::ImmutableArray or Chef::Node::ImmutableHash to prevent further mucking around with them.

All this is cool so far, and is really useful in most cases.

In my case, I want to allow the user to provide the data needed, and then have that data written out, or serialized, into a configuration file, which can then be read by the Agent process.

The simple way you might think to do this is to tell the YAML module of Ruby’s standard library (which is actually an alias to the Psych module) to emit the structured YAML and be done with it.

In an Erubis (ERB) template, this would look like this:

<%= YAML.dump(array_of_mash_data) %>

However, I’d like to add a top-level key above the array before rendering it, so I’ll do that first:

<%= YAML.dump({ 'instances' => array_of_mash_data }) %>

What this does is render a file like so:

---
instances:
- !ruby/hash:Mash
  host: localhost
  port: 9999
  extra_key: extra_val
  conf:
  - !ruby/hash:Mash
    include: !ruby/hash:Mash
      domain: org.apache.cassandra.db
      attributes:
      - BloomFilterDiskSpaceUsed
      - Capacity
      foo: bar
    exclude:
    - !ruby/hash:Mash
      domain: evil_domain

As you can see, there are these pesky lines that include a special YAML-oriented tag starting with an exclamation point – !ruby/hash:Mash – these are there to describe the data structure to any YAML loader, saying “hey, the thing you’re about to load is an instance of XYZ, not an array, hash, string or integer”.

Unfortunately, when parsing this file from the Python side of things to load it in the Agent, we get some unhappiness:

$ sudo service datadog-agent configcheck
your.yaml contains errors:
    could not determine a constructor for the tag '!ruby/hash:Mash'
  in "<byte string>", line 7, column 5

So it’s pretty apparent that I can do one of two things:

  • teach Python how to interpret a Ruby + Mash constructor
  • figure out how to remove these from being rendered

The latter seemed most likely, since I didn’t really want to teach Python anything new, especially since this is really a Hash (or a dict, in pythonese).

So I experimented with taking items from the Mash, and running them through a built-in method to_hash – which seemed likely to work.

Not really.

<%= YAML.dump({ 'instances' => @instances.map { |item| item.to_hash }}) %>

That code only steps into the first layer of the data structure and converts the segment starting with host: localhost into a Hash, but the sub-keys remain Mash objects. Grr.

Digging around, I found other reported problems where people have extended Chef objects with some interesting methods trying to solve the same problem.

This means that I’d have to add library code to my project, then modify the template renderer to make the helper code available, then tell the template to render it using these subclassed methods, and then have to worry about maintaining all of that.

ARGH.

Instead, I tried another tactic, which seems to have worked out pretty well.

Rather than trying to walk a data structure of any size and catch every leaf of the tree, I turned to another mechanism to “strip” out the Ruby-specific data structure details while keeping the same structure, so I used the ol’ faithful – JSON.

By using built-ins to convert the Mash to a JSON string, having the JSON library parse it back into a data structure, and then serializing that to YAML, we remove all of the extras from the picture, leaving us with a slightly modified ERB method:

<%= JSON.parse(({ 'instances' => @instances }).to_json).to_yaml %>
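Outside of the template, the same round-trip is easy to sanity-check in plain Ruby (using a Hash to stand in for the Mash, since what matters is the YAML that comes out the other end):

require 'json'
require 'yaml'

instances = [{ 'host' => 'localhost', 'port' => 9999 }]

# JSON.parse(...to_json) drops any Ruby-specific class information,
# leaving plain Hashes and Arrays that serialize to clean YAML.
puts JSON.parse({ 'instances' => instances }.to_json).to_yaml
# ---
# instances:
# - host: localhost
#   port: 9999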

I then took to benchmarking both methods to see if there would be any significant impact on performance for doing this. Details are over here. Short story: not much impact.

So I’m pretty happy with the way this turned out, and even if I’m moving objects back and forth between serialization formats, the end result is something the next program (Datadog Agent) can consume.

Hope you enjoyed!

On the passage of time and learning

It’s been just over two years since I first wrote a little tool to help me visualize the relationships between objects in a particular system.

I had been working as a consultant for a couple of companies, and I found that all exhibited similar problems of using a powerful system, creating ad-hoc relationships where needed, and not fully following the inheritance and impact of these relationships when they change.

So, coming in and trying first to understand what was there, and then to untangle things to be clearer (and hopefully better), I sat down and drew out in a physical space – probably a whiteboard – all of the objects, their relationships, and a “who talks to whom” diagram.

Sidebar: diagrams and visualizations are awesome. A picture is many times worth a thousand words, which is why using pictures and visual representations of hard-to-perceive patterns is key to helping others understand what you may already know.

I quickly realized a few things that were problematic with this manual approach:

  1. There were too many objects and relationships to express effectively and clearly on a whiteboard.
  2. Every time something changed in the objects or their relationships, I had to modify the diagram or start over.
  3. This is probably not the last time I’m going to have this problem, and I’m getting really good at drawing boxes with arrows.

With these things in mind, I sat down and tried to reverse-engineer my own thought process. I knew what kind of visual end result I wanted, so I started by using an open source library that helps place things in relationship to other things, and then renders that as an image.

Once I was able to manually generate the image based on the input I provided, then the focus was to use dynamic input, which was the big win, as then I could point this at any input, and get a picture rendered.

Next was packaging and testing, which became harder and harder – but I kept going and eventually was happy with the results.

There have been over 750 downloads of that first version, and I’ve tweaked a few things here and there over time, but haven’t really done much to change the actual code to incorporate any further features.

“It works, I’m done.”

Looking back at the code I wrote (all told – less than 100 lines!), I realize that if I wanted to change behavior today, it’s much harder to do, as the code itself doesn’t lend itself to be changed.

I hadn’t written any testing around the code itself, only functional tests around “if I press START, do I reach END correctly?” approaches – sometimes termed “Outside-In” testing, as the test will assert that from the outside, everything looks groovy.

These tests are slower and not as comprehensive, since having a test system look at a rendered image and compare it with a known “good” one isn’t trivial either. Some libraries exist, but what if I change the assumptions of what a “good” image is? Update the comparison image? Too much work, says the lazy person in me.

So the code exists, and works, and continues to function, over time.

I took a look at it recently, and realized that it’s all one big function (also known as a ‘method’). And some measurement tools out there state that the method is simply too complex.

How can it be too complex? This method is less than 70 lines of actual code, it can’t be that complex, can it?

In the time since I’ve written this code, I’ve learned a lot, heard a lot, failed a lot, and written a lot more code, and thanks to untold amounts of other people, I’ve been getting a bit better at it.

Here’s where I’ll drop in a reference to Sandi Metz, author of POODR and more, and a talk she gave earlier this year, which I didn’t see in person, only on YouTube. It’s called “All the Little Things”, in which she takes you on a journey of looking at code, refactoring and testing, and basically how to change things to make further changes easy.

It’s a load of information, and a lot of it may not make sense if you’ve never encountered these problems and ideas. But having these ideas (and other design principles and patterns) in your toolbox enables forward progress in your own understanding of how you approach solving problems, which helps not only in solving the problem today, but in solving the problems you don’t even know about yet.

Now I look at that code, and say to myself, “Wow, I don’t really want to change anything in there until I have some better testing around parts of it”. This makes it harder to add anything new, since I don’t know what existing functionality I may break when adding new things.

So if you wrote some code, and let it sit for a while, and look at it a year or two later, you may find yourself shaking your head and asking “who even wrote this mess?”, knowing full well that your past self did it.

Be kind to your future self, and try to make decisions today that will help your future self understand what choices you made and why you made them. It’s likely that your future self will have learned more by then and may make other decisions, but will appreciate the efforts of present self in the future.
It’s a weird kind of time-travel, and in the present, you’re trying to better your own future. (cue time paradox arguments)

Thanks for reading!

A decade of writing this stuff? Seriously.

In tinkering with my blogging platform and playing with different technologies, I’ve just realized that I’ve been writing online for well over a decade now.

It started a long time ago, when I was writing personal stuff on a public Open Diary back in 1995, under an alias which, for the life of me, I can’t recall. The site is currently unavailable, and I was curious to see if they still indexed old entries, and whether I could dig anything up from back then.

It was a place where I tossed out whatever I had in mind, a place to jot down the ideas running through my head, a place for a creative outlet with the safety of knowing nothing would ever come back to me, since I lived behind the veil of anonymity (back then, PRISM was just a dream…), and I was able to express whatever I wanted, in a safe place.

After writing there for a couple of years, I was witness to the 1997 Ben Yehuda Street Bombing – I was at a cafe off the street with some friends, when it happened, and went to offer whatever help I could, having had some First Aid training. After spending some 2 hours dealing with things that I’ve pushed far to the back of my mind, I was gathered by a friend, carted to his house, and sat in shock for a few hours, before making my way home.

The next day, I wrote about it on OD, and referenced my friend by first name only.

A couple of days later, a comment came on my post, asking if my friend was ‘So-and-so from Jerusalem’, and if so, that they knew him, and agreed that he was a great help. We began discussing our mutual friend, and eventually met in person.

This was the first revelation I had – you’re never truly anonymous.

We became pals, hung out a few times, and continued to stay up to date with each other for a while.
I did notice that after a while my writing dwindled; now I knew that there was someone out there who knew who I was. Not that I was saying anything outrageous, but the feeling of freedom dropped.

During my time in the Air Force, I wrote extremely rarely, since getting online was near impossible from base, so after discharge in 2000, I pretty much had stopped writing altogether.

In 2003, my friend Josh Brown invited me to the closed community (at the time) of LiveJournal, where it quickly grew into the local social networking site, where we could post, comment, and basically keep up with each other’s lives.
Online quizzes were ‘the thing’ and posting your results as an embed to your post was The Thing to do.

After spending 4 years on LJ, they began providing additional customizations and added features for paid-only users, and I didn’t want to spend any money on that; rather, I wanted to host my own site.

So I did, for a while. In 2006, I built my own WordPress 2.0 site (history!), hosted it on my home server (terrible bandwidth) and began on the journey of customized web application administration. Dealing with databases, application code updates, frameworks, plugins, you name it.

I think I actually enjoyed tinkering with the framework more than actually writing.

Anyhow, I’ve written sporadically over time, about a wide variety of things, both on this site, and elsewhere.
The invention of Facebook, Twitter, and pretty much every other social network content outlet has replaced a lot of the heavier topic writing that went on here.

But it does indeed fill me with some sense of happiness that I’ve been doing this for a long time, have preserved whatever I could from 2003 until now, and continue to try and put out some ideas now and then.

My hope is that anyone can take the time to express their creativity in whatever fashion they feel possible, and share what they want to with the rest of us.

The Importance of Dependency Testing

Recently I revisited an Open Source project I started over a year ago.

This tool is built to hook into a much larger framework (Chef), and leverages a bunch of code many other people have written, and produce a specific result that I was looking for.

This post is less about the tool itself, and more about the process and procedure involved in testing dependencies.

This project is written in Ruby, and as many have identified in articles and tweets, some project maintainers don’t adhere to a versioning policy, making it hard to ensure working software across multiple versions of dependencies.
A lot rides on the maintainer’s adherence to a versioning standard – one very popular one is Semantic Versioning, or SemVer for short.

This introduces a few other questions, like how frequently an author should release new versions of code, and how frequently users should upgrade to leverage new fixes, features, etc.

In any case, my tool was restricted to running the framework’s version 10.x, considering that between major versions, functionality may change, and that there is no guarantee that my tool will continue working.

A new major version of Chef was released earlier this year. Most of my existing projects are still on Chef 10.x, as it is still being updated with stability fixes and security patches, and the ‘jump’ to 11 is not on the schedule right now, so my tool continues functioning just fine.

Time passes, and I have a project running Chef 11 that I want to use my tool with.

Whoops. There’s a constraint built into the tool’s dependency specification that will report: “you have Chef 11, this wants Chef 10.x and not higher, have a nice day”.

So I change the constraint, install locally, and see that it works. Yay!
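For context, the change is a one-liner in the gem specification – something along these lines (the file name and version numbers are illustrative):

# knife-tool.gemspec (excerpt – illustrative only)
Gem::Specification.new do |s|
  # Before: only Chef 10.x satisfied the dependency.
  # s.add_dependency 'chef', '~> 10.12'

  # After: allow anything from 10.12 up to, but not including, Chef 12.
  s.add_dependency 'chef', '>= 10.12', '< 12'
end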

Now I want to commit the change that I made to the version constraint logic, but I want to continue testing the tool against the 10.x versions, as I should continue to support the active versions for as long as they are alive and in use, right?

A practice I was using for the tests that I had written was: given a static list of Chef versions, use the static entry as the Chef version for installation/test.

This required me to update the static list each time a new version of Chef was released, and potentially was testing against versions that didn’t need testing – rather I wanted to test against the latest of the mainline release.

I updated my constraint, ran the test suite that I’d written, and whoops, it failed the tests.

Functionality-wise, it worked correctly on both versions, so the problem must be in my test suite, right?

I found a cool project called Appraisal that’s been around for a while and is used by a bunch of other projects; you can read more about it here.
It allows one to specify multiple version constraints and test against each of them with the same test suite.
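The Appraisals file describing the two dependency sets is short – roughly (version constraints are examples):

# Appraisals
appraise 'chef-10' do
  gem 'chef', '~> 10.0'
end

appraise 'chef-11' do
  gem 'chef', '~> 11.0'
end

Then appraisal install generates a Gemfile per entry, and running the suite through appraisal (e.g. appraisal rake test) exercises each dependency set in turn.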

Sure enough, passes on version 10, not version 11. Same code, same tests. #wat

So now it’s time to do some digging. I read through some of the Chef ChangeLog, decided there was too much to wade through there, and instead took a look at the code my tool is using.

The failure was triggering here, and was showing a default value.
This meant that Chef was no longer correctly loading the configuration file that I provide here.

So I took a look at the current version of the configuration loader, and visually compared it with the 10.x version.
Sure enough, there’s one small change that’s affecting me: Old vs New

working_directory? What’s this? Oh, it’s over here, just a few lines prior.

Reading the full commit, and the original ticket, it seems like this is indeed a good idea, but why are my tests failing?

After further digging around in the aruba test suite extension I’m using, I realized that the environment variable PWD remains set to the actual working directory of my shell, not that of the test suite’s subprocesses.
Thus every time it runs, the chef_config_dir is looking in my current directory, not the directory the tests are running in.

After poking around aruba’s source code, and adding some debugging statements during test runs, I figured out that I need the test suite to change its PWD environment variable based on the test’s execution, which led to this commit.

Why is this different? Well, before, Ruby’s Dir.pwd statement would be invoked from inside the running test, loading the config from a location relative to Dir.pwd, where I was placing the test config file.
Now the test was trying to load the config from the process’ environment variable PWD instead, and failing to find the config.
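The shape of the fix (a sketch, not the exact commit) is to make PWD agree with aruba’s working directory before the commands run – for example in a Cucumber support file:

# features/support/env.rb – illustrative only
Before do
  # aruba runs its commands in a scratch directory (tmp/aruba by default),
  # so point PWD there instead of at the shell that launched the tests.
  ENV['PWD'] = File.expand_path('tmp/aruba')
end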

Tests pass, and now I can have Travis CI continue to test my code against multiple dependency versions when it changes, and catch things before they go badly.

All in all, an odd behavior to hit in a normal situation, as my tool is meant to be run interactively by a user, not via a test suite that mocks up all sorts of other environments.

So I spent about 2-3 hours digging around to essentially change one line that makes things work better and cleaner than before.

Worth it? Completely, as these changes will allow me to continue to ensure that my tool remains working with upstream releases of the framework, and maintain compatibility with supported versions of the framework.

TL;DR: Don’t skimp on testing your project against multiple versions of external dependencies, especially when your target users are going to be using more than one possible version.

P.S. Shout out to my girlfriend that generously lets me spend time hacking on these kind of things 😀

The Ripple Effect of Choices

We live our lives in a chaotic universe.

Atoms crashing into each other, at crazy fast rates, hyper-vibrations and electric pulses being at the core of our bodies, all working in tandem to get us through our day.
Do they govern our choices? Do our choices govern them? Who is in charge here, anyways?

Many traditional organized religions may express that everything is at the will of a higher power or being, and that we are governed from above.
Some may express that everything is predetermined other than moral decisions, and that is what we are responsible for.

An interesting statement I heard at one point is that “anyone trying to reconcile divine predetermination and free will has a fool’s errand.”

I like to think that I make my own choices about all things, but the subsequent impact of any given choice is pretty much up in the air.

This morning before I left my house, I made a choice of what to wear. Since today I will be presenting a short talk on a professional topic, I chose khaki slacks instead of jeans.

Once I left the house, I had to choose which form of transportation I would take to the office – take a CitiBike, or take one of two trains.

Since I was dressed “nicely” and didn’t want to get all sweaty, the bike option was out. Leaving me with the trains.

Then the choice of which train, and walking towards one of them would bring me to a breakfast cafe where I like to get a morning sandwich sometimes, so I took that option.

I walked in, saw the LONG line for food, and chose to turn around and head to the train instead of waiting. It’s not that great a sandwich anyways.

So I turn the corner into a small alleyway, and a lady is walking about two strides ahead of me. I notice that at one point she pauses, and lets a car coming from ahead pass completely before resuming, and I realize it’s because the cars are coming at high speed, and there are deep puddles, so she was avoiding the splash.

A few moments later, another car comes along, and she speeds up, I pause, and try to remain far enough from the splash zone.

Nope.

She looks back at me with a touch of concern, while I’m spattered with droplets from a puddle of unknown cleanliness, and gives a wry smile, as I smile back at her and say “I should have done what you did!”

Will the water dry? Pretty sure it will. Will my slacks remain presentable? Don’t know just yet, still drying right now. Did every choice up until this point bring me here? Yes. Was this the universe pushing me here? I don’t think I’ll ever know.
But it did provide me with a new set of ideas.

Years ago, Life Magazine featured a picture (I spent 20 minutes looking for it online, couldn’t find it) of a pedestrian leaping like a ninja to dodge a monsoon-like splash from a passing vehicle, and this experience immediately re-triggered that image. I now feel like I can further relate to that particular image better, probably reinforcing some neural passageway inside my brain of a long-term memory resurfacing.

It provided me with what some might call a “New York experience”, since this is probably not the first time this has happened in a small alleyway in NYC, and it most certainly won’t be the last.

This short experience also reminded me that you can’t “ever go back and change something”, despite having a Flux Capacitor. Even if you could, would you? I wouldn’t.

It also brought to the forefront that all the choices I’m making today were driven in part by choices I made in the past – one of which was the choice to speak in front of others.

Your life is made up of the set of choices, experiences and hopefully the subsequent knowledge gained from them. It’s kind of what makes you: You, and me: Me. That’s one reason (of many) why we’re all different.

In short, make good choices. “Good” is a subjective term to you at the time you have to make them. As long as you take a moment and think, “Is this the best choice I could make right now, given everything else I know from before?”, then you’ll probably be OK. We make tons of these choices unknowingly every day.

Oh, and what’s good for the goose might not always be good for the gander.


P.S. This blog post was written during my first attempt at the Pomodoro Technique in a cafe. Dunno how I feel about it yet, but it sure worked to hammer out a post in a focused amount of time.

My foray into web development

I like browsing the web. I do it a lot throughout my day.

A lot of people work hard at making the web a cool-looking place. Some sites make simplicity look so easy, that when you look under the hood, it’s all chaos and destruction, folded and crunched together, all to present something really nice and smooth for the end-user.

I’m not a developer – much less a web dev. There’s a lot to know in any field of computing – and the web is pretty much the most visible part of computing as a whole, since pretty much anyone, anywhere, is going to use a web browser to view a site at some point.

I mean, sure, we all learned some HTML – hell, I wrote some sites back in the days of Geocities, and it was awesome to learn about tiling backgrounds of animated GIFs, and when CSS came around, minds blown!

And I left that field for the frontend developers, and went into infrastructure and operations.

And as time passes, you find yourself managing a variety of systems and knowledge, and at some point, you may say to yourself, “I wish I knew how to answer this question…”

And then you write some code to answer it. Voila! You’re a developer, of sorts.

I’m a huge fan of data visualization. Telling stories with pictures dates back millennia, and it’s very relatable to most people. Recently, I wrote a tool to help myself display the dependency complexity of Chef roles, and I found that, while being very useful, the output is very limited, as it’s a static generated image, whereas we live in a web-friendly world where everything is interactive and fun!

So when I came across another hard question I wanted to answer, I thought, “Why not make this a web application?”

This time, the question I wanted to answer was: As a GitHub Organization owner for my company, what human-to-team-to-software-repository relationships do we have, and are they secure?

If you’ve ever managed an Organization in GitHub, there are a few key elements.

  1. An Organization can have many Repositories
  2. An Organization can have many Teams
  3. A Repository can have many Teams
  4. A Team can have many members, but only one permission (read only, read/write, owner)

So sorting out who is on what Team, what access they have, across many repositories, can be a security nightmare. Especially when you have more than 4-5 repositories.
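Gathering the raw data for a question like this is straightforward with the octokit gem – roughly (the organization name is a placeholder, and pagination and error handling are omitted):

require 'octokit'

client = Octokit::Client.new(:access_token => ENV['GITHUB_TOKEN'])
org = 'my-org' # placeholder

# Walk Teams, then each Team's members and repositories,
# to build up the human-to-team-to-repository relationships.
client.organization_teams(org).each do |team|
  members = client.team_members(team.id).map(&:login)
  repos   = client.team_repositories(team.id).map(&:full_name)
  puts "#{team.name} (#{team.permission}): #{members.join(', ')} -> #{repos.join(', ')}"
end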

During my first foray into solving this, I cobbled together a command-line tool, using Ruby with the Graphviz library. I’ve liked Graphviz for years – it’s straightforward, as structured text gets rendered into a graph and then can be output to a file.

Very straightforward, has some limitations, but basically allows you to store graphs as text, and re-render them when changes happen. Basically, it’s like storing source code and not the binary output.

But there were some limitations, and I wanted the answer to this new question to be more than a command-line tool – something I could share with the world at large, without requiring any client-side installation of tools or dependencies.

So I spent a lot of time hemming and hawing, looking at web frameworks and trying to figure out some of them, and “how does this work?” came up a lot.

Finally, yesterday I set out to sit down and accomplish this task. I sat in a Starbucks in New York City, and had a Venti. I started banging away at about 11:30. I took a break for a refill and a snack around 1:30, and when I sat down again, I kept hacking away until 9:30pm, when I deemed completion.

The code was written, tested by me locally, pushed to GitHub, deployed to Heroku, DNS name wired up and all. As soon as I completed, I left Starbucks, and heaved a huge sigh – it was one hell of a mental high, I was in “the zone” and had been there for a long time.

You are more than welcome to browse the source code here and the finished project here. I call it the GitHub Organization Viewer, hence “GOVweb”.

I have a bunch of other ideas on how to make this better, how to model the data, which visual style to use, but I think for now, I’m going to leave it for a bit, and see what I think about it in a couple of months.

But all in all, this reinforced my opinion to never be afraid to try tackling a new idea, a new project, a new field you’re unfamiliar with – as long as you can read, comprehend and learn, the world is your oyster.

A picture is worth a (few) thousand bytes

(Context alert: Know Chef. If you don’t, it’s seriously worth looking into for any level of infrastructure management.)

TL;DR: I wrote a Knife plugin to visualize Chef Role dependencies. It’s here.

Recently, I needed to sort out a large amount of roles and their dependencies, in order to simplify the lives of everyone using them.

It wasn’t easy to determine that changing one would affect many others, since it had become common practice to embed roles within other roles’ run_list, resulting in a tree of cross-dependency hell.
A node’s run_list would typically contain a single role-specific item, embedding the lower-level dependencies.

A sample may look like this:

node[web1] => run_list = role[webserver] => run_list = role[base], recipe[apache2], ...
node[db1] =>  run_list = role[database]  => run_list = role[base], recipe[mongodb], ...

Many of these roles had a fair amount of code duplication, and most were setting the same base role, as well as any role-specific recipes. Others were referencing the same recipes, so figuring out what to refactor and where, without breaking everything else, was more than challenging.

The approach I wanted to implement was to have a very generalized base role, apply that to every instance, and then add any specific roles to a given node as needed.

After refactoring, a node’s run_list would typically look like:

node[web1] => run_list = role[base], role[webserver]
node[db1] =>  run_list = role[base], role[database]

A bit simpler, right?

This removes the embedded dependency on role[base], since the assumption is that every node will have role[base] applied to it, unless I don’t want to for some reason (some development environment, for instance).
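In role terms, the refactoring amounts to something like this (using the webserver role from the example above; the recipe names are illustrative):

# roles/webserver.rb – before: the base role is embedded in every role
name 'webserver'
run_list 'role[base]', 'recipe[apache2]'

# roles/webserver.rb – after: only the role-specific pieces remain,
# and role[base] is added directly to each node's run_list instead
name 'webserver'
run_list 'recipe[apache2]'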

Trying to refactor this was pretty tricky, so I wrote a visualizer to collect all the roles from a Chef repository’s role_path, parse them out, and create an image.

I’ve used Graphviz for a number of years now, and it’s pretty general-purpose when it comes to creating graphs of things (nodes), connecting them (edges), and rendering an output. So this was my go-to for this project.
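Stripped of the knife-plugin wrapping, the core idea is only a few lines with the ruby-graphviz gem (a simplified sketch, not the plugin’s actual code):

require 'graphviz'

# Each role becomes a node; each entry in its run_list becomes an edge.
g = GraphViz.new(:roles, :type => :digraph)
base   = g.add_nodes('role[base]')
web    = g.add_nodes('role[webserver]')
apache = g.add_nodes('recipe[apache2]')

g.add_edges(web, base)
g.add_edges(web, apache)

g.output(:png => 'roles.png')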

Selling you on the power of visualizing data is beyond the scope of this post (and probably the author), but suffice it to say there are industries built around putting data into visual format for a variety of reasons, such as relative comparison, trending, etc.
In fact some buddies of mine have built an awesome product that does just that – visualizes data and events over time. Check them out at Datadog. (I’ve written other stuff for their platform before, it’s totally awesome.)

In my case, I wanted the story told by the image to:

  1. Demonstrate the complexity of the connections between roles/recipes (aka spaghetti)
  2. Point out if I have any cyclic dependencies (it’s possible!)
  3. Let me focus on what to do next: untangle

Items 1 & 2 were pretty cool – my plugin spat out an increasingly complex graph, showing relationships that made sense for things to work, but also contained some items with 5-6 levels of inheritance that are easily muddled. I didn’t have any cyclic dependencies, so I created a sample one to see what it would look like. It looked like a circle.

Item 3 was harder, as this meant that human intervention needed to take place. It was almost like deciding on which area of a StarCraft map you want to go after first. There’s plenty of mining to do, but which will pay off fastest? (geeky references, are you surprised?)

I decided on some of the smaller clusterings, and made some progress, changing where certain role statements lived and the node <=> role assignment to refactor a lot out.

My process of writing a plugin developed pretty much like this:

  1. Have an idea of how I want to do this
  2. Write some code that when executed manually, does what I want
  3. Transform that code into a knife plugin, so it lives inside the Chef Ecosystem
  4. Package said plugin as RubyGem, to make distribution easy for others
  5. Test, test, test (more on this in a moment)
  6. Document (readme only for now)
  7. Add some features, rethink how certain things are done, refactor.
  8. Test some more

Writing code, packaging and documentation are pretty standard practices (more or less), so I won’t go into those.

The more interesting part was figuring out how to plug into the Chef/Knife plugins architecture, and testing.

Thanks to Opscode, writing a plugin isn’t too hard, there’s a good wiki, and other plugins you can look at to get some ideas.

A couple of noteworthy items:

  1. Figuring out how to provide command-line arguments to OptionParser was not easy, since there was no real intuitive way to do it. I spent about 2 hours researching why it wasn’t doing what I wanted, and finally figured out that "--flag" and "--flag " behave completely differently (see the sketch after this list).

  2. During my initial cut of the code, I used many statements to print output back to the user (puts "some message"). In the knife plugin world, one should use ui.info or ui.error and the like, as this makes things much cleaner and consistent with other knife commands.
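Putting both of those together, the skeleton of a knife plugin looks roughly like this (the class, option and messages are simplified for illustration, not the plugin’s exact code):

require 'chef/knife'

class Chef
  class Knife
    class RoleGraph < Knife
      banner 'knife role graph (options)'

      # Note the value placeholder after the long flag – "--format" alone
      # and "--format FORMAT" are parsed very differently by OptionParser.
      option :format,
        :short => '-f FORMAT',
        :long => '--format FORMAT',
        :description => 'Output format (png, dot, ...)',
        :default => 'png'

      def run
        ui.info("Rendering graph as #{config[:format]}")
        # ... collect the roles, build the graph, write the output ...
      end
    end
  end
end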

Testing:

Since this is a command-line application plugin, it made sense to use a framework that can handle inputs and outputs, as that’s my primary concern.
With a background in systems administration and engineering, software testing has never been on the top of my to-learn list, so when the opportunity arose to write tests for another project I wrote, I turned to Cucumber, and the CLI extension Aruba.

Say what you will about unit tests vs integration tests vs functional tests – I got going relatively quickly writing tests in quasi-English.
I won’t say that it’s easy, but it definitely made me think about how the plugin will be used, how users may input commands differently, and what they can expect to happen when they run it.

Cucumber/Aruba also allowed me to split my tests in a way that I can grok, such as all the CLI-related commands, flags, options exist in one test ‘feature’ file, whereas another feature file contains all the tests of reading the roles and graphing them in different formats.

Writing tests early on allowed me to continue to capture how I thought the plugin will be used, write that down in English, and think about it for awhile.
Some things changed after I had written them down, and even then, after I figured out the tests, I decided that the behavior didn’t match what I thought would be most common.

Refactoring the code, running tests in between to ensure that the behavior that I wanted remained consistent was very valuable. This isn’t news for any software engineers out there, but it might be useful to more system people to learn more about testing.

Another test I use is a style-checker called tailor – it measures up my code, and reports on things that may be malformed. This is the first test I run, since if the code is invalid (i.e. missing an end somewhere), it won’t pass this test.

Plugging these into a CI service like Travis CI is very easy, especially since it’s a RubyGem, and I have set up environment variables to test against specific versions of Chef.
This provides the fast-feedback loop that tests my code against a matrix of Ruby & Chef versions.
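The trick is a Gemfile that reads those environment variables, so each build in the Travis matrix installs a different Chef – a sketch (the variable name CHEF_VERSION and the default constraint are my own conventions here):

# Gemfile – illustrative
source 'https://rubygems.org'

gem 'chef', ENV.fetch('CHEF_VERSION', '~> 10.30')
gem 'tailor', :group => :test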

So there you have it. A long explanation of why I wrote something. I had looked around, and there’s a knife crawl that is meant to walk a given role’s dependency tree and provide that, but that only worked for a single role, and wasn’t focused on visualizing.

So I wrote my own. Hope you like it, and happy to take pull requests that make sense, and bug reports for things that don’t.

You can find the gem on RubyGems.org – via gem install knife-role-spaghetti or on my GitHub account.

I’m very curious to know what other people’s role spaghetti looks like, so drop me a line, tweet, comment or such with your pictures!

Quick edit: A couple of examples, showing what this does.

Sample Roles

(full resolution here)

Running through the neato renderer (with the -N switch) produces this image:

Sample Roles Neato

(full resolution here)

Recruiting via LinkedIn – Don’t Do This!

I regularly get emails from recruiters all over the planet, telling me about their awesome new technology, latest and greatest ideas, and why I should work for them.

Most get ignored.

One came in this week that annoyed me, since it was from someone at a company that had sent me the exact same email six months ago.

I felt I had to respond:

Hi <recruiter name>,

I think I heard of <YourCompany> last year sometime from a friend.

I also received this same stock email from you on 8/22/11, and you had addressed it to “Pascal” – further evidence of a copy-and-paste.

It would behoove you to keep records of whom you contact, as well as reviewing the message you paste before clicking “Send”.

A stock recruiter email is not a very likely way to attract good recruits, especially if you’re listing a ton of things that are not particularly relevant or interesting in the realm of technology.

Asking me to send a resume, while being able to view my full LinkedIn profile also seems superfluous – here’s the information, you have supposedly read it, and that is what attracted you to my profile in the first place, rather than “someone who turned up in a keyword search”.

I wish you, and your company all the best, and hope that these recruiting tactics work for you.

All the best,
-M

I am very curious what kind of response, if any, I shall get.

Chatting with a robot

Here I am, sitting calmly, trying to figure out the reasoning for the universe, and I get a GChat notification that someone wants to chat with me.

Here’s the transcript:

10:29:12 AM caitlyn ball: hi
10:29:17 AM miketheman: hi
10:29:24 AM caitlyn ball: hey whats up? 22/F here. you?
[email protected] is now known as caitlyn ball. (10:29:27 AM)
10:29:41 AM miketheman: totally bored.
10:29:49 AM caitlyn ball: hmm. have we chatted before?
10:30:15 AM miketheman: probably not, since you just added me to your list
10:30:24 AM caitlyn ball: oh ok. i wasnt sure. anyways.. whats up?
10:30:35 AM miketheman: not much, working. you?
10:30:45 AM caitlyn ball: im like so boreddd…. there is nothing to do
10:31:00 AM caitlyn ball: ohhh wait! i got a great idea. have you ever watched a sexy girl like me strip live on a cam before?
10:31:18 AM miketheman: no, I don’t believe that I have.
10:31:25 AM miketheman: And that seems to be a great idea.
10:31:29 AM caitlyn ball: wellllll….. you could watch me strip if you would like?
10:31:43 AM miketheman: possibly.
10:31:56 AM miketheman: Or we could discuss the nature of the desire for people to watch other people remove their clothing
10:32:00 AM caitlyn ball: yeah? ok well my cam is setup through this website so that i cant be recorded so you have to signup there.
10:32:09 AM caitlyn ball: it only takes a minute and it is free. ok?
10:32:13 AM miketheman: That doesn’t seem likely.
10:32:28 AM caitlyn ball: http://<removed> go there then at the top of the page click on the goldish JOIN FREE button.
10:32:33 AM caitlyn ball: k?
10:33:00 AM miketheman: Are you sure you don’t want to debate the reasoning behind the attraction with exposed bodies?
10:33:16 AM caitlyn ball: also it does ask for a credit card but thats how they keep kids out. it does not charge the card. k?
10:33:34 AM miketheman: Of course it does. Are there wizards with hats on the site as well?
10:33:50 AM caitlyn ball: ok babe well hurry up and when u get logged in then u can view my cam and we can have some fun!
10:34:03 AM caitlyn ball: i also have some toys but u have to tip me some gold or join me in private to see those.
10:34:08 AM miketheman: Again, probably not going to happen.
10:34:19 AM caitlyn ball: hey lets talk on there babe. my messenger is messing up.
10:34:36 AM miketheman: I believe you completely.
10:34:52 AM miketheman: You must have reached the end of your loop.
10:35:03 AM miketheman: Bye!

So it was a fun little distraction, and the URL provided resolves to <obviously> girlcamz [net] – haven’t visited, since there’s no point, really.

I was hoping the bot would be a little better than a simple responder to the next input. But alas. Developers of sex marketing spam bots are probably less inclined to put real engineering effort into their crap.