Tracking application performance on Heroku with Datadog

I thought about using a clickbait title – “You’ll never believe how this guy captures metrics!” – but decided that 99% of these are not worth the time invested in coming up with the catchy title.

So instead, I’ll simply talk about what I wanted to, and you be the judge of my title.

Application Performance Monitoring, or APM, is a crazily complex landscape, with an enormous amount of tooling, terminology, and providers looking to get some piece of the action.
There are many vendors, and all have their advantages, as well as disadvantages.

The vendor that I am pretty happy with (and I now work there) is Datadog.

One solution that has caught on quite well for surgical application monitoring is the use of the statsd protocol to send metrics from inside your application to a listener which can then store these metrics for querying later on. This is achieved by placing strategic “emitter” callouts in your code so that they can report metrics during runtime.
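To make the “emitter” idea concrete, here is a minimal sketch in Ruby using only the standard library. The class name `TinyStatsd`, the metric names, and the choice of listener address are my assumptions for illustration; a real project would normally use an existing statsd client library. The wire format is the plain-text `name:value|type` datagram, sent over UDP to the conventional statsd port 8125.

```ruby
require 'socket'

# Hypothetical, minimal StatsD-style emitter (illustration only).
class TinyStatsd
  def initialize(host = '127.0.0.1', port = 8125)
    @host = host
    @port = port
    @socket = UDPSocket.new
  end

  # Build the plain-text datagram, e.g. "page.views:1|c".
  def self.datagram(name, value, type)
    "#{name}:#{value}|#{type}"
  end

  # Counter: "something happened `by` times".
  def increment(name, by = 1)
    emit(self.class.datagram(name, by, 'c'))
  end

  # Gauge: "the current value of something".
  def gauge(name, value)
    emit(self.class.datagram(name, value, 'g'))
  end

  # Timer: "something took `milliseconds` to run".
  def timing(name, milliseconds)
    emit(self.class.datagram(name, milliseconds, 'ms'))
  end

  private

  # Fire-and-forget: metrics should never break the application.
  def emit(payload)
    @socket.send(payload, 0, @host, @port)
    payload
  rescue SystemCallError, SocketError
    nil
  end
end

# Strategic callouts placed inside application code (metric names are made up):
statsd = TinyStatsd.new
statsd.increment('page.views')
statsd.timing('db.query.time', 42)
```

Because UDP is connectionless, the application does not wait for the listener, which is a large part of why this style of instrumentation is cheap enough to sprinkle around.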

Flickr, and then Etsy, started these projects, and they have since been refined, ported to most languages, and adopted in companies where measuring is an important goal.
A blog post on Datadog’s implementation and extension of Statsd was written last year and goes into deeper detail.

One common question has always been “How do I collect metrics from an application running on Heroku with Datadog?”.

And I think we finally have one answer.

The Heroku Dyno container is pretty simple – you wanna run a process? Describe it in a Procfile.
You wanna scale? You tell Heroku to launch more Dynos with the process name, as specified in the Procfile.

However, the actual Dyno is a fairly limited environment by design – the root filesystem is read-only, the only writable area is the application’s root directory, and anything written there disappears when the Dyno is terminated. There’s no sysvinit, upstart or systemd for people to bicker about. Use a Procfile, which is also really simple.

So a challenge to overcome became: “how to install a Datadog Agent package that runs a dogstatsd listener as a second process, inside an environment that is pretty locked down?”

First, we have to install the package. Heroku has a concept of “buildpacks” (https://devcenter.heroku.com/articles/buildpacks) that can be used to run compilation steps before adding your application code and launching it. Multiple buildpacks can also be used, chaining steps together to achieve the desired outcome.

I read the heroku-buildpack-apt and found a bunch of good ideas, and came up with a Datadog-Agent-specific installer buildpack that drops off the package, as well as the needed environment for the runtime.

Now how do I run the listener process alongside my application?

Enter foreman. Foreman, not to be confused with “theforeman”, has long been a great way for application developers writing Heroku-targeted applications to run them locally in a manner similar to how they will be run on the remote platform.

Foreman reads the Procfile, and runs the processes based on the directives contained inside.

This feature is the one that we leverage to run multiple processes on a single Dyno.

By using foreman inside the Dyno, we are able to tell foreman to run more than one process type at a time, with another Procfile that specifies the startup process for the actual application as well as the dogstatsd listener.
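Here is a rough sketch of the Procfile-in-Procfile idea, written as a tiny Ruby parser so the two files can be seen side by side. The file name `Procfile.real`, the application command, and the `start-dogstatsd.sh` script are all hypothetical placeholders of mine; the README in the buildpack has the actual procedure. The only parts I am relying on are the Procfile line format (`<process type>: <command>`) and foreman’s `-f` option for pointing at a different Procfile.

```ruby
# Parse Procfile text into { process_type => command }.
# A simplified take on what a foreman-like tool does before launching anything.
def parse_procfile(text)
  text.each_line.with_object({}) do |line, processes|
    # Each entry looks like "<process type>: <command>".
    match = line.strip.match(/\A([A-Za-z0-9_-]+):\s*(.+)\z/)
    processes[match[1]] = match[2] if match
  end
end

# The base Procfile that Heroku reads: a single process type that starts foreman.
# (Hypothetical contents.)
BASE_PROCFILE = "web: bundle exec foreman start -f Procfile.real\n"

# The second Procfile that foreman reads inside the Dyno: the actual app
# plus the dogstatsd listener. (Hypothetical commands.)
NESTED_PROCFILE = <<~PROCFILE
  web: bundle exec ruby app.rb
  dogstatsd: ./start-dogstatsd.sh
PROCFILE

puts parse_procfile(BASE_PROCFILE).keys.inspect
puts parse_procfile(NESTED_PROCFILE).keys.inspect
```

Heroku only ever sees the one `web` process type, so scaling still works the usual way, and each Dyno carries its own listener along with it.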

When deploying any code revision, Heroku will read the base Procfile and run a foreman process inside the Dyno, which will, in turn, start up the app and dogstatsd.

And while foreman is a Ruby gem, your project may be in Python (use honcho) or Go (use forego or goreman), and I’m sure there are others out there. I haven’t found or tested all of them, so tell me if they work out for you.

I did, however, take the time to write up a README with the procedure to follow to use this, as well as a commit-by-commit example application.

Here’s the buildpack code: http://miketheman.github.io/heroku-buildpack-datadog/

Here’s the example application: https://github.com/miketheman/buildpack-example-ruby

Here’s an image of the stats collected by the example application in Datadog, with increasing web load:
[Image: Heroku App Load]

Here’s a random dog:

Hope this helps you find deeper insight into how you monitor your applications!

Update (2014-12-15)

A quick addition on this topic.

A couple of days after this was published, I had a short Twitter exchange with Bo Jeanes, after which he submitted a Pull Request to the buildpack (as well as an update to the example app).
This simplifies the end user’s deployment of the Agent package: the user no longer has to spend any time on Procfile-in-Procfile solutions, and there is no longer any need for foreman and the like inside the container. Instead, the dogstatsd process is started via the profile.d mechanism, which runs on Dyno startup.

This makes the solution even more elegant, so thanks a ton, Bo!

A Quick Drop Into Data Structures For A Minute

So here’s the story, from A to Z…

Well, I’m not going to go all the way to Z, but let me lay some details on you.

At Datadog, we provide a nice interface for configuring the Datadog Agent – it’s usually pretty simple to drop some YAML configuration into a file at a specific location, restart the Agent main process, and voilà, you’ve got monitoring.

This gets more complicated when you want to generate a valid YAML file from another system, typically something like Configuration Management, where the notion of “things I know about this particular system” should then trigger “monitor this system with the things I know about it”.

In the popular open source config management system Chef, it is a common practice to create a template of the file you wish to place on a given system, and then extract particular variables to pass to a template ‘resource’, and use those as dynamic values that can make the template reusable across systems and projects, as the template itself can be populated by inputs not included in the initial template design.

Another concept in Chef is the ability to set node ‘attributes’ to control the behavior of recipes, templates and any amount of resources. This has pros and cons, neither of which I will attempt to cover here, but suffice it to say that the pattern is well-established that if you want to share your resources with others, having a mechanism of “tweaking the knobs” of your resources with attributes is a common way of doing it.

In the datadog cookbook for Chef, we provide an interface just like this. An end user can build up a list of structured data entries made up of hash objects (or maps or dicts, depending on your favorite language), and then pass that into a node object, and expect that these details will be rendered into a configuration file template (and restart the service, etc).

This allows the end user to take the code, not modify it at all, and provide inputs to it to receive the desired state.

Jumping further into Chef’s handling of node attributes now.

== Attribute
Attribute implements a nested key-value (Hash) and flat collection
(Array) data structure supporting multiple levels of precedence, such
that a given key may have multiple values internally, but will only
return the highest precedence value when reading.

Attributes are subclassed from the Mash object type – which has some cool features, like deep-merging lower data structures – and then attributes are compiled together to make collections of these node attribute objects, which are then “frozen” into another class type named Chef::Node::ImmutableArray or Chef::Node::ImmutableHash to prevent further mucking around with them.

All this is cool so far, and is really useful in most cases.

In my case, I want to allow the user to provide the data needed, and then have the data written out, or serialized, into a configuration file, which can then be read by the Agent process.

The simple way you might think to do this is to tell the YAML module of Ruby’s standard library (which is actually an alias to the Psych module) to emit the structured YAML and be done with it.

In an Erubis (ERB) template, this would look like this:

<%= YAML.dump(array_of_mash_data) %>

However, I’d like to inject a header to the array before rendering it, so I’ll do that first:

<%= YAML.dump({ 'instances' => array_of_mash_data }) %>

What this does is render a file like so:

---
instances:
- !ruby/hash:Mash
  host: localhost
  port: 9999
  extra_key: extra_val
  conf:
  - !ruby/hash:Mash
    include: !ruby/hash:Mash
      domain: org.apache.cassandra.db
      attributes:
      - BloomFilterDiskSpaceUsed
      - Capacity
      foo: bar
    exclude:
    - !ruby/hash:Mash
      domain: evil_domain

As you can see, there are these pesky lines that include a special YAML-oriented tag starting with an exclamation point – !ruby/hash:Mash – which are there to describe the data structure to any YAML loader, saying “hey, the thing you’re about to load is an instance of XYZ, not an array, hash, string or integer”.
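The tag behavior can be reproduced outside of Chef with nothing but Ruby’s standard library. In this small sketch, `FakeMash` is a hypothetical stand-in of mine for Chef’s Mash (it is just an empty Hash subclass), on the assumption that what matters to the YAML emitter is only that the object is a Hash subclass rather than a plain Hash.

```ruby
require 'yaml'

# Hypothetical stand-in for Chef's Mash: an otherwise ordinary Hash subclass.
class FakeMash < Hash; end

plain = { 'host' => 'localhost' }

mash = FakeMash.new
mash['host'] = 'localhost'

# A plain Hash is emitted as an ordinary YAML mapping.
plain_yaml = YAML.dump(plain)

# A Hash subclass is emitted with a !ruby/hash:<ClassName> tag, so that
# a Ruby loader could rebuild the original class later.
mash_yaml = YAML.dump(mash)

puts plain_yaml
puts mash_yaml
```

The tag is useful when Ruby is on both ends of the conversation, and a nuisance when the reader is written in another language.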

Unfortunately, when parsing this file from the Python side of things to load it in the Agent, we get some unhappiness:

$ sudo service datadog-agent configcheck
your.yaml contains errors:
    could not determine a constructor for the tag '!ruby/hash:Mash'
  in "<byte string>", line 7, column 5

So it’s pretty apparent that I can do one of two things:

  • teach Python how to interpret a Ruby + Mash constructor
  • figure out how to remove these from being rendered

The latter seemed most likely, since I didn’t really want to teach Python anything new, especially since this is really a Hash (or a dict, in pythonese).

So I experimented with taking items from the Mash, and running them through a built-in method to_hash – which seemed likely to work.

Not really.

<%= YAML.dump({ 'instances' => @instances.map { |item| item.to_hash }}) %>

That code only steps into the first layer of the data structure and converts the segment starting with host: localhost into a Hash, but the sub-keys remain Mash objects. Grr.

Digging around, I found other reported problems where people have extended Chef objects with some interesting methods trying to solve the same problem.

This means that I’d have to add library code to my project, then modify the template renderer to make the helper code available, then tell the template to render it using these subclassed methods, and then have to worry about it.

ARGH.

Instead, I tried another tactic, which seems to have worked out pretty well.

Instead of trying to walk any size of a data structure and attempt to catch every leaf of the tree, I turned instead to another mechanism to “strip” out the Ruby-specific data structure details, and keep the same structure, so I used the ol’ faithful – JSON.

By using built-ins to convert the Mash to a JSON string, having the JSON library parse it back into a data structure, and then serializing that to YAML, we remove all of the extras from the picture, leaving us with a slightly modified ERB method:

<%= JSON.parse(({ 'instances' => @instances }).to_json).to_yaml %>
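For anyone who wants to try the round trip outside of a Chef run, here is a self-contained sketch. `FakeMash` and the `fake_mash` helper are hypothetical stand-ins of mine for Chef’s Mash (a bare Hash subclass), and the data is a trimmed-down version of the example above, so treat this as an approximation of what happens in the template rather than the cookbook’s actual code.

```ruby
require 'json'
require 'yaml'

# Hypothetical stand-in for Chef's Mash: an otherwise ordinary Hash subclass.
class FakeMash < Hash; end

# Helper (also hypothetical) to build a FakeMash from key/value pairs.
def fake_mash(pairs)
  FakeMash.new.merge!(pairs)
end

# Nested structure: hash subclasses inside arrays inside hash subclasses.
instances = [
  fake_mash(
    'host' => 'localhost',
    'port' => 9999,
    'conf' => [
      fake_mash('include' => fake_mash('domain' => 'org.apache.cassandra.db'))
    ]
  )
]

# Direct dump: every FakeMash level carries a !ruby/hash:FakeMash tag.
tagged = YAML.dump({ 'instances' => instances })

# Round trip: JSON has no notion of Ruby classes, so parsing the JSON string
# yields plain Hash and Array objects at every depth, which dump without tags.
clean = JSON.parse({ 'instances' => instances }.to_json).to_yaml

puts tagged
puts clean
```

One side effect worth knowing about: JSON only has string keys, so any symbol keys in the input come back as strings after the round trip. For a configuration file headed to a Python program, that is exactly what is wanted.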

I then took to benchmarking both methods to see if there would be any significant impact on performance for doing this. Details are over here. Short story: not much impact.

So I’m pretty happy with the way this turned out, and even if I’m moving objects back and forth between serialization formats, the end result is something the next program (Datadog Agent) can consume.

Hope you enjoyed!

On the passage of time and learning

It’s been just over two years since I first wrote a little tool to help me visualize the relationships between objects in a particular system.

I had been working as a consultant for a couple of companies, and I found that all exhibited similar problems of using a powerful system, creating ad-hoc relationships where needed, and not fully following the inheritance and impact of these relationships when they change.

So, coming in and trying to first understand what was there, and then trying to untangle things to be clearer (and hopefully better), I tried to sit down and draw out in a physical space – probably a whiteboard – all of the objects, their relationships, and a “who talks to whom” diagram.

Sidebar: diagrams and visualizations are awesome. A picture is many times worth a thousand words, which is why using pictures and visual representations of hard-to-perceive patterns is key to helping others understand what you may already know.

I quickly realized a few things that were problematic with this manual approach:

  1. There were too many objects and relationships to express effectively and clearly on a whiteboard.
  2. Every time something changed in the objects or their relationships, I had to modify the diagram or start over.
  3. This is probably not the last time I’m going to have this problem, and I’m getting really good at drawing boxes with arrows.

With these things in mind, I sat down and tried to reverse-engineer my own thought process. I knew what kind of visual end result I wanted, so I started by using an open source library that helps place things in relationship to other things, and then renders that as an image.

Once I was able to manually generate the image based on the input I provided, then the focus was to use dynamic input, which was the big win, as then I could point this at any input, and get a picture rendered.

Next was packaging and testing, which became harder and harder – but I kept going and eventually was happy with the results.

There have been over 750 downloads of that first version, and I’ve tweaked a few things here and there over time, but haven’t really done much to change the actual code to incorporate any further features.

“It works, I’m done.”

Looking back at the code I wrote (all told – less than 100 lines!), I realize that if I wanted to change behavior today, it’s much harder to do, as the code itself doesn’t lend itself to being changed.

I hadn’t written any testing around the code itself, only functional tests around “if I press START, do I reach END correctly?” approaches – sometimes termed “Outside-In” testing, as the test will assert that from the outside, everything looks groovy.

These tests are slower and not as comprehensive, and trying to have a test system look at a rendered image and compare it with a known “good” one isn’t trivial either. Some libraries exist, but what if I change the assumptions of what a “good” image is? Update the comparison image? Too much work, says the lazy person in me.

So the code exists, and works, and continues to function, over time.

I took a look at it recently, and realized that it’s all one big function (also known as a ‘method’). And some measurement tools out there state that the method is simply too complex.

How can it be too complex? This method is less than 70 lines of actual code, it can’t be that complex, can it?

In the time since I’ve written this code, I’ve learned a lot, heard a lot, failed a lot, and written a lot more code, and thanks to untold amounts of other people, I’ve been getting a bit better at it.

Here’s where I’ll drop in a reference to Sandi Metz, author of POODR and more, and a talk she gave earlier this year, which I didn’t see in person, only on YouTube. It’s called “All the Little Things”, and in it she takes you on a journey of looking at code, refactoring and testing, and basically how to change things to make further changes easy.

It’s a load of information, and a lot of it may not make sense if you’ve never encountered these problems and ideas. But having these ideas (and other design principles and patterns) in your toolbox enables forward progress in your own understanding of how you approach problems, which is really helpful not only in solving the problem of today, but also in solving the problems you don’t even know about yet.

Now I look at that code, and say to myself, “Wow, I don’t really want to change anything in there until I have some better testing around parts of it”. This makes it harder to add anything new, since I don’t know what existing functionality I may break when adding new things.

So if you wrote some code, let it sit for a while, and look at it a year or two later, you may find yourself shaking your head and asking “who even wrote this mess?”, knowing full well that your past self did it.

Be kind to your future self, and try to make decisions today that will help your future self understand what choices you made and why you made them. It’s likely that your future self will have learned more by then and may make other decisions, but will appreciate the efforts of present self in the future.
It’s a weird kind of time-travel, and in the present, you’re trying to better your own future. (cue time paradox arguments)

Thanks for reading!