Container-to-Container Communication

Question ❓

In a containerized world, is there a material difference between communicating over local network TCP vs local Unix domain sockets?

Given an application composed of more than one container, where the containers need to talk to each other, is there an observable difference in latency or throughput between inter-component communication methods, from an end user's perspective?


Background 🌆

There’s this excellent write-up on the comparison back in 2005, and many things have changed since then, especially around the optimizations in the kernel and networking stack, along with the container runtime that is usually abstracted away from the end user’s concerns. Redis benchmarks from a few years ago also point out significant improvements using Unix sockets when the server and benchmark are co-located.

There are other studies out there with their own performance comparisons, producing images like these – and every example is going to have its own set of controls and caveats.

I wanted to use a common-ish scenario: a web service running on cloud infrastructure I don’t own.

Components 🧩

For the experiment, I chose this set of components:

  • nginx (web server) – terminates SSL, proxies requests to the upstream web server
  • gunicorn (HTTP server) – speaks HTTP and the WSGI protocol, runs the application
  • starlette (Python application framework) – handles request/response

I considered using FastAPI for the application layer, but since I didn't need any of its features I left it out. It's a great framework, though – check it out!

Since gunicorn runs the starlette framework and the custom application code, I'll refer to them as a single component, the "app", from here on. What I'm comparing is the behavior between nginx and the "app" layer, using overall user-facing latency and throughput as the main result.

nginx 🌐

nginx is awesome: really powerful, highly configurable, and with many built-in features. I've been using it for years, and it's my go-to choice for a reliable web server.

For our purposes, we need an external port to listen for inbound requests, and a stanza to proxy the requests to the upstream application server.
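As a rough sketch – the "app" hostname, port 8000, and the socket path here are illustrative assumptions, not the exact experiment config – the proxy stanza looks something like this:

# Illustrative nginx stanza – upstream name, port, and paths are assumptions
server {
    listen 80;

    location / {
        # over TCP, this points at the app container's host:port;
        # in the socket-based architectures it becomes
        # proxy_pass http://unix:/var/run/app/gunicorn.sock:;
        proxy_pass http://app:8000;
        proxy_set_header Host $host;
    }
}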

You might ask: why use nginx at all, if Gunicorn can terminate connections directly? Well, there's a class of problems that nginx is better suited to handle than a fully-fledged Python runtime – examples include static file serving (robots.txt, favicon.ico, et al.), caching, header or path rewriting, and more.

nginx is commonly used in front of all manner of applications.

Python Application 🐍

To support the testing of a real-world scenario, I’m creating a JSON response, as that’s how most web applications communicate today. This often incurs some serialization overhead in the application.

I took the example from starlette and added a couple of tweaks to emit the current timestamp and a random number. This prevents any potential caching occurring in any of the layers and polluting the experiment.

Here’s what the main request/response now looks like:

import datetime
import random

from starlette.responses import JSONResponse


async def homepage(request):
    return JSONResponse(
        {
            "hello": "world",
            "utcnow": datetime.datetime.utcnow().isoformat(),
            "random": random.random(),
        }
    )

A response looks like this:

{
  "hello": "world",
  "utcnow": "2021-12-27T00:31:42.383861",
  "random": 0.5352573557347882
}

And while there are ways to improve JSON serialization speed, or tweak the Python runtime, I wanted to keep the experiment at defaults, since the point isn't maximizing total throughput, but rather seeing the difference between the architectures.

Cloud Environment ☁️

For this experiment, I chose Amazon Elastic Container Service (ECS) with AWS Fargate compute. These choices provide a way to construct all the pieces needed in a repeatable fashion in the shortest amount of time, and abstract a lot of the infra concerns. To set everything up, I used AWS Copilot CLI, an open-source tool that does even more of the heavy lifting for me.

The Copilot Application type of Load Balanced Web Service will create an Application Load Balancer (ALB), which is the main external component outside my application stack, but an important one for actual scaling, SSL termination at the edge, and more. For the sake of this experiment, we assume (possibly incorrectly!) that ALBs will perform consistently for each test.

Architectures 🏛

Using containers, I wanted to test multiple architecture combinations to see which one proved the "best" when it came to user-facing performance.

Example 1: "tcp"

The communication between the nginx container and the app container takes place over the dedicated network created by the Docker runtime (or the Container Network Interface in Fargate). This means there's TCP overhead between nginx and the app – but is it significant? Let's find out!

Example 2: "sharedvolume"

Here we create a shared volume between the nginx container and the app container. Then we use a Unix domain socket to communicate between the containers using the shared volume.

This architecture maintains a separation of concerns between the two components, which is generally a good practice, so as to have a single essential process per container.
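A minimal sketch of that wiring – paths and names are illustrative, not the exact experiment files: both containers mount the same volume, and gunicorn binds a Unix socket there instead of a TCP port (nginx then proxies to http://unix:/var/run/app/gunicorn.sock:).

# docker-compose.yml fragment – illustrative sketch
services:
  nginx:
    image: nginx
    volumes:
      - sock:/var/run/app
  app:
    build: .
    command: gunicorn --bind unix:/var/run/app/gunicorn.sock app:app
    volumes:
      - sock:/var/run/app
volumes:
  sock: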

Example 3: "combined"

In this example, we combine both nginx and app in a single container, and use local Unix sockets within the container to communicate.

The main difference here is that we add a process supervisor to run both nginx and app runtimes – which some may consider an anti-pattern. I’m including it for the purpose of the experiment, mainly to uncover if there’s performance variation between a local volume and a shared volume.

This approach simulates what we’d expect in a single "server" scenario – where a traditional instance (hardware or virtual) runs multiple processes and all have some access to a local shared volume for inter-process communication (IPC).
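I won't prescribe a specific supervisor here, but as a hedged example: if the supervisor were supervisord, the combined container's config might look roughly like this (program names and socket path are illustrative).

; supervisord.conf sketch – illustrative only
[supervisord]
nodaemon=true

[program:app]
command=gunicorn --bind unix:/tmp/gunicorn.sock app:app

[program:nginx]
; nginx must stay in the foreground so the supervisor can manage it
command=nginx -g "daemon off;"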

To make this a fair comparison, I’ve also doubled the CPU and memory allocation.

Copilot ✈️

Time to get off the ground.

Copilot CLI assumes you already have an app prepared in a Dockerfile. The Quickstart has you clone a repo with a sample app – so instead I’ve created a Dockerfile for each of the architectures, along with a docker-compose.yml file for local orchestration of the components.

Then I’ll be able to launch and test each one in AWS with its own isolated set of resources – VPC, networking stack, and more.

I'm not going into all the details of how to install Copilot and launch the services; for that, read the Copilot CLI documentation (linked above) and the experiment code.

This test is using AWS Copilot CLI v1.13.0.

Test Protocol 🔬

There’s an ever-growing list of tools and approaches to benchmark web request/response performance.

For the sake of time, I’ll use a single one here, to focus on the comparison of the server-side architecture performance.

All client-side requests will be performed from an AWS CloudShell instance running in the same AWS Region as the running services (us-east-1) to isolate a lot of potential network chatter. It’s not a perfect isolation of potential variables, but it’ll have to do.

To baseline, I ran each test locally (see later).

Apache Bench

Apache Bench, or ab, is a common tool for testing web endpoints, and is not specific to Apache httpd servers. I’m using: Version 2.3 <$Revision: 1879490 $>

I chose single concurrency and ran 1,000 requests. I also passed the -l flag to ignore variable response lengths, since the app responds with a variable-length random number and ab considers responses of differing lengths to be failures unless told otherwise.

ab -n 1000 -c 1 -l http://service-target....

Each test should take less than 5 seconds.

The important stats I’m comparing are:

  • Requests per second (mean) – higher is better
  • Time per request (mean) – lower is better
  • Duration at 99th percentile. 99% of all requests completed within (milliseconds) – lower is better
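(At a concurrency of 1, the first two stats are roughly reciprocal: for example, 1000 ms ÷ 1.471 ms per request ≈ 679.8 requests per second, which lines up with the local tcp numbers below.)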

To reduce variance, I also "warmed up" each container by first running the test with a larger number of requests.

Local Test

To establish a baseline, I ran the same benchmark against the local services, using Docker Desktop 4.3.2 (72729) on macOS. These numbers aren't demonstrative of a real user experience, but they provide a sense of performance before launching the architectures in the cloud.

arch                  reqs per sec  ms per req  99th pctile
tcp (local)                 679.77       1.471            2
sharedvolume (local)        715.62       1.397            2
combined (local)            705.55       1.871            2

In the local benchmark, the clear loser is the tcp architecture, and sharedvolume has a slight edge over combined – but not a huge win. There's no real difference in the 99th percentiles – requests are being served in under 2ms.

This shows that the shared resources for the combined architecture are near the performance of the sharedvolume – possibly due to Docker Desktop’s bridging and network abstraction. A better comparison might be tested on a native Linux machine.

Remote Test

Once I ran through the setup steps using Copilot CLI to create the environment and services, I performed the same ab test, and collected the results in this table:

arch                  reqs per sec  ms per req  99th pctile
tcp (aws)                   447.57       2.234            5
sharedvolume (aws)          394.55       2.535            6
combined (aws)              428.60       2.333            4

With the remote tests, it was a minor surprise that the combined service performed better than the sharedvolume service, since in the local test it performed worse.

The bigger surprise was to find that the tcp architecture wins slightly over the socket-based architectures.

This could be due to the way ECS Fargate uses the Firecracker microvm, and has tuned the network stack to perform faster than using a shared socket on a volume when communicating between two containers on the same host machine. The best part is – as a consumer of a utility, I don’t care, as long as it’s performing well!

ARM/Graviton Remote Test

The Copilot manifest defaults to the Intel x86 platform, so let's also test performance on the linux/arm64 platform (Graviton2, probably).

For this to work, I had to rebuild the nginx sidecars manually, as Copilot doesn't yet build and push sidecar images. I also had to update the manifest.yml to set the desired platform, and deploy the service with copilot svc deploy .... (The combined version needed some Dockerfile surgery too…)

Results:

arch                    reqs per sec  ms per req  99th pctile
tcp (aws/arm)                 475.03       2.105            3
sharedvolume (aws/arm)        451.71       2.214            4
combined (aws/arm)            433.94       2.304            4

We can see that all the stats are better on the Graviton architecture, lending some more credibility to studies done by other benchmark posts and papers.

Aside: the linux/arm64-based container images were tens of megabytes smaller, so if image size and network pull time are concerns, these will pull a bit faster as well.

Other Testing Tools

If you’re interested in performing longer tests, or emulating different user types, check out some of these other benchmark tools I considered and didn’t use for this experiment:

  • Python – https://locust.io/ https://molotov.readthedocs.io/
  • JavaScript – https://k6.io/
  • Golang – https://github.com/rakyll/hey
  • C – https://github.com/wg/wrk

There are also plenty of vendors that build out extensive load-testing platforms – I'm not covering any of them here. If you run a test with one of these, I'd definitely like to see your results!

Conclusions 💡

Using the Copilot CLI wasn't without some missteps – the team is hard at work improving the documentation, and they're pretty responsive in their GitHub Issues and Discussions, as well as their Gitter chat room – always helpful when learning a new framework. Once I got the basics down, being able to establish a reproducible stack proved valuable to the experimentation process: I could provision and tear down the stack easily, and update it with changes relatively easily too.

Remember: these are micro-benchmarks, run on environments that are neither highly tuned nor serving real-world workloads. The experiment was designed around a very specific type of workload, and the results may change as more concurrency is introduced, CPU or memory saturation is reached, auto-scaling of application instances comes into play, and more.

Your mileage may vary.

When I started this experiment, I assumed from the existing literature that the winner would be a socket-based communication architecture (sharedvolume or combined), and it also made sense to me: the overhead of creating TCP packets between the processes would be eliminated, and thus performance would be better.

However, in these benchmarks, I found that using the TCP communication architecture performs best, possibly due to optimizations beyond our view in the underlying stack. This is precisely what I want from an infrastructure vendor – for them to figure out how to optimize performance without having to re-architect an application to perform better in a given deployment scenario.

The main conclusion I’ve drawn is: Using TCP to communicate between containers is best, as it affords the most flexibility, follows established patterns, and performs slightly better than the alternatives in a real(ish) world scenario. And if you can, use Graviton2 (ARM) CPU architecture.

Go forth, test your own scenarios, and let me know what you come up with. (Don't forget to delete your resources when done!! 💸 )

AWS DeepComposer 🎹➡️☁️🎶

This year's Amazon Web Services re:Invent conference in Las Vegas, Nevada, was a veritable smorgasbord of announcements, product launches, previews, and a ton of information to try to digest at once.

One very exciting announcement was AWS DeepComposer – which continues to expand on AWS’ mission of “Putting machine learning in the hands of every developer”.
Here’s a slick intro video from the product announcement – come back after!

The service is still in Preview mode, and has an application/review process – so while I wait for the application to clear, I figured I’d poke around a bit and see what I got.

📦 Box Contents

The box. Not super impressive.
The box, open. More impressive.

Opening the box, I'm immediately reminded of a 1980s Casio keyboard – we had one, and I enjoyed it a lot. This one is larger, and has no batteries or speakers.

The keyboard itself.

It's a 32-key keyboard; while the key sizing isn't 100% the same as that baby grand piano you have tucked somewhere in your vast mansion, it'll probably be good enough.

The interface is USB Type B. I recently recycled roughly 20 of these cables in an e-waste purge, thinking "I don't have anything that uses this connection!" Well, now I do. It's 2019 – I'd have thought at least Micro USB, if not USB-C, would have been the right choice.

Lucky for me, the box also contains a USB-A to USB-B cable, so at least that’s that.
Wait a minute… my 12-inch MacBook from 2016 that I’m using only has a single USB-C port.
Ruh-roh.
Apparently, I packed my USB-A to USB-C plug that I got with my Google Pixel 4 – let’s see if that will work! Even if it does, that means that I can’t use the DeepComposer and charge my laptop at the same time without an external port hub.
Considering that’s the only port (other than a 3.5mm audio jack) on my mac, I’m not too worried about it, especially since the battery is still pretty good.

There’s other packing materials, and a little card with a nice tagline of “Press play on ML” and a URL to visit: https://aws.amazon.com/startcomposing (redirects to the product page link – maybe a future device-specific landing page? Hmmm…)

⚡️ Power it up

I know I don't have the provisioned account access yet, so I won't be able to run everything the presenter did in the video – so I figured I'd poke around the connectivity interface and see what I might glean in the absence of a proper setup.

Before I plug in the device, let’s also look at the current state of the Input/Output (I/O) devices, filtered specifically to the Apple USB Host Controller:

$ ioreg -w0 -rc AppleUSBHostController
+-o XHC1@14000000  <class AppleUSBXHCISPTLP, id 0x1000001dd, registered, matched, active, busy 0 (5263 ms), retain 55>
  | {
  |   "IOClass" = "AppleUSBXHCISPTLP"
  |   "kUSBSleepPortCurrentLimit" = 1500
  |   "IOPowerManagement" = {"ChildrenPowerState"=1,"DevicePowerState"=0,"CurrentPowerState"=1,"CapabilityFlags"=4,"MaxPowerState"=3,"DriverPowerState"=0}
  |   "IOProviderClass" = "IOPCIDevice"
  |   "IOProbeScore" = 1000
  |   "UsbRTD3Supported" = Yes
  |   "locationID" = 335544320
  |   "name" = <"XHC1">
  |   "64bit" = Yes
  |   "kUSBWakePortCurrentLimit" = 1500
  |   "IOPCIPauseCompatible" = Yes
  |   "device-properties" = {"acpi-device"="IOACPIPlatformDevice is not serializable","acpi-path"="IOACPIPlane:/_SB/PCI0@0/XHC1@140000"}
  |   "IOPCIPrimaryMatch" = "0x9d2f8086"
  |   "IOMatchCategory" = "IODefaultMatchCategory"
  |   "CFBundleIdentifier" = "com.apple.driver.usb.AppleUSBXHCIPCI"
  |   "Revision" = <0003>
  |   "IOGeneralInterest" = "IOCommand is not serializable"
  |   "IOPCITunnelCompatible" = Yes
  |   "controller-statistics" = {"kControllerStatIOCount"=78,"kControllerStatPowerStateTime"={"kPowerStateOff"="142ms (0%)","kPowerStateSleep"="40191894ms (99%)","kPowerStateOn"="75024ms (0%)","kPowerStateSuspended"="1332ms (0%)"},"kControllerStatSpuriousInterruptCount"=0}
  |   "kUSBSleepSupported" = Yes
  | }
  |
  +-o HS01@14100000  <class AppleUSB20XHCIPort, id 0x100000245, registered, matched, active, busy 0 (4773 ms), retain 13>
  +-o HS03@14200000  <class AppleUSB20XHCIPort, id 0x100000246, registered, matched, active, busy 0 (0 ms), retain 10>
  +-o HS04@14300000  <class AppleUSB20XHCIPort, id 0x100000249, registered, matched, active, busy 0 (0 ms), retain 10>
  +-o HS09@14400000  <class AppleUSB20XHCIPort, id 0x10000024c, registered, matched, active, busy 0 (0 ms), retain 9>
  +-o SSP1@14500000  <class AppleUSB30XHCIPort, id 0x10000024d, registered, matched, active, busy 0 (0 ms), retain 14>
  +-o SSP3@14600000  <class AppleUSB30XHCIPort, id 0x10000024e, registered, matched, active, busy 0 (0 ms), retain 12>
  +-o SSP4@14700000  <class AppleUSB30XHCIPort, id 0x10000024f, registered, matched, active, busy 0 (0 ms), retain 12>

A shorter version of this can be seen in the built-in System Information app, under the USB section.
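If you prefer the terminal, the built-in system_profiler command prints roughly the same summary:

$ system_profiler SPUSBDataType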

Now I’m ready – let’s see what happens!

Plugging in, the first positive indication is that I see a series of red and blue LEDs briefly light up behind the top row of buttons, a quick cycle. So we know that at the very least, the little adapter is providing some power to the USB device.

Let’s look at the output of the I/O device state now:

$ ioreg -w0 -rc AppleUSBHostController
+-o XHC1@14000000  <class AppleUSBXHCISPTLP, id 0x1000001dd, registered, matched, active, busy 0 (7030 ms), retain 60>
  | {
  |   "IOClass" = "AppleUSBXHCISPTLP"
  |   "kUSBSleepPortCurrentLimit" = 1500
  |   "IOPowerManagement" = {"ChildrenPowerState"=3,"DevicePowerState"=2,"CurrentPowerState"=3,"CapabilityFlags"=32768,"MaxPowerState"=3,"DriverPowerState"=0}
  |   "IOProviderClass" = "IOPCIDevice"
  |   "IOProbeScore" = 1000
  |   "UsbRTD3Supported" = Yes
  |   "locationID" = 335544320
  |   "name" = <"XHC1">
  |   "64bit" = Yes
  |   "kUSBWakePortCurrentLimit" = 1500
  |   "IOPCIPauseCompatible" = Yes
  |   "device-properties" = {"acpi-device"="IOACPIPlatformDevice is not serializable","acpi-path"="IOACPIPlane:/_SB/PCI0@0/XHC1@140000"}
  |   "IOPCIPrimaryMatch" = "0x9d2f8086"
  |   "IOMatchCategory" = "IODefaultMatchCategory"
  |   "CFBundleIdentifier" = "com.apple.driver.usb.AppleUSBXHCIPCI"
  |   "Revision" = <0003>
  |   "IOGeneralInterest" = "IOCommand is not serializable"
  |   "IOPCITunnelCompatible" = Yes
  |   "controller-statistics" = {"kControllerStatIOCount"=104,"kControllerStatPowerStateTime"={"kPowerStateOff"="142ms (0%)","kPowerStateSleep"="40554314ms (99%)","kPowerStateOn"="245721ms (0%)","kPowerStateSuspended"="1333ms (0%)"},"kControllerStatSpuriousInterruptCount"=0}
  |   "kUSBSleepSupported" = Yes
  | }
  |
  +-o HS01@14100000  <class AppleUSB20XHCIPort, id 0x100000245, registered, matched, active, busy 0 (6540 ms), retain 18>
  | +-o AKM322@14100000  <class IOUSBHostDevice, id 0x100004670, registered, matched, active, busy 0 (1766 ms), retain 23>
  |   +-o AppleUSBHostLegacyClient  <class AppleUSBHostLegacyClient, id 0x100004673, !registered, !matched, active, busy 0, retain 9>
  |   +-o AppleUSBHostCompositeDevice  <class AppleUSBHostCompositeDevice, id 0x10000467b, !registered, !matched, active, busy 0, retain 4>
  |   +-o IOUSBHostInterface@0  <class IOUSBHostInterface, id 0x10000467d, registered, matched, active, busy 0 (3 ms), retain 6>
  |   +-o IOUSBHostInterface@1  <class IOUSBHostInterface, id 0x10000467e, registered, matched, active, busy 0 (3 ms), retain 6>
  +-o HS03@14200000  <class AppleUSB20XHCIPort, id 0x100000246, registered, matched, active, busy 0 (0 ms), retain 10>
  +-o HS04@14300000  <class AppleUSB20XHCIPort, id 0x100000249, registered, matched, active, busy 0 (0 ms), retain 10>
  +-o HS09@14400000  <class AppleUSB20XHCIPort, id 0x10000024c, registered, matched, active, busy 0 (0 ms), retain 9>
  +-o SSP1@14500000  <class AppleUSB30XHCIPort, id 0x10000024d, registered, matched, active, busy 0 (0 ms), retain 14>
  +-o SSP3@14600000  <class AppleUSB30XHCIPort, id 0x10000024e, registered, matched, active, busy 0 (0 ms), retain 12>
  +-o SSP4@14700000  <class AppleUSB30XHCIPort, id 0x10000024f, registered, matched, active, busy 0 (0 ms), retain 12>

Again, this is pretty verbose, but if you look closely, you’ll see that the device at address HS01@14100000 now has a sub-device associated with it – AKM322@14100000.

Yay! We can see that the device is powered, and the system registers it.

What is this thing??

A quick search for the device prefix string "AKM322" brought me to a device similar in nature:
https://www.amazon.com/midiplus-32-Key-Keyboard-Controller-AKM322/dp/B016O5F2GQ
Here’s the listing for the DeepComposer device: https://www.amazon.com/AWS-DeepComposer-learning-enabled-keyboard-developers/dp/B07YGZ4V5B/

If you’re asking – “why the price difference?”, well the DeepComposer device comes with some cloud features too!

We want you to know:
To train your models and create new musical compositions, AWS DeepComposer is priced at $99. This includes the keyboard, plus a 3-month free trial of AWS DeepComposer services to train your models and create original musical compositions. Each month of the free trial includes enough to cover up to 4 training jobs and 40 inference jobs per month, during the free trial period.

So for the dollar value, you’re getting not only the device, but also some AWS Cloud Goodness!

Visiting what appears to be the manufacturer's page, we can see more details about the hardware, so that's cool. It's a MIDI device, translating analog signals (like pressing keys with different pressures and durations) into digital signals.
Cool stuff! There might be some secret AWS goodness in the DeepComposer model – we’ll have to wait and see.

Make some noise!!

Again, I don’t yet have access to the DeepComposer interface, so I found a macOS MIDI testing guide that I followed: https://support.apple.com/en-gb/HT201840

The test was successful, but I only got a single-note "ding" response, confirming that the device works and can communicate back to my computer. But I want to hear something!

Apple produces Logic Pro – but at a $199 price tag, I don’t really want to spend that just to mess around until I can really try out the DeepComposer service.
Apple also produces GarageBand – for free! Fire it up, and wait for the 2GB download to complete over hotel wifi. This is also where I unplug the keyboard, and plug in the power – since we’re going to be here for a while…

I’ll check back once I’ve got some more details to report. Hope you enjoyed this set of musings, and hopefully I’ll have more to show you soon!

Other Reading

There's not too much out there just yet, as this is a preview service that was only just announced.
I posted a link to a video of the original announcement above, and you can also read some of the announcement blog post details here:

https://aws.amazon.com/blogs/aws/aws-deepcomposer-compose-music-with-generative-machine-learning-models/

Setting Up a Datadog-to-AWS Integration

When approaching a new service provider, it can be confusing to figure out how to get set up to best communicate with them – some processes involve multiple steps, multiple interfaces, and confusing terminology.

Amazon Web Services is an amazing cloud services provider, and to allow access to informational services inside a customer's account, a couple of known mechanisms exist to delegate access:

  • Account Keys, where you generate a key and secret and share them. The other party stores these (usually in either clear text or using reversible encryption) and uses them as needed to make API calls
  • Role Delegation, where you create a Role and shared secret to provide to the external service provider, who is then allowed to use their own internal security credentials to request temporary access to your account's resources via API calls

In the former model, the keys are exchanged once, and once out of your immediate domain, you have little idea what happens to them.
In the latter, a rule is put into place that requires ongoing authenticated access to request assumption of a known role with a shared secret.

Luckily, in both scenarios, a restrictive IAM Policy is in place that allows only the actions you’ve decided to allow ahead of time.

Setting up the desired access is made simpler by having good documentation on how to do this manually. In this modern era, we likely want to keep our infrastructure as code where possible, as well as have a mechanism to apply the rules and test later if they are still valid.

Here’s a quick example I cooked up using Terraform, a new, popular tool to compose cloud infrastructure as code and execute to create the desired state.

# Read more about variables and how to override them here:
# https://www.terraform.io/docs/configuration/variables.html
variable "aws_region" {
  type    = "string"
  default = "us-east-1"
}

variable "shared_secret" {
  type    = "string"
  default = "SOOPERSEKRET"
}

provider "aws" {
  region = "${var.aws_region}"
}

resource "aws_iam_policy" "dd_integration_policy" {
  name        = "DatadogAWSIntegrationPolicy"
  path        = "/"
  description = "DatadogAWSIntegrationPolicy"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "autoscaling:Describe*",
        "cloudtrail:DescribeTrails",
        "cloudtrail:GetTrailStatus",
        "cloudwatch:Describe*",
        "cloudwatch:Get*",
        "cloudwatch:List*",
        "ec2:Describe*",
        "ec2:Get*",
        "ecs:Describe*",
        "ecs:List*",
        "elasticache:Describe*",
        "elasticache:List*",
        "elasticloadbalancing:Describe*",
        "elasticmapreduce:List*",
        "iam:Get*",
        "iam:List*",
        "kinesis:Get*",
        "kinesis:List*",
        "kinesis:Describe*",
        "logs:Get*",
        "logs:Describe*",
        "logs:TestMetricFilter",
        "rds:Describe*",
        "rds:List*",
        "route53:List*",
        "s3:GetBucketTagging",
        "ses:Get*",
        "ses:List*",
        "sns:List*",
        "sns:Publish",
        "sqs:GetQueueAttributes",
        "sqs:ListQueues",
        "sqs:ReceiveMessage"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_iam_role" "dd_integration_role" {
  name               = "DatadogAWSIntegrationRole"
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::464622532012:root" },
    "Action": "sts:AssumeRole",
    "Condition": { "StringEquals": { "sts:ExternalId": "${var.shared_secret}" } }
  }
}
EOF
}

resource "aws_iam_policy_attachment" "allow_dd_role" {
  name       = "Allow Datadog PolicyAccess via Role"
  roles      = ["${aws_iam_role.dd_integration_role.name}"]
  policy_arn = "${aws_iam_policy.dd_integration_policy.arn}"
}

output "AWS Account ID" {
  value = "${aws_iam_role.dd_integration_role.arn}"
}

output "AWS Role Name" {
  value = "${aws_iam_role.dd_integration_role.name}"
}

output "AWS External ID" {
  value = "${var.shared_secret}"
}

The output should look a lot like this:

The Account ID is actually a full ARN, and you can copy your Account ID from there.
Terraform doesn’t have a mechanism to emit only the Account ID yet – so if you have some ideas, contribute!

Use the Account ID, Role Name and External ID and paste those into the Datadog Integrations dialog, after selecting Role Delegation. This will immediately validate that the permissions are correct, and return an error otherwise.

Don’t forget to click “Install Integration” when you’re done (it’s at the very bottom of the screen).

Now metrics and events will be collected by Datadog from any allowed AWS services, and you can keep this setup instruction in any revision system of your choice.

P.S. I tried to set this up via CloudFormation (Sparkleformation, too!). I ended up writing it "freehand", and it took more than 3 times as long to get similar functionality.

You can see the CloudFormation Stack here, and decide which works for you.



There’s a New Player in Town, named Habitat

You may have heard some buzz around the launch of Chef‘s new open source project Habitat (still in beta), designed to change a bit of how we think about building and delivering software applications in the modern age.

There’s a lot of press, video announcement, and even a Food Fight Show where we got to chat with some of the brains behind the framework, and get into some of the nitty-gritty details.

In the vibrant Slack channel where a lot of the fast-paced discussion happens with a bunch of the core habitat developers, a community member had brought up a pain point, as many do.
They were trying to build a Python application, and had to resort to fiddling pretty hard with either the PYTHONPATH variable or with sys.path after installing dependencies.
One even used Virtualenv inside the isolated environment.

I had been working on making an LLVM compiler package, and since LLVM is notoriously slow to compile on my laptop, I used the waiting time to get a Python web application working.

My setup is OSX 10.11.5, with Docker (native) 1.12.0-rc2 (almost out of beta!).

I decided to use the Flask web framework to carry out a Hello World, as it would prove a few pieces:

  • Using Python to install dependencies using pip
  • Adding “local” code into a package
  • Importing the Python package in the app code
  • Executing the custom binary that the Flask package installs

Key element: it needed to be as simple as possible, but no simpler.

On my main machine, I wrote my application.
It listens on port 5000, and responds with a simple phrase.
Yay, I wrote a website.
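The actual code lives in the GitHub repo linked below; a minimal sketch of that kind of Flask app (file name and details illustrative) looks like:

# src/app.py – minimal sketch; the real application is in the linked repo
from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello, World!\n"


if __name__ == "__main__":
    # bind all interfaces so the container port mapping can reach it
    app.run(host="0.0.0.0", port=5000)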

Then I set about packaging it into a deliverable where, in habitat's nomenclature, it becomes a self-contained package, which can then be run via the habitat supervisor.

This all starts with getting the habitat executable, conveniently named hab.
hab is a recent addition to the Homebrew Casks family, so installing habitat was as simple as:

$ brew cask install hab

habitat version 0.7.0 is in use during the authoring of this article.

I sat down and wrote a plan.sh file that describes how to put the pieces together.

There’s a bunch of phases in the build cycle that are fully customizable, or “stub-able” if you don’t want them to take the default action.
Some details were garnered from here, despite my package not being a binary.
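I won't reproduce the full plan here (it's in the repo linked below), but a skeleton for a Python app might look roughly like this – the origin, version, and install steps are illustrative, not the exact plan:

# plan.sh skeleton – illustrative
pkg_name=python-hello
pkg_origin=miketheman
pkg_version=0.1.0
pkg_deps=(core/python)

do_build() {
  # pure-Python app: nothing to compile
  return 0
}

do_install() {
  # copy the app into the package and install its pip dependencies there
  cp -r "$PLAN_CONTEXT/../src" "$pkg_prefix/"
  pip install -r "$pkg_prefix/src/requirements.txt" --prefix "$pkg_prefix"
}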

Once I got my package built, it was a matter of figuring out how to run it, and one of the default modes is to export the entire thing as a Docker image, so I set about to run that, to get a feel for the iterative development cycle of making the application work as configured within the habitat universe.

(This step usually isn’t the best one for regular application development, but it is good for figuring out what needs to be configured and how.)

# In first OSX shell
$ hab studio enter
[1][default:/src:0]# build
...
   python-hello: Build time: 0m36s
[2][default:/src:0]# hab pkg export docker miketheman/python-hello
...
Successfully built 2d2740a182fb
[3][default:/src:0]#

# In another OSX shell:
$ docker run -it -p 5000:5000 -p 9631:9631 miketheman/python-hello
hab-sup(MN): Starting miketheman/python-hello
hab-sup(GS): Supervisor 172.17.0.3: cb719c1e-0cac-432a-8d86-afb676c3cf7f
hab-sup(GS): Census python-hello.default: 19b7533a-66ba-4c6f-b6b7-c011abd7dbe1
hab-sup(GS): Starting inbound gossip listener
hab-sup(GS): Starting outbound gossip distributor
hab-sup(GS): Starting gossip failure detector
hab-sup(CN): Starting census health adjuster
python-hello(SV): Starting
python-hello(O):  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

# In a third shell, or use a browser:
$ curl http://localhost:5000
Hello, World!

The code for this example can be found in this GitHub repo.
See the plan.sh and hooks/ for Habitat-related code.
The src/ directory is the actual Python app.

At this point, I declared success.

There’s a large amount of other pieces to the puzzle that I hadn’t explored yet, but getting this part running was the first one.
Items like interacting with the supervisor, director, healthchecks, topologies – these have some basic docs, but there isn't yet a bevy of examples or use cases to lean upon for inspiration.

During this process I uncovered a couple of bugs, submitted some feedback, and the team is very receptive so far.
There’s still a bunch of rough edges to be polished down, many around the documentation, use cases and how the pieces fit together, and what benefit it all drives.

There appears to be some hooks for using Chef Delivery as well – I haven’t seen those yet, as I don’t use Delivery.
I will likely try looking at making a larger strawman deployment to test these pieces another time.

I am looking forward to seeing how this space evolves, and what place habitat will take in the ever-growing, ever-evolving software development life-cycle, as well as how the community approaches these concepts and terminology.

My foray into web development

I like browsing the web. I do it a lot throughout my day.

A lot of people work hard at making the web a cool-looking place. Some sites make simplicity look so easy, that when you look under the hood, it’s all chaos and destruction, folded and crunched together, all to present something really nice and smooth for the end-user.

I’m not a developer – much less a web dev. There’s a lot to know in any field of computing – and in web it’s pretty much the most visible part of computing as a whole, since pretty much anyone anywhere is going to use a web browser to view a site at some point.

I mean, sure, we all learned some HTML – hell, I wrote some sites back in the days of Geocities, and it was awesome to learn about tiling backgrounds of animated GIFs, and when CSS came around, minds blown!

And I left that field for the frontend developers, and went into infrastructure and operations.

And as time passes, you find yourself managing a variety of systems and knowledge, and at some point, you may say to yourself, “I wish I knew how to answer this question…”

And then you write some code to answer it. Voila! You’re a developer, of sorts.

I'm a huge fan of data visualization. Telling stories with pictures dates back millennia, and it's very relatable to most people. Recently, I wrote a tool to help myself display the dependency complexity of Chef roles, and I found that, while very useful, the output is limited: it's a statically generated image, whereas we live in a web-friendly world where everything is interactive and fun!

So when I came across another hard question I wanted to answer, I thought, “Why not make this a web application?”

This time, the question I wanted to answer was: As a GitHub Organization owner for my company, what human-to-team-to-software-repository relationships do we have, and are they secure?

If you’ve ever managed an Organization in GitHub, there are a few key elements.

  1. An Organization can have many Repositories
  2. An Organization can have many Teams
  3. A Repository can have many Teams
  4. A Team can have many members, but only one permission (read only, read/write, owner)

So sorting out who is on what Team, what access they have, across many repositories, can be a security nightmare. Especially when you have more than 4-5 repositories.

During my first foray into solving this, I cobbled together a command-line tool using Ruby with the Graphviz library. I've liked Graphviz for years – it's straightforward: structured text gets rendered into a graph, which can then be output to a file.

Very straightforward, with some limitations, but it basically allows you to store graphs as text and re-render them when changes happen. It's like storing the source code rather than the binary output.

But given those limitations, I wanted this new question to be answered by more than a command-line tool – something I could share with the world at large, without requiring client-side installation of any tools or dependencies.

So I spent a lot of time hemming and hawing, looking at web frameworks and trying to figure out some of them, and “how does this work?” came up a lot.

Finally, yesterday I set out to sit down and accomplish this task. I sat in a Starbucks in New York City, and had a Venti. I started banging away at about 11:30. I took a break for a refill and a snack around 1:30, and when I sat down again, I kept hacking away until 9:30pm, when I deemed completion.

The code was written, tested by me locally, pushed to GitHub, deployed to Heroku, DNS name wired up and all. As soon as I completed, I left Starbucks, and heaved a huge sigh – it was one hell of a mental high, I was in “the zone” and had been there for a long time.

You are more than welcome to browse the source code here and the finished project here. I call it the GitHub Organization Viewer, hence “GOVweb”.

I have a bunch of other ideas on how to make this better, how to model the data, which visual style to use, but I think for now, I’m going to leave it for a bit, and see what I think about it in a couple of months.

But all in all, this reinforced my opinion to never be afraid to try tackling a new idea, a new project, a new field you’re unfamiliar with – as long as you can read, comprehend and learn, the world is your oyster.

A picture is worth a (few) thousand bytes

(Context alert: Know Chef. If you don’t, it’s seriously worth looking into for any level of infrastructure management.)

TL;DR: I wrote a Knife plugin to visualize Chef Role dependencies. It’s here.

Recently, I needed to sort out a large amount of roles and their dependencies, in order to simplify the lives of everyone using them.

It wasn’t easy to determine that changing one would affect many others, since it had become common practice to embed roles within other roles’ run_list, resulting in a tree of cross-dependency hell.
A node’s run_list would typically contain a single role-specific item, embedding the lower-level dependencies.

A sample may look like this:

node[web1] => run_list = role[webserver] => run_list = role[base], recipe[apache2], ...
node[db1] =>  run_list = role[database]  => run_list = role[base], recipe[mongodb], ...

Many of these roles had a fair amount of code duplication, and most were setting the same base role, as well as any role-specific recipes. Others were referencing the same recipes, so figuring out what to refactor and where, without breaking everything else, was more than challenging.

The approach I wanted to implement was to have a very generalized base role applied to every node, with any role-specific roles added on top for a given node.

After refactoring, a node's run list would typically look like:

node[web1] => run_list = role[base], role[webserver]
node[db1] =>  run_list = role[base], role[database]

A bit simpler, right?

This removes the embedded dependency on role[base], since the assumption is that every node will have role[base] applied to it, unless I don't want it for some reason (a development environment, for instance).

Trying to refactor this was pretty tricky, so I wrote a visualizer to collect all the roles from a Chef repository’s role_path, parse them out, and create an image.

I’ve used Graphviz for a number of years now, and it’s pretty general-purpose when it comes to creating graphs of things (nodes), connecting them (edges), and rendering an output. So this was my go-to for this project.
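For a sense of what that structured text looks like, here's a tiny role graph in Graphviz's DOT language – the roles are illustrative, and dot -Tpng graph.dot -o graph.png would render it as an image:

// graph.dot – illustrative role dependency graph
digraph roles {
  "role[webserver]" -> "role[base]";
  "role[webserver]" -> "recipe[apache2]";
  "role[database]"  -> "role[base]";
  "role[database]"  -> "recipe[mongodb]";
}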

Selling you on the power of visualizing data is beyond the scope of this post (and probably the author), but suffice it to say there are industries built around putting data into visual format for a variety of reasons, such as relative comparison, trending, etc.
In fact some buddies of mine have built an awesome product that does just that – visualizes data and events over time. Check them out at Datadog. (I’ve written other stuff for their platform before, it’s totally awesome.)

In my case, I wanted the story told by the image to:

  1. Demonstrate the complexity of the connections between roles/recipes (aka spaghetti)
  2. Point out if I have any cyclic dependencies (it’s possible!)
  3. Let me focus on what to do next: untangle

Items 1 & 2 were pretty cool – my plugin spat out an increasingly complex graph, showing relationships that made sense for things to work, but also contained some items with 5-6 levels of inheritance that are easily muddled. I didn’t have any cyclic dependencies, so I created a sample one to see what it would look like. It looked like a circle.

Item 3 was harder, as this meant that human intervention needed to take place. It was almost like deciding on which area of a StarCraft map you want to go after first. There’s plenty of mining to do, but which will pay off fastest? (geeky references, are you surprised?)

I decided on some of the smaller clusterings, and made some progress, changing where certain role statements lived and the node <=> role assignment to refactor a lot out.

My process of writing a plugin developed pretty much like this:

  1. Have an idea of how I want to do this
  2. Write some code that when executed manually, does what I want
  3. Transform that code into a knife plugin, so it lives inside the Chef Ecosystem
  4. Package said plugin as RubyGem, to make distribution easy for others
  5. Test, test, test (more on this in a moment)
  6. Document (readme only for now)
  7. Add some features, rethink how certain things are done, refactor.
  8. Test some more

Writing code, packaging and documentation are pretty standard practices (more or less), so I won’t go into those.

The more interesting part was figuring out how to plug into the Chef/Knife plugins architecture, and testing.

Thanks to Opscode, writing a plugin isn’t too hard, there’s a good wiki, and other plugins you can look at to get some ideas.

A couple of noteworthy items:

  1. Figuring out how to provide command-line arguments to OptionParser was not easy, since there was no real intuitive way to do it. I spent about 2 hours researching why it wasn't doing what I wanted, and finally figured out that "--flag" and "--flag " behave completely differently.

  2. During my initial cut of the code, I used many statements to print output back to the user (puts "some message"). In the knife plugin world, one should use the ui.info or ui.error and the like, as this makes it much cleaner and consistent with other knife commands.

Testing:

Since this is a command-line application plugin, it made sense to use a framework that can handle inputs and outputs, as that’s my primary concern.
With a background in systems administration and engineering, software testing has never been on the top of my to-learn list, so when the opportunity arose to write tests for another project I wrote, I turned to Cucumber, and the CLI extension Aruba.

Say what you will about unit tests vs integration tests vs functional tests – I got going relatively quickly writing tests in quasi-English.
I won’t say that it’s easy, but it definitely made me think about how the plugin will be used, how users may input commands differently, and what they can expect to happen when they run it.

Cucumber/Aruba also allowed me to split my tests in a way that I can grok, such as all the CLI-related commands, flags, options exist in one test ‘feature’ file, whereas another feature file contains all the tests of reading the roles and graphing them in different formats.

Writing tests early on allowed me to continue to capture how I thought the plugin will be used, write that down in English, and think about it for awhile.
Some things changed after I had written them down, and even then, after I figured out the tests, I decided that the behavior didn’t match what I thought would be most common.

Refactoring the code, running tests in between to ensure that the behavior that I wanted remained consistent was very valuable. This isn’t news for any software engineers out there, but it might be useful to more system people to learn more about testing.

Another test I use is a style-checker called tailor – it measures up my code and reports on anything that may be malformed. This is the first test I run, since if the code is invalid (e.g. missing an end somewhere), it won't pass.

Putting these into a test framework like Travis-CI is very easy, especially since it's a RubyGem, and I have set up environment variables to test against specific versions of Chef.
This provides a fast-feedback loop that tests my code against a matrix of Ruby and Chef versions.

So there you have it: a long explanation of why I wrote something. I had looked around, and there's a knife crawl plugin that is meant to walk a given role's dependency tree and display it, but it only works for a single role and isn't focused on visualizing.

So I wrote my own. Hope you like it, and happy to take pull requests that make sense, and bug reports for things that don’t.

You can find the gem on RubyGems.org – via gem install knife-role-spaghetti or on my GitHub account.

I’m very curious to know what other people’s role spaghetti looks like, so drop me a line, tweet, comment or such with your pictures!

Quick edit: A couple of examples, showing what this does.

Sample Roles

(full resolution here)

Running through the neato renderer (with the -N switch) produces this image:

Sample Roles Neato

(full resolution here)

Ask your systems: “What’s going on?”

This is a sysadmin/devops-style post.
Disclaimers are that I work with these tools and people, and like what they do.

In some amount of our professional lives, we are tasked with bringing order to chaos, keep systems running and have the businesses we work for continue functioning.

In our modern days of large-scale computing, web technology growth explosions, multiple datacenter deployments, cloud providers and other virtualization technologies, the manpower needed to handle the vast amount of technologies, services and systems seems to have a pretty high overhead cost associated with it. “You’ve got X amount of servers? Let’s hire Y amount of sysadmins!”

A lot of tech startups start out with some of the developers performing many of the systems tasks, and since this isn't always their core expertise, decisions are made, scripts are written, and "it works". When the team and systems grow large enough to need their own handler, in walks a sysadmin-type person, who may keel over due to the state of affairs.

Yes, there are many tech companies where this is not the case, and I commend them of keeping their systems lean, mean and clean.

A lot of companies have figured out that in order to make the X:Y ratio work well, automation is required. Here's an article that covers some numbers from earlier this year. I find the stated ratio of 50 servers to 1 sysadmin pretty low compared to my view of how things can be, especially given the tools we have available to us.

One of the popular systems configuration tools I’ve been using heavily is Chef, from Opscode. They provide a hosted solution, as well as an open-source version of their software, for anyone to use.  Getting up and running with some basics is really fast, and there’s a ton of information available, as well as a really responsive community (from mailing lists, bug tracker site and IRC channel).  Once you’re working with Chef, you may wonder how you ever got anything done before you had it.  It’s really treating a large part of your infrastructure as code – something readable, executable, and repeatable.

But this isn’t about getting started with Chef. It’s about “what’s next”.

In any decent starting-out tech company, the amount of servers used will typically range from 2-3 all the way to 200 – or even more.  If you’ve gone all the way to 200 without something like Chef or Puppet, I commend your efforts, and feel somewhat sorry for you.  Once you’re automating your systems creation, deployment and change, then you typically want some feedback on what’s going on. Did what I asked this system to do succeed, or did it fail.

Enter Datadog.

Datadog attempts to bring many sources of information together, to help whomever it is that is supposed to be looking at the systems to make more sense of the situation, from collecting metrics from systems, events from services and other sources, to allowing a timeline and newsfeed that is very human-friendly.

Having all the data at your disposal makes it easier to find patterns and correlations between events, systems and behaviors – helping to minimize the “what just happened?” question.

The Chef model for managing systems is a centralized server (either the open source one in your environment or the hosted service at Opscode), which tells a server what it is meant to "be". Not what it is meant to "do now", but the final state it should be in. They call this model "idempotent" – meaning that no matter how many times you execute the same code on the same server, the behavior should end up the same every time. But it doesn't follow up very much on the results of the actions.

An analogy could be that every morning, before your kid leaves the house, your [wife|mother|husband|guardian|pet dragon] tells them "You should wear a coat today." and then goes on their merry way, not checking whether they wore a coat or not. The next morning, they will get the same comment, and so on and so forth.

So how do we figure out what happened? Did the kid wear a coat or not? I suppose I could check by asking the kid and getting the answer, but what if there are 200 of us? Do I have time to ask every kid whether or not they ended up wearing a coat? I'm going to be spending a lot of time dealing with this simple problem, I can tell you now.

Chef has built-in functionality to report on what Chef did – after it has received its instructions from the centralized server. It's called the "Exception and Report Handlers" mechanism – and this is how I tie these two technologies together.

I adapted some code started by Adam Jacob @Opscode, and extended it further into a complete RubyGem with modifications for content, functionality and some rigorous testing.

Once the gem was ready, now I have to distribute it to my servers, and then have it execute every time Chef runs on that server. So, based on the chef_handler cookbook, I added a new recipe to the datadog cookbook – dd-handler.

What this does is add the necessary components to a Chef execution; when placed at the beginning of a "run", it will capture all the events and report the important ones back to the Datadog newsfeed. It will also push some metrics, like how long the Chef execution took, how many resources were updated, etc.

The process for getting this done was really quite simple, once you boil down all the reading, how's and why's – especially if you use git to version control your chef-repo. The `knife cookbook site install` command is a great method for keeping your git repo "safe" for future releases, preserving your changes to the cookbook and allowing new code to be merged in automatically. Read more here.

THE MOST IMPORTANT STUFF:

Here’s pretty much the process I used (under chef/knife version 0.10.x):

$ cd chef-repo
$ knife cookbook site install datadog
$ vi cookbooks/datadog/attributes/default.rb

At this point, I head over to Datadog, hit the "Setup" page, and grab my organization's API Key, as well as create a new Application Key named "chef-handler" and copy the Hash that is created.

I place these two values into the `attributes/default.rb` file, save and close.
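For reference, the two attributes look something like this – the key names here are from memory and may differ, so check the cookbook's own attributes file:

# cookbooks/datadog/attributes/default.rb – keys illustrative
default['datadog']['api_key'] = 'YOUR_DATADOG_API_KEY'
default['datadog']['application_key'] = 'YOUR_CHEF_HANDLER_APP_KEY'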

$ knife cookbook upload datadog

This places the cookbook on my Chef server, and is now ready to be referenced by a node or role. I use roles, as it’s much more manageable across multiple nodes.

I update the `common-node` role we have to include "recipe[datadog::dd-handler]" as one of the first recipes to execute in the run list.

The common-node role applies to all of our systems, and since they all run chef, I want them all to report on their progress.
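In role terms, that looks something like this – a trimmed, illustrative fragment, with the rest of the run list elided:

# roles/common-node.rb – illustrative fragment
name "common-node"
run_list(
  "recipe[datadog::dd-handler]"  # first, so the handler captures the whole run
  # ...the rest of the common run list follows
)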

And then let it run.

END MOST IMPORTANT STUFF

Since our chef-client runs on a 30 minute interval, and not all execute at the same time, this makes for some interesting graphs at the more recent time slices – not all the data comes in at the same time.  That’s something to get used to.

Here’s an image of a system’s dashboard with only the Chef metrics:

Single Instance dashboard
It displays a 24-hour period, and shows that this particular instance had a low variance in its execution time, as well as not much is being updated during this time (a good thing, since it is consistent).

On a test machine I tossed together, I created a failure, and here’s how it gets reported back to the newsfeed:

 

Testing a failure
As you can see, the stacktrace attempts to provide me with the information I need to diagnose and repair the issue. Once I fixed it and apache could start, the event was logged in the "Low Priority" section of the feed (since successes are expected, and failures are aberrant behavior):

Test passes

All this is well and wonderful, but what about a bunch of systems? Well, I grabbed a couple snaps off the production environment for you!

These are aggregates I created with the graphing language (had never really read it before today!)

Production aggregate metrics

By being able to see the execution patterns, and a bump closer to the left side of the “Resource Updated” graph – I then investigated, and someone had deployed a new rsyslog package – so there was a temporary increase in deploying the resources, and now there are slightly more resources to manage overall.

The purple bump seen in the “Execution Time” graph led me to investigate, and found a timeout in that system’s call to an “apt-get update” request – probably the remote repo was unavailable for a minute. Having the data available to make that correlation made this task of investigating this problem really fast, easy, and simple – more importantly since it has been succeeding ever since, no cause for alarm.

So now I have these two technologies – Chef to tell the kids (the servers) to wear coats, and Datadog to tell the parents (me) if the kids wore the coats or not, and why.

Really, just wear a coat. It’s cold out there.

———–

Tested on:

  • CentOS 5.7 (x64), Ruby 1.9.2 v180, Chef 0.10.4
  • Ubuntu 10.04 (x64), Ruby 1.8.7 v352, Chef 0.9.18

You call this security? You’ve got to be kidding me

I just got off the phone with PayPal’s customer service department.

The reason I was on the phone in the first place – because you probably know how much I absolutely love talking to customer service representatives – is that I was trying to be a good Netizen.

I received a couple – not one – of emails originating from Paypal’s service for password resets. This is not a foreign thing, especially for someone that has his email address all over the web. It’s usually some scripting hacker trying to get access to my stuff.

The problem is that at the bottom of the email, there is this section:

Bring back the noise!

So I think that ever since I branched off to my own site, my great friends of the LJ community have not been as responsive to any of my posts as they might have been in the past.

This is probably due to the fact that being on an external site engine, it’s a little more difficult to “point, click and comment” on a given post.

To increase the ease of using my site, I have now incorporated a plugin that allows the use of any OpenID user to comment with a little less hassle.

Once you've used your…