Reduce AWS Lambda Latencies with Keep-Alive in Python

It starts, as many stories do, with a question. On September 10th, AWS Serverless Hero Luc van Donkersgoed shared his observations on how latency drops as request rate increases when using AWS Lambda. This is always an interesting conversation, and sure enough, other AWS Heroes like myself were curious about some of the outlier behaviors and what exactly goes into each request. AWS Data Hero Alex DeBrie and AWS Container Hero Vlad Ionescu both asked excellent questions about the setup and the behaviors, leading Luc to share what he was seeing with regard to DNS lookups that didn’t make sense to him.

After asking a couple more questions of my own, I rolled up my sleeves and dug into the what, how, and why.

getting ready to read things and hit them with sticks

I dove into all parts of the stack in use to try and understand why Luc’s code was seeing DNS lookups.
For example, if your function needs to call AWS S3 or a Twilio API, you usually provide the domain name and have the code or library ask a Domain Name System (DNS) server for the current IP address, then communicate using that IP address. This is a network call, and it can be expensive (in milliseconds) if it’s repeated before the DNS response’s Time To Live (TTL) – kind of like an expiration date – has run out. The DNS lookup adds some more latency to your overall call, which is why many systems will cache DNS responses until the TTL has expired, and only then make a new call. If you perform DNS lookups when not needed, that’s adding latency unnecessarily. Read the tweet thread for more!
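To make the caching idea concrete, here is a minimal sketch of a DNS cache that only goes back to the resolver once an entry has expired. The `TTLDnsCache` class is hypothetical and not part of boto3 or urllib3. It also assumes a fixed TTL, because `socket.gethostbyname` does not expose the TTL from the DNS response.

```python
import socket
import time


class TTLDnsCache:
    """Cache hostname lookups until a fixed TTL expires (illustrative only)."""

    def __init__(self, ttl_seconds=60.0, resolver=socket.gethostbyname,
                 clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # assumed fixed TTL, not read from DNS
        self.resolver = resolver        # function that performs the real lookup
        self.clock = clock              # injectable so expiry can be tested
        self._entries = {}              # hostname -> (ip_address, expires_at)
        self.lookups = 0                # number of real resolver calls made

    def resolve(self, hostname):
        now = self.clock()
        cached = self._entries.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]  # entry is still fresh: no network call

        ip_address = self.resolver(hostname)  # the expensive network call
        self.lookups += 1
        self._entries[hostname] = (ip_address, now + self.ttl_seconds)
        return ip_address
```

Repeated calls to `resolve()` within the TTL return the cached address, so `lookups` only grows when an entry has expired.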

I arrive at two possible solutions:

  1. If the Python code calls more than 10 AWS service endpoints, it will trigger a DNS lookup, as urllib3’s PoolManager will only maintain 10 connections (set by botocore defaults) and will need to recycle connections if that limit is exceeded.
  2. Since we’re unlikely to be hitting the limit of 10, something else is at play.
    I found that the default behavior of boto3 is to not use Keep Alive, thus explaining why the occasional connection is reset, triggering a DNS lookup. (Read the tweet thread for the full discovery.)

Using Keep-Alive is nothing new, and was covered quite well by AWS Serverless Hero Yan Cui back in 2019 for Node. It’s even in the official AWS Documentation, citing Yan’s article for the proposed update. Thanks Yan!

There’s precious little literature on using Keep Alive for Python Lambdas that I could find, leading to issues like Luc’s and reports like this one, so I decided to dig a little further. Knowing now that Keep Alive is off by default for anyone using the popular boto3 package to interact with AWS services, I wanted to explore what that looks like in practice.

I decided to pattern an app after Yan’s example – a function that receives an event body and persists it to DynamoDB. All in all, not too complex an operation – we perform a single DNS lookup for the DynamoDB service endpoint, and then use the returned IP address to connect over HTTP and put an object into the DynamoDB table.
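A minimal sketch of what such a function could look like is below. It is not the exact code used for the test. The `make_handler` helper, the `id` and `payload` attribute names, and the table name passed in are all assumptions for illustration. The client is injected so the sketch runs without AWS credentials.

```python
import json
import uuid


def make_handler(dynamodb_client, table_name):
    """Build a Lambda handler that stores each event body in DynamoDB."""

    def handler(event, context):
        body = event.get("body") or "{}"
        if not isinstance(body, str):
            body = json.dumps(body)  # store structured bodies as JSON text

        # Low-level DynamoDB item format: each attribute carries a type tag.
        item = {
            "id": {"S": str(uuid.uuid4())},  # hypothetical partition key
            "payload": {"S": body},          # hypothetical attribute name
        }
        dynamodb_client.put_item(TableName=table_name, Item=item)
        return {"statusCode": 200, "body": json.dumps({"id": item["id"]["S"]})}

    return handler


# In a real function, build the client once at module scope so warm
# invocations reuse it, for example:
#   handler = make_handler(boto3.client("dynamodb"), "my-table")
```

Creating the client outside the handler matters here: connection reuse can only help if the same client object survives between invocations.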

After re-writing the same function in Python, I was able to test the same kind of behavior that Yan did, running a call to the function once per second, isolating any concurrency concerns, replicating Luc’s test. This should have the benefit of reusing the same Lambda context (no cold starts) and seeing that the latencies range from 7 to 20 milliseconds for the same operation:

filtered log view showing only the latency for put_item calls to DynamoDB for 30 seconds

So far, so good – pretty much the same. The overall values are lower than Yan’s original experiment, which I attribute to the entire Lambda ecosystem improving, but we can see there’s variance and we often enter double-digit latencies, when we know that the DynamoDB operation is likely to only take 6-7 milliseconds.

left side shows spiky responses; right side shows most responses are fast, with some slower outliers
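For anyone wanting to collect similar numbers, here is a rough sketch of how each call could be timed and the samples summarized. The `timed_call` and `summarize` helpers are hypothetical names, and this is not necessarily how the measurements above were captured.

```python
import statistics
import time


def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def summarize(latencies_ms):
    """Summarize a non-empty list of latency samples in milliseconds."""
    ordered = sorted(latencies_ms)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
        "stdev": statistics.pstdev(ordered),  # spread across the samples
    }
```

Wrapping the `put_item` call in `timed_call` and logging the elapsed value gives one sample per invocation; `summarize` then shows both the typical latency and how far the outliers stray.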

As Yan showed in his approach adapted from Matt Levine’s talk snippets, he was able to reconstruct the AWS Config by rebuilding the lowest-level HTTP agent that the library relies on to make the calls, and thereby set the behavior for Keep Alive. This has since been made obsolete by the AWS Node.js SDK adding an environment variable to enable the keep alive behavior, which is awesome! But what about Python? 🐍

In the recent release of botocore 1.27.84 we can modify the AWS Config passed into the client constructor:

# before:
import boto3
client = boto3.client("dynamodb")

# after:
import boto3
from botocore.config import Config
client = boto3.client("dynamodb", config=Config(tcp_keepalive=True))

With the new configuration in place, if you try this on the AWS python3.9 execution runtime, you’ll get this error:
[ERROR] TypeError: Got unexpected keyword argument 'tcp_keepalive'

While the AWS Python runtime includes versions of boto3 and botocore, they do not yet support the new tcp_keepalive parameter – the runtime currently ships:
– boto3 1.20.32
– botocore 1.23.32
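One way to guard against this error is to check the installed version before passing the new option. The sketch below assumes 1.27.84 is the first botocore release that accepts `tcp_keepalive`, as described above. The `parse_version` and `supports_tcp_keepalive` helpers are hypothetical.

```python
import re

# Assumed first botocore release that accepts tcp_keepalive in Config.
MIN_KEEPALIVE_VERSION = (1, 27, 84)


def parse_version(version):
    """Turn a string like '1.23.32' into (1, 23, 32)."""
    parts = re.findall(r"\d+", version)[:3]  # keep major, minor, patch only
    return tuple(int(part) for part in parts)


def supports_tcp_keepalive(botocore_version):
    """Return True if this botocore version should accept tcp_keepalive."""
    return parse_version(botocore_version) >= MIN_KEEPALIVE_VERSION


# Possible usage inside a function (requires botocore to be importable):
#   import botocore
#   if supports_tcp_keepalive(botocore.__version__):
#       config = Config(tcp_keepalive=True)
```

With the runtime's bundled botocore 1.23.32 this check returns False, so the code can fall back to the default client instead of raising a TypeError.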

So we have to solve this another way.

The documentation tells us that we can configure this via a config file in ~/.aws/config, added in version 1.9.17 back in October 2018 – presumably when all the Keep Alive conversations were fresh in folks’ minds.

However, since the Lambda runtime environment disallows writing to that path, we can’t write the config file easily. We might be able to create a custom Docker runtime and place a file in the path, but that’s a bit harder, and we lose some of the benefits of using the AWS prebuilt runtime like startup latency, which when we’re exploring a latency-oriented article, seems like the wrong choice 😁.

Using the Serverless Framework CLI with the serverless-python-requirements plugin (what I’m currently using), or AWS SAM, you can package updated versions of boto3 and botocore with your function; deploying the updated application lets us use the new setting in a Lambda environment. You may already be using one of these approaches for a more evolved application.
Hopefully 🤞 the Lambda Runtime will be updated to include these versions in the near future, so we don’t have to package these dependencies to get this specific feature.

With the updated packages, we can pass the custom Config with tcp_keepalive enabled (as shown above), and observe more consistent performance for the same style of test:

left: much smoother!! right: narrower distribution of values, max 8.50 ms

There’s an open request for the config value to be available via environment variable – check it out and give it a 👍 to add your desire and subscribe via GitHub notifications.

Enjoy lower, more predictable latencies with Keep Alive!

Check out the example code here:

Postscript: If you’re interested in pinpointing calls for performance, I recommend checking out Datadog’s APM and associated ddtrace module to see the specifics of every call to AWS endpoints and associated latencies, as well as other parts of your application stack. There’s a slew of other vendors that can help surface these metrics.

Extending ECS Auto-scaling for under $2/month with Lambda

The Problem

Amazon Web Services (AWS) is pretty cool. You ought to know that by now. If you don’t, take a few hours and check out some tutorials and play around.

One of the many services AWS provides is the EC2 Container Service (ECS), where the scheduling and lifecycle management of running Docker containers is handled by the ECS control plane (probably magic cooked up in Seattle over coffee or in Dublin over a pint or seven).

You can read all about its launch here.

One missing feature from the ECS offering in comparison to other container schedulers was the concept of scheduling a service to be run on each host in a cluster, such as a logging or monitoring agent.
This feature allows clusters to grow or shrink and still have the correct services running on each node.

A published workaround was to have each node individually run an instance of the defined task on startup, which works pretty well.

The downside here is that if a task definition changes, ECS has no way of triggering an update to the running tasks – normal services will stop then start the task with a new definition, and use your logic to maintain some degree of uptime.
To achieve the update, one must terminate/replace the entire ECS Container Instance (the EC2 host) and if you’re using AutoScalingGroups, get a fresh node with the updated task.

Other Solutions

  • Docker Swarm calls this a global service, and will run one instance of the service on every node.
  • Mesos’ Marathon doesn’t support this yet either, and is in deep discussion on GitHub on how to implement this in their constraints syntax.
  • Kubernetes has a DaemonSet to run a pod on each node.
  • The recently-released ECS-focused Blox provides a daemon-scheduler to accomplish this, but brings along extra components to accomplish the scheduling.

Back to ECS

So imagine my excitement when the ECS team announced the release of their new Task Placement Strategies last week, offering a “One Task Per Host” strategy as part of the Service declaration.
This indeed is awesome and works as advertised, with no extra components, installs, schedulers, etc.

However! Currently each Service requires a “Desired Count” parameter of how many instances of this service you want to run in the cluster.

Given a cluster with 5 ECS Container Instance hosts, setting the Desired Count to 5 ensures that one runs on each host, provided there are resources available (cpu, ram, available port).

If the cluster grows to 6 (autoscaling, manually adding, etc), there’s nothing in the Service definition that will increase the desired count to 6, so this solution is actually worse off than our previous mode of using user-data to run the task at startup.

One approach is to arbitrarily raise the Desired count to a very high number, such as 100 for this cluster, with the consideration that we are unlikely to grow the cluster to this size without realizing it.
The scheduler will periodically examine the cluster for placement, and handle any hosts missing the service.

The problem with this is that it’s not deterministic: CloudWatch metrics will report these unplaced tasks as Pending, and I have alarms to notify me if tasks aren’t placed in clusters, as this can point to a resource allocation mismatch.

Enter The Players

To accomplish an automated service desired count, we must use some elements to “glue” a few of the systems together with our custom logic.

Here’s a sequence diagram of the conceptual flow between the components.

UML Sequence Flow

Every time there is a change in an ECS Cluster, CloudWatch Events will receive a payload.
Based on a rule we craft to select events classified as “Container Instance State Change”, CW Events will emit an event to the target of your choice, in our case, Lambda.

We could feasibly use a cron-like schedule to fire this every N minutes to inspect, evaluate, and remediate a semi-static set of services/cluster, but having a system that is reactive to change feels preferable to poll/test/repair.

A simple rule that captures all Container Instance changes:

{
  "source": [
    "aws.ecs"
  ],
  "detail-type": [
    "ECS Container Instance State Change"
  ]
}

You can restrict this to specific clusters by adding the cluster’s ARN to the keys like so (replace the placeholder with your own cluster’s ARN):

  "detail": {
    "clusterArn": [
      "<your-cluster-arn>"
    ]
  }

If being throttled or cost is a concern here, you may wish to filter to a set of known clusters, but this reduces the reactiveness of the logic to new clusters being brought online.

The Actual Logic

The Lambda function receives the event, performs some basic validation checks to ensure it has enough details to proceed, and then makes a single API call to the ECS endpoint to find our specified service in the cluster that fired the change event.

If no such service is found, we terminate now, and move on.

If the cluster does indeed have this service defined, then we perform another API call to describe the count of registered container instances, and compare that with the value we already have from the service definition call.

If there’s a mismatch, we perform a final third API call to adjust the service definition’s desired task count.

All in all, a maximum total of 3 possible API calls, usually in under 300ms.
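The flow above can be sketched roughly as follows. This is not the code from the repository. The `reconcile_desired_count` function name is hypothetical, and the ECS client is passed in so the logic can be exercised without AWS access. The three client methods used are the boto3 ECS calls `describe_services`, `describe_clusters`, and `update_service`.

```python
def reconcile_desired_count(ecs_client, event, service_name):
    """Align a service's desired count with the cluster's instance count.

    Returns the new desired count, or None when nothing was changed.
    """
    cluster_arn = event.get("detail", {}).get("clusterArn")
    if not cluster_arn:
        return None  # basic validation: not enough detail to proceed

    # Call 1: look for the service in the cluster that fired the event.
    described = ecs_client.describe_services(
        cluster=cluster_arn, services=[service_name]
    )
    services = described.get("services", [])
    if not services:
        return None  # this cluster does not run the service
    desired = services[0]["desiredCount"]

    # Call 2: count the registered container instances in the cluster.
    clusters = ecs_client.describe_clusters(clusters=[cluster_arn])["clusters"]
    registered = clusters[0]["registeredContainerInstancesCount"]

    if registered == desired:
        return None  # already in sync, no third call needed

    # Call 3: only made when the counts differ.
    ecs_client.update_service(
        cluster=cluster_arn, service=service_name, desiredCount=registered
    )
    return registered
```

Returning early at each step keeps the common cases cheap: one call when the service is absent, two when the counts already match, and three only on a mismatch.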

In my environment, I want this logic to apply to every cluster in my account, since the function itself inspects each cluster to see whether the service is defined there before acting.
In my ballpark figures with a set of 10 active clusters, the cost for running this logic should be under $2/month – yes, two dollars a month to ensure your cluster has the correct number of tasks for a given service.
Do your own estimation with the Lambda Pricing Calculator.
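For a quick back-of-the-envelope version of that estimate, a sketch like the one below can help. The two price constants are assumptions based on published Lambda list prices and may be out of date, the free tier is ignored, and the `monthly_lambda_cost` function is hypothetical.

```python
# Assumed list prices in USD; check the current Lambda pricing page.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667


def monthly_lambda_cost(invocations, avg_duration_ms, memory_mb):
    """Estimate monthly Lambda cost in USD, ignoring the free tier."""
    request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    # Compute is billed on duration multiplied by configured memory.
    gb_seconds = invocations * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND
    return request_cost + compute_cost
```

As a purely hypothetical input, one million invocations a month at 300 ms each with 128 MB of memory comes to roughly $0.83 under these assumed prices. Plug in your own event volume and memory size to see where you land.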


The code can be found on GitHub, and was developed with a test-everything philosophy, where I spent a large amount of time learning how to actually write the code and tests elegantly.
Writing out all of the tests and sequences allowed me to find multiple points of refactoring and increased efficiency from my first implementation, leading to a much cleaner solution.
Taking on a project like this is a great way to increase one’s own technical prowess, leading to the ability to reason about other problems.

While I strongly believe that this feature should be part of the ECS platform and not require any client-side intervention, the ability to take the current offerings and extend them via mechanisms such as Events, Lambda and API calls further demonstrates the flexibility and extensibility of the AWS ecosystem.
The feature launched just over a week ago, and I’ve been able to put together an acceptable solution on my own, using the documentation, tooling, and infrastructure while minimizing costs and making my system more reactive to change.

I look forward to what else the ECS, Lambda and CloudWatch Events teams cook up in the future!