Placing an R Lambda Runtime in a Container • lambdr

Introduction

This article shows how to bundle up your R code into a Docker container and deploy it as an AWS Lambda function using lambdr. We provide a self-contained working example that can be used as a test/template for your own project.

It is not an exhaustive or authoritative resource on all things AWS Lambda and Docker. Rather, it is a guide intended to get you up and running.

Pre-requisites

You will need an AWS account. Remember, you are responsible for the account and any billing costs that are incurred.

You need to be comfortable with R and have at least a vague understanding of Dockerfiles, Docker images, and containers. If you are not familiar with Docker, see the article A Primer on Docker for lambdr.

For AWS, you’ll need a vague understanding of one or more of the AWS Console, AWS CLI, or AWS CDK, depending on which deployment examples you intend to follow later in this article.

Minimal Deployment Ready Example

In this section you will see everything that goes into making an absolutely minimal deployment-ready R project with lambdr.

Ideally, Lambda functions should be pretty simple and do their business briskly. In reality, they may take several minutes to run and interact with various other AWS resources such as databases and S3 buckets.

The example given here is basic enough to understand, and realistic enough to demonstrate some good project structure habits when making a Lambda. It purposefully doesn’t require access to other AWS resources, like S3 buckets or databases, because managing and granting permissions is well outside the scope of lambdr.

Example project structure

The example project is called flags. When given a full or partial country name it queries the REST Countries API and returns information about the country’s flag.

.
├── Dockerfile
└── R
    ├── functions.R
    └── runtime.R

The minimal structure for a project only needs to contain two elements:

A directory R/ containing the R scripts
A Dockerfile

The Dockerfile packages up the R scripts in a Linux distribution with R, any required R packages, and system dependencies for both R and the R packages.

One of the R scripts should always be called runtime.R.

Project file contents

First let’s look at the R code.

functions.R

The functions in this file get sourced and used in runtime.R.

Don’t worry if you aren’t familiar with calling RESTful APIs. In summary, create_request() makes a request object asking https://restcountries.com about a country’s flag. perform_request() then sends the request to the website. If the response status isn’t 200 (OK), the helper function unsuccessful() raises an R error containing the API’s status and message.

library(httr2)

create_request <- function(country) {
  stopifnot(
    "'country' must be a string containing a full or partial country name, e.g. 'ghana'" =
      is.character(country) && length(country) == 1L && nzchar(country)
  )
  country <- utils::URLencode(country)

  base_url <- "https://restcountries.com/v3.1/name/"

  httr2::request(base_url) |>
    httr2::req_user_agent(
      "lambdr example (https://github.com/mdneuzerling/lambdr/)"
    ) |>
    httr2::req_url_path_append(country) |>
    httr2::req_url_query(fields = "flags")
}

unsuccessful <- function(resp) {
  body <- httr2::resp_body_json(resp)
  msg <- sprintf("\nHTTP %s: %s.\n", body$status, body$message)
  stop(
    msg,
    "Check supplied country name is valid and/or the server status."
  )
}

perform_request <- function(req) {
  resp <- req |>
    httr2::req_error(is_error = \(x) FALSE) |>
    httr2::req_perform()

  if (httr2::resp_status(resp) != 200L) {
    unsuccessful(resp)
  }

  return(httr2::resp_body_json(resp))
}

runtime.R

This file orchestrates the rest of the R code and starts up the lambdr runtime interface client.

runtime.R is short and simple. There are three main sections:

Set up: loading packages, sourcing functions, and indicating a logging threshold
The handler() function definition
Start lambdr

Your Lambda function will likely¹ take a JSON payload with some inputs. lambdr converts that JSON into an R list and passes its elements to handler() as named arguments.

For example, a payload for this flags Lambda could be {"country": "ghana"}. lambdr would convert it to list(country = "ghana") and then call handler(country = "ghana") for us.

Note that if lambdr::start_lambda() is called interactively it will throw an error. This is intentional: lambdr relies on environment variables that are only available in the deployed Lambda execution environment. However, you may want to test your code in interactive sessions during development, so we simply wrap the call in if (!interactive()).

library(lambdr)
library(logger)
source(file.path("R", "functions.R"))

logger::log_threshold(logger::DEBUG)

handler <- function(country) {
  logger::log_info("Event received: ", country)

  req <- create_request(country)
  resp <- perform_request(req)

  return(resp)
}

if (!interactive()) {
  lambdr::start_lambda()
}

Dockerfile

The Dockerfile builds on the one in A Primer on Docker for lambdr. That article explains each item in the Dockerfile step-by-step. Here, we emphasise what makes the Dockerfile below production-ready.

Essentially, it’s all about version control.

Once the project is ready for deployment you should pin the base image to a specific version. To do this you provide the image’s digest, which is a hash value. Here we use @sha256:429..., which is for the amd64 version of rocker/r-ver:4.4 at the time of writing. Using the tag is not sufficient, because the image a tag refers to can change over time, whereas a digest always identifies exactly the same image.

To find a digest for an image you’ve already pulled you can use docker images --digests. Or you can get it from the repository where you found the image.
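If you want a guard against deploying an unpinned base image, a small check in your build or CI step could refuse any FROM line without a digest. The sketch below assumes bash; the function names are our own (not part of Docker or lambdr), and it assumes FROM lines carry no --platform flag, so the image reference is the second field.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Docker or lambdr) that check base images are
# pinned by digest rather than by a mutable tag.

# Succeeds only if the image reference ends in "@sha256:" plus 64 hex characters.
is_digest_pinned() {
  [[ "$1" =~ @sha256:[0-9a-f]{64}$ ]]
}

# Checks every FROM line of a Dockerfile and prints unpinned references to stderr.
# Assumes no "--platform=..." flag, so the image reference is the second field.
check_dockerfile_pins() {
  local dockerfile="$1" keyword ref rest status=0
  # "|| [ -n ... ]" also processes a final line that lacks a trailing newline
  while read -r keyword ref rest || [ -n "$keyword" ]; do
    # Dockerfile instructions are case-insensitive; skip everything but FROM
    case "$keyword" in
      [Ff][Rr][Oo][Mm]) ;;
      *) continue ;;
    esac
    if ! is_digest_pinned "$ref"; then
      echo "Not pinned by digest: $ref" >&2
      status=1
    fi
  done < "$dockerfile"
  return "$status"
}
```

For example, running check_dockerfile_pins Dockerfile before docker build exits non-zero, printing the reference, if the base image is referenced only by tag. To look up the digest of an image you have already pulled, docker inspect --format '{{index .RepoDigests 0}}' rocker/r-ver:4.4 is an alternative to docker images --digests.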

The other version control aspect here is using the Posit Public Package Manager to get amd64 Ubuntu binaries from a snapshot of CRAN. This is simple, but not necessary; you could use renv instead, for example. The point is to version your R packages somehow!

FROM docker.io/rocker/r-ver:4.4@sha256:429c1a585ab3cd6b120fe870fc9ce4dc83f21793bf04d7fa2657346fffde28d1

# options(warn=2) will make the build error out if package doesn't install
RUN Rscript -e "options(warn = 2); install.packages('pak')"
# Using {pak} to install R packages: it resolves Ubuntu system dependencies AND
# the R dependency tree
RUN Rscript -e "options( \ 
    warn = 2, \
    repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-06') \
    ); \ 
    pak::pak( \ 
    c( \ 
    'httr2', \
    'lambdr' \
    ) \
    )"

# Lambda setup
RUN mkdir /R
COPY R/ /R
RUN chmod 755 -R /R

ENTRYPOINT Rscript R/runtime.R
CMD ["handler"]

Completed project

That’s it! These three files are all you need:

.
├── Dockerfile
└── R
    ├── functions.R
    └── runtime.R

The following sections expand on considerations such as local testing and how to deploy the project into AWS.

Choosing base images

For R you can use any base image you want. So long as you have all the system dependencies required for R and the R packages, you only need to have lambdr installed and used as in the minimal example given above.

Do use

We recommend Rocker’s r-ver images.

They offer a tested, versioned R stack. Newer versions can be used with the Posit Public Package Manager to install Ubuntu binaries of packages as they were on CRAN at a given date. They also allow use of pak, which makes finding and installing R package system dependencies very convenient.

Do not use

The AWS Lambda ‘provided’ images.

They are minimal OS images. That means the base image is fairly small, which is a plus. However, you have to install R and any dependencies yourself.

More importantly, at the time of writing the OS in the newest images is Amazon Linux 2023, which is not based on a single Linux distribution, but rather a blend of several. That makes installing R package binaries and system dependencies more challenging, and slow.

The previous version of the OS (Amazon Linux 2) is drastically different from Amazon Linux 2023 and, at the time of writing, will no longer receive support from AWS by the end of summer 2025.

Dev vs deployment Dockerfiles

Up until this point we have only shown and discussed a Dockerfile for deployment. While you are doing development work, it is a good idea to have a dev Dockerfile.

The dev Dockerfile should mimic the deployment one as much as possible, but will also have any extra features you need to make your life easier as a developer.

Dockerfile.dev

The example given below can be added to the flags project:

.
├── Dockerfile
├── Dockerfile.dev
└── R
    ├── functions.R
    └── runtime.R

FROM ghcr.io/rocker-org/devcontainer/r-ver:4.4@sha256:e99cfe63efd5d79f44146d8be8206019fd7a7230116aa6488097ee660d6aa5dc

# Install the Lambda Runtime Interface Emulator, which can be used for locally
# invoking the function.
# See https://github.com/aws/aws-lambda-runtime-interface-emulator for details
RUN apt-get update && apt-get -y install --no-install-recommends curl 
RUN curl -Lo /usr/local/bin/aws-lambda-rie https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie && \
    chmod +x /usr/local/bin/aws-lambda-rie

# options(warn=2) will make the build error out if package doesn't install
RUN Rscript -e "options(warn = 2); install.packages('pak')"
# Using {pak} to install R packages: it resolves Ubuntu system dependencies AND
# the R dependency tree
RUN Rscript -e "options( \ 
    warn = 2, \
    repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-06') \
    ); \ 
    pak::pak( \ 
    c( \ 
    'httr2', \
    'lambdr' \
    ) \
    )"

# Lambda setup
RUN mkdir /R

# Needs to be set to use aws-lambda-rie. It is the path up to runtime.R
ENV LAMBDA_TASK_ROOT="/R"

# Optional for local testing
# 900s, i.e. 15 min, the max time a lambda can run for.
ENV AWS_LAMBDA_FUNCTION_TIMEOUT=900

The differences from the deployment Dockerfile are:

Uses a Dev Container base image
- Adds some development tooling
Installs the Lambda Runtime Interface Emulator (RIE)
- The RIE makes it possible to test the Lambda function locally
Doesn’t copy the R files into the image
- Instead, a live link between the host and container file systems is created at run time (see the build script below)

The Dev Container base image

FROM ghcr.io/rocker-org/devcontainer/r-ver:4.4@sha256:e99cfe63efd5d79f44146d8be8206019fd7a7230116aa6488097ee660d6aa5dc

This time we are using a devcontainer base image, which “allows you to use a container as a full-featured development environment”. This is made by Rocker and it’s the same as the r-ver:4.4 image but with some extra tooling.

The most valuable of these is having radian and its dependencies installed, plus some of the other usual setup required to use VS Code as an IDE for R. Setting up the devcontainer VS Code extension itself is left as an exercise for the reader, with a word of warning: one of the authors of this vignette has seen M2 MacBook Pros with 16GB of RAM struggle to run it.

If you prefer RStudio as your IDE, you could use a Rocker RStudio Server image instead.

Build and run dev container

While you’re developing it’s a good idea to work out of the dev container.

One option for doing so is to have a “build script” that builds the image and runs a container.

.
├── build.sh
├── Dockerfile
├── Dockerfile.dev
└── R
    ├── functions.R
    └── runtime.R

#!/bin/sh
docker stop flags && docker container rm flags

docker build -t flags:latest \
             -f Dockerfile.dev .

docker run \
    -p 9000:8080 \
    -it \
    --rm \
    -v ~/.aws/:/root/.aws \
    -v ./R:/R \
    --name flags \
    flags:latest \
    bash

Run the script with, e.g., bash build.sh.

If a container called flags already exists it will be stopped and removed. You may see the error Error response from daemon: No such container: flags, either because no such container exists yet or because --rm has already removed it once stopped. This is expected and harmless.
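If you would rather not see that error at all, one option is to check whether the container exists before removing it. This is a sketch assuming bash; remove_container_if_present is a hypothetical helper that could stand in for the first command of build.sh.

```shell
#!/bin/bash
# Hypothetical helper: remove a container only if it exists, so the
# "No such container" error never appears.
remove_container_if_present() {
  local name="$1"
  # `docker container inspect` exits non-zero when no container has this name
  if docker container inspect "$name" >/dev/null 2>&1; then
    # -f stops a running container before removing it; `|| true` covers the
    # case where --rm has already removed it in the meantime
    docker rm -f "$name" >/dev/null 2>&1 || true
  fi
}
```

In build.sh you would then call remove_container_if_present flags in place of the docker stop ... && docker container rm ... line, and run the script with bash.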

Thanks to Docker’s build cache, the image is only actually rebuilt if Dockerfile.dev has been altered or is being built for the first time; otherwise the cached layers are reused.

The options being given to docker run are:

-p Publishes the container’s port 8080 (where the RIE listens) on the host’s port 9000, for local testing with the Lambda RIE (see below)
-it Starts the container with an interactive terminal
--rm Removes the container when it exits
-v Bind-mounts a host path into the container, creating a live link between the host and container file systems
- ~/.aws makes AWS credentials available in the container, which can be useful if you need to use {paws}. If you don’t need AWS credentials in the container, don’t mount this volume
- ./R contains the Lambda code

Local testing with AWS RIE

In the dev Dockerfile we installed the AWS Runtime Interface Emulator (RIE).

The emulator gets as close to the environment of a Lambda as is possible without actually deploying to AWS and invoking the function there.

Two small shell scripts in a local-testing directory, plus the build script from the previous section (which sets up the port forwarding), are all we need to add:

.
├── local-testing
│   ├── event.sh
│   └── start-rie.sh
├── build.sh
├── Dockerfile
├── Dockerfile.dev
└── R
    ├── functions.R
    └── runtime.R

start-rie.sh starts the emulator with our handler:

#!/bin/bash
exec /usr/local/bin/aws-lambda-rie Rscript /R/runtime.R handler

Run bash local-testing/start-rie.sh from inside the running container. Note that build.sh only mounts R/ into the container, so you will need to either add a similar -v mount for local-testing/ or run the command from start-rie.sh directly. This starts the emulator, which runs an HTTP endpoint similar to the one in the deployed Lambda. The endpoint waits for a payload to pass to the R code, which keeps the container’s terminal busy. To stop the process, press Ctrl + C.
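As an alternative to starting the emulator by hand in an interactive shell, you could make the RIE the container’s command and run it in the background. This sketch is an assumption rather than part of the workflow above: it reuses the flags:latest dev image built by build.sh (and the same container name, so use it instead of that interactive container), assumes the dev image defines no ENTRYPOINT of its own, and sets the working directory to / with -w so that runtime.R’s relative source(file.path("R", "functions.R")) call finds /R/functions.R.

```shell
#!/bin/bash
# Hypothetical helper: run the dev image with the RIE as its command, so the
# emulator starts without an interactive shell. Run it from the project root.
start_rie_container() {
  # -d runs detached; -w / makes runtime.R's relative source() call resolve to /R
  docker run --rm -d \
    -p 9000:8080 \
    -w / \
    -v "$(pwd)/R":/R \
    --name flags \
    flags:latest \
    /usr/local/bin/aws-lambda-rie Rscript /R/runtime.R handler
}
```

After start_rie_container, you can follow the R logs with docker logs -f flags, and docker stop flags shuts the emulator down (with --rm also removing the container).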

Sending a payload to the emulator is the equivalent of invoking the Lambda. This is done with event.sh:

#!/bin/bash
port="9000"
endpoint="http://localhost:$port/2015-03-31/functions/function/invocations"

curl -XPOST $endpoint -d '{"country":"'"$1"'"}'

From a terminal on your host machine (outside the container), run bash local-testing/event.sh someCountry, replacing someCountry with a partial or full country name. The result will appear in your terminal. In the container’s terminal you will see all the logs from the execution (equivalent to what would appear in CloudWatch). The RIE keeps running until you stop it with Ctrl + C.
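The event.sh above pastes $1 straight into the JSON string, so an empty argument only fails once it reaches the R code, and a name containing a double quote produces invalid JSON. A slightly more defensive sketch, assuming bash (make_payload and send_event are hypothetical names), validates the argument before sending anything:

```shell
#!/bin/bash
# Hypothetical, more defensive helpers for local-testing/event.sh.

# Print the JSON payload for the flags Lambda, e.g. {"country":"ghana"}
make_payload() {
  local country="${1:-}"
  # Fail early on an empty name, mirroring the nzchar() check in create_request()
  if [ -z "$country" ]; then
    echo "usage: event.sh COUNTRY (a full or partial country name)" >&2
    return 1
  fi
  # printf does no JSON escaping, so reject the characters that would need it
  case "$country" in
    *'"'* | *'\'*)
      echo "country name must not contain double quotes or backslashes" >&2
      return 1
      ;;
  esac
  printf '{"country":"%s"}\n' "$country"
}

# POST the payload to the RIE endpoint published on host port 9000 by build.sh
send_event() {
  local endpoint="http://localhost:9000/2015-03-31/functions/function/invocations"
  local payload
  payload="$(make_payload "${1:-}")" || return 1
  curl -s -XPOST "$endpoint" -d "$payload"
}
```

To use it as event.sh, add send_event "$1" as the script’s last line; bash local-testing/event.sh "south africa" then works as before, while bash local-testing/event.sh on its own prints the usage message instead of invoking the Lambda.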

If your own Lambda doesn’t take any parameters, your payload should be an empty JSON object:

#!/bin/bash
port="9000"
endpoint="http://localhost:$port/2015-03-31/functions/function/invocations"

curl -XPOST $endpoint -d '{}'

Finally, one more option if you are using the devcontainer VS Code extension. Because you can spawn multiple terminals within the container, you can start the RIE in one terminal and send it a payload from another. In this case the port will be 8080, which is easy to handle in event.sh thanks to the devcontainer environment variable $DEVCONTAINER:

if [ "$DEVCONTAINER" = "TRUE" ]
then
    port="8080"
else
    port="9000"
fi
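Putting the port logic and the no-parameter case together, a single event.sh could accept the whole JSON payload as its argument and fall back to {} when none is given. This is a sketch assuming bash; rie_port, rie_endpoint, payload_or_default and send_event are hypothetical names, and unlike the original event.sh it takes raw JSON rather than a country name.

```shell
#!/bin/bash
# Hypothetical event.sh helpers that work both from the host and from a second
# terminal inside the dev container.

# Inside the container the RIE listens on 8080; build.sh publishes it on host port 9000
rie_port() {
  if [ "${DEVCONTAINER:-}" = "TRUE" ]; then
    echo "8080"
  else
    echo "9000"
  fi
}

# The RIE's invocation endpoint, as used in event.sh
rie_endpoint() {
  echo "http://localhost:$(rie_port)/2015-03-31/functions/function/invocations"
}

# Use the first argument as the raw JSON payload, or {} for Lambdas without parameters
payload_or_default() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' '{}'
  fi
}

# POST the payload to whichever endpoint applies
send_event() {
  curl -s -XPOST "$(rie_endpoint)" -d "$(payload_or_default "${1:-}")"
}
```

With send_event "$1" as the script’s last line, bash local-testing/event.sh '{"country": "namibia"}' invokes the flags handler, and running it with no argument sends {}.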

Deployment

At this point, you will have at minimum a project with Dockerfile, some R code, and a runtime.R that starts lambdr:

.
├── Dockerfile
└── R
    ├── functions.R
    └── runtime.R

Deployment is the act of turning these files into an AWS Lambda function that can be invoked. Here we provide some rough instructions for two common ways of deploying: via the AWS Console (the website), and the AWS Cloud Development Kit (CDK).

For both options you will need to have the AWS CLI installed (instructions) as a prerequisite. The CLI is how you interact with AWS from the terminal.

AWS Console

The following instructions use the example project given earlier, flags.

The steps are:

Build the image
Create a repository in the AWS Elastic Container Registry (ECR)
Push the image to ECR
Make the Lambda function from the image in ECR

First, in a terminal cd to the project and build the image:

docker build -t flags:latest .

Then, create a repository in ECR either by using the CLI or the Console. If you do it in the Console you pretty much just go to the ECR service, click Create, and call it flags.

Or you can use the CLI:

aws ecr create-repository --repository-name flags --image-scanning-configuration scanOnPush=true

Make a note of the URI, which is the resource identifier of the created repository.

The image can now be pushed to the repository. This part has to be done via the CLI. You can get all the commands ready-made via the Console by clicking on the repository name in ECR, then View push commands. Alternatively, replace the account ID 123456789123 and the region region-name-1 in the commands below.

Note: You don’t need a Docker account for the docker login command.

docker tag flags:latest 123456789123.dkr.ecr.region-name-1.amazonaws.com/flags:latest
aws ecr get-login-password | docker login --username AWS --password-stdin 123456789123.dkr.ecr.region-name-1.amazonaws.com
docker push 123456789123.dkr.ecr.region-name-1.amazonaws.com/flags:latest
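Rather than editing the account ID and region by hand, you could look them up with the CLI. This sketch assumes bash, that you are logged in to the intended account, and that the AWS CLI has a default region configured; ecr_registry and push_to_ecr are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers wrapping the three push commands above.

# Registry host for an account and region,
# e.g. 123456789123.dkr.ecr.region-name-1.amazonaws.com
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

push_to_ecr() {
  local image="$1" repo="$2" account region registry
  # Look up the current account ID and the CLI's configured default region
  account="$(aws sts get-caller-identity --query Account --output text)" || return 1
  region="$(aws configure get region)" || return 1
  registry="$(ecr_registry "$account" "$region")"

  docker tag "$image" "$registry/$repo:latest" || return 1
  # Log in to the registry host itself, as in the Console's push commands
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$registry" || return 1
  docker push "$registry/$repo:latest"
}
```

push_to_ecr flags:latest flags then tags, logs in, and pushes in one go.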

Now that the image is in ECR it can be used to make the Lambda function. You could make it from the command line, but this requires an IAM Role to be configured and ready for the function, which is beyond the scope of lambdr. If you have a Role and prefer to use the CLI, see the examples in aws lambda create-function help.
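If you do have a suitable execution role, the CLI call might look like the sketch below. It assumes bash; create_flags_function is a hypothetical wrapper, the image URI and role ARN are placeholders you would substitute, and --timeout 10 sidesteps Lambda’s 3-second default.

```shell
#!/bin/bash
# Hypothetical wrapper: create the flags function from an image already in ECR.
# The role ARN must belong to an existing Lambda execution role.
create_flags_function() {
  local image_uri="$1" role_arn="$2"
  # --package-type Image tells Lambda to run a container image rather than a zip
  aws lambda create-function \
    --function-name flags \
    --package-type Image \
    --code ImageUri="$image_uri" \
    --role "$role_arn" \
    --timeout 10
}
```

For example: create_flags_function 123456789123.dkr.ecr.region-name-1.amazonaws.com/flags:latest arn:aws:iam::123456789123:role/example-lambda-role, where example-lambda-role is a hypothetical role name.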

In the Console go to the Lambda service. Click Create a function. Choose the container image option, give it a name (flags is fine) and choose the image from the ECR repository. Everything else can be left default.

It will take a minute for the function to be made.

Once it is, you can test it by scrolling down and clicking on the Test tab. Edit the Event JSON to be {"country": "namibia"}, then click the orange Test button. If it succeeds, click Details beneath the big green tick to see the response.

However, there’s a good chance of a timeout, because the default timeout for a Lambda is only 3 seconds. To increase it, click on the Configuration tab, then Edit, and bump it up. 10 seconds is fine in this case.
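The same change can be made from the CLI with aws lambda update-function-configuration. The wrapper below is a hypothetical sketch assuming bash; it checks the value against Lambda’s 1 to 900 second range before calling the CLI.

```shell
#!/bin/bash
# Hypothetical wrapper: raise (or lower) a function's timeout from the terminal.
set_lambda_timeout() {
  local function_name="$1" seconds="$2"
  # Lambda timeouts are whole seconds between 1 and 900 (15 minutes)
  if ! [[ "$seconds" =~ ^[0-9]+$ ]] || [ "$seconds" -lt 1 ] || [ "$seconds" -gt 900 ]; then
    echo "timeout must be an integer between 1 and 900 seconds" >&2
    return 1
  fi
  aws lambda update-function-configuration \
    --function-name "$function_name" \
    --timeout "$seconds"
}
```

For this example, set_lambda_timeout flags 10 matches the value suggested above.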

Alternatively the Lambda can be invoked from the CLI:

aws lambda invoke --function-name flags \
  --invocation-type RequestResponse --payload '{"country": "namibia"}' \
  /tmp/response.json --cli-binary-format raw-in-base64-out

See the response with:

cat /tmp/response.json

[{"flags":{"png":"https://flagcdn.com/w320/na.png","svg":"https://flagcdn.com/na.svg","alt":"The flag of Namibia features a white-edged red diagonal band that extends from the lower hoist-side corner to the upper fly-side corner of the field. Above and beneath this band are a blue and green triangle respectively. A gold sun with twelve triangular rays is situated on the hoist side of the upper triangle."}}]
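When scripting this, note that aws lambda invoke typically exits successfully even if the R code throws an error: the error details end up in the response file, and the metadata the CLI prints includes a FunctionError field. A sketch of a wrapper (hypothetical name invoke_flags, assuming bash and assuming the R error is reported back to Lambda as a function error) that turns that into a non-zero exit status:

```shell
#!/bin/bash
# Hypothetical wrapper: invoke the deployed flags Lambda and fail loudly if the
# handler errored.
invoke_flags() {
  local country="$1" out="${2:-/tmp/response.json}" meta
  # The CLI writes the function's response to $out and prints invocation
  # metadata (StatusCode, plus FunctionError if the handler failed) to stdout
  meta="$(aws lambda invoke --function-name flags \
    --invocation-type RequestResponse \
    --payload "{\"country\": \"${country}\"}" \
    --cli-binary-format raw-in-base64-out \
    "$out")" || return 1
  if printf '%s' "$meta" | grep -q '"FunctionError"'; then
    echo "The Lambda reported an error:" >&2
    cat "$out" >&2
    return 1
  fi
  cat "$out"
}
```

invoke_flags namibia prints the same JSON as cat /tmp/response.json above, but exits non-zero if the R code failed.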

Cloud Development Kit (CDK)

The CDK allows you to programmatically create application stacks and all of the associated resources they need, like Lambdas, Step Functions, and so on. It’s an alternative to clicking around in the AWS Console and means that your stack can (theoretically) be rebuilt at any time with just a few commands.

First install the CDK CLI.

Once the CDK CLI is installed:

Make a directory somewhere called LambdrExample, then cd into the directory and run the following in the terminal:

cdk init app --language typescript

You will now have a bunch of files and folders containing boilerplate code and libraries. We’re only interested in the bin and lib directories. You also need to add a new directory called lambda and copy the flags project into it, like below:

.
├── bin
│  └── lambdr_example.ts
├── lambda
│  └── flags
│     ├── Dockerfile
│     └── R
│        ├── functions.R
│        └── runtime.R
└── lib
   └── lambdr_example-stack.ts

Replace the contents of bin/lambdr_example.ts with this:

#!/usr/bin/env node
import "source-map-support/register";
import * as cdk from "aws-cdk-lib";
import { LambdrExampleStack } from "../lib/lambdr_example-stack";

const app = new cdk.App();
new LambdrExampleStack(app, "LambdrExampleStack", {
  // Your account number and region go in the env below
  env: { account: "111122223333", region: "my-region-1" },
});

Replace the contents of lib/lambdr_example-stack.ts with this:

import * as cdk from "aws-cdk-lib";
import { Construct } from "constructs";
import * as lambda from "aws-cdk-lib/aws-lambda";

export class LambdrExampleStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    this.createFlagsLambda();
  }

  createFlagsLambda(): lambda.IFunction {
    const flagsLambda = new lambda.DockerImageFunction(this, "flags", {
      functionName: "flags",
      code: lambda.DockerImageCode.fromImageAsset("lambda/flags"),
      timeout: cdk.Duration.seconds(15),
    });

    return flagsLambda;
  }
}

You then need to run the following, which is only needed once per AWS account and region:

cdk bootstrap

Followed by

cdk deploy

Some building will happen and then you will be prompted as to whether you wish to deploy a couple of changes. If you feel comfortable, enter y and hit return.

Note: If you get a failure because no bucket and/or ECR repository exists, this is probably because you have previously bootstrapped your account and now have “stack drift”. To resolve this, delete the CDKToolkit stack from CloudFormation and re-bootstrap. For more information, see Stack Overflow here.

If all has gone well you should see a green tick and sparkles with deployment time.

Test by invoking from the CLI:

aws lambda invoke --function-name flags \
  --invocation-type RequestResponse --payload '{"country": "namibia"}' \
  /tmp/response.json --cli-binary-format raw-in-base64-out

See the response with:

cat /tmp/response.json

[{"flags":{"png":"https://flagcdn.com/w320/na.png","svg":"https://flagcdn.com/na.svg","alt":"The flag of Namibia features a white-edged red diagonal band that extends from the lower hoist-side corner to the upper fly-side corner of the field. Above and beneath this band are a blue and green triangle respectively. A gold sun with twelve triangular rays is situated on the hoist side of the upper triangle."}}]

If you make changes to the Dockerfile or R code of the Lambda, simply re-deploy with another cdk deploy.

Delete the CDK stack using CloudFormation via the AWS Console, or via the terminal with cdk destroy. You should also delete the ECR repository; otherwise it will sit in your account and you will be charged for storage (a very small amount, but still).

Tidying up resources

To clean up the resources made using this guide, use the AWS Console to find and delete items in:

CloudFormation (the stack)
ECR
Lambda
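If you prefer the terminal, the same clean-up can be sketched with the AWS CLI. This assumes bash and the resource names used in this guide (the flags function and repository, and the LambdrExampleStack stack); the two helper names are hypothetical, and which one you need depends on whether you deployed through the Console or the CDK.

```shell
#!/bin/bash
# Hypothetical clean-up helpers for the resources created in this guide.

# Console walkthrough: the function and the ECR repository
tidy_up_console() {
  aws lambda delete-function --function-name flags || return 1
  # --force deletes the repository even though it still contains images
  aws ecr delete-repository --repository-name flags --force
}

# CDK walkthrough: deleting the stack also removes the function it created
# (roughly what `cdk destroy` does from the project directory)
tidy_up_cdk() {
  aws cloudformation delete-stack --stack-name LambdrExampleStack
}
```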

¹ Although Lambdas typically take an input, they can also be invoked with no arguments, e.g. a Lambda that runs on a schedule and scrapes the same website every day.