This vignette is intended as a primer for the not-beginner[1] R user who has no (or minimal) experience with Docker.
One of the main ways to create AWS Lambda functions is with Docker images. Don’t worry if you don’t know what that is yet. What’s important to know is that for lambdr to work you need to have a valid Docker image.
The sections that follow will explain what a Docker image is and how to make one for use with lambdr.
An image is a bit like a simple, switched-off virtual computer. Or, as Docker puts it, “a standardized package that includes all of the files, binaries, libraries, and configurations to run a container.”
And what’s a container? Well, “A container is simply an isolated process with all of the files it needs to run.” That sounds kind of vague, and it is, so for the purposes of lambdr you can think of it as an instance of the virtual computer (image) that’s switched on and can run your R code. It’s independent of your own computing environment, but you can give it access to things like directories, so that there’s a live link between the file system in the container and your local code.
The image doesn’t come from nowhere. It’s created from a Dockerfile, which is “a text-based document” that “provides instructions to the image builder on the commands to run, files to copy, startup command, and more.”
It’s also very useful to know the following terms:
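This is what the flow looks like, from left to right: Dockerfile → image → container. A Dockerfile is built into an image, and an image is run as a container.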
For your R process, the image needs to contain a Linux distribution, your code, R itself, the R packages your code needs, and any system dependencies for Linux that R and the R packages require.
This is pretty similar to how most of us work locally. We have an operating system - typically macOS, Windows, or Linux. We install R. We install R packages. And we install any system dependencies that we need along the way, e.g. imagemagick, or Postgres.
Below is an example of a Dockerfile that could be used with lambdr. The purpose is just to show a minimal example that explains the basic concepts. It is not intended as an example Dockerfile to be used in production.
If you are confident that you understand Dockerfiles, images, and containers, and simply want a production-ready example to use as a reference, please see the article Placing an R Lambda Runtime in a Container.
Here is a full Dockerfile, followed by item-by-item explanations:
FROM docker.io/rocker/r-ver:4.4
RUN Rscript -e "options(warn = 2); install.packages('pak')"
RUN Rscript -e "options( \
    warn = 2, \
    repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-30') \
  ); \
  pak::pak( \
    c( \
      'httr2', \
      'lambdr' \
    ) \
  )"
# Lambda setup
RUN mkdir /R
COPY R/ /R
RUN chmod 755 -R /R
ENTRYPOINT Rscript R/runtime.R
CMD ["handler"]
Dockerfiles contain instructions. In this example there are instructions like FROM and RUN.
The first instruction is:
FROM docker.io/rocker/r-ver:4.4
To understand FROM you need to know that Docker images are built up of layers. Each layer is added to the ones that preceded it. This even includes images someone else has built and put online. That’s great, because they can form the base on which you continue to build, which is also why they’re known as base images.
Base images are hosted in registries. In our example, the registry is docker.io (Docker Hub). The image in question is provided by rocker, who make reliable Docker containers for R environments. The image has a name, r-ver, and a tag, 4.4.
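To make the parts of the image reference concrete, here is a small shell sketch that splits docker.io/rocker/r-ver:4.4 into its pieces with parameter expansion. The variable names are made up for illustration, and the sketch assumes a reference that always has a host, a provider, and a tag, which is not true of every image reference you will come across.

```shell
#!/bin/sh
# Split an image reference into its parts (illustrative only).
ref="docker.io/rocker/r-ver:4.4"

host="${ref%%/*}"            # everything before the first "/"
path_and_tag="${ref#*/}"     # everything after the first "/"
tag="${path_and_tag##*:}"    # everything after the last ":"
repo="${path_and_tag%:*}"    # everything before the last ":"
provider="${repo%%/*}"       # who publishes the image
name="${repo##*/}"           # the image name itself

printf 'host:     %s\nprovider: %s\nname:     %s\ntag:      %s\n' \
  "$host" "$provider" "$name" "$tag"
```

Changing the tag (for example to a different R version that rocker publishes) is how you would pick a different base image from the same provider.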
The instruction FROM simply tells the builder to download and use the base image as a starting point.
This particular base image has a version of Linux (Ubuntu ‘Jammy’), R, and the system dependencies required for using R. Information about the image was available on the rocker website (at time of writing).
The next instruction used is RUN. Every time RUN is used, it executes the command it has been given and saves the result as a new image layer.
Here, it is used to install R packages: first pak, which is itself an R package manager, and then httr2 and lambdr by using pak:
RUN Rscript -e "options(warn = 2); install.packages('pak')"
RUN Rscript -e "options( \
    warn = 2, \
    repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-30') \
  ); \
  pak::pak( \
    c( \
      'httr2', \
      'lambdr' \
    ) \
  )"
This is probably not how you’re used to executing R code. It looks like this for a few reasons.
First, the code is run with Rscript, which is “A binary front-end to R, for use in scripting applications”. It can be given R files or, like in this case, in-line scripts to execute. Basically, it’s a non-interactive method of executing R code.
Second, the command is split over several lines with \. The \ escapes any newlines, so the builder treats the lines as a single command.
A warning level and a ‘special’ CRAN repository are also supplied. These ensure that the Docker image will error out if there’s a problem installing packages, and that the packages are the latest amd64 binary versions for Ubuntu ‘Jammy’ as at the date supplied. For more detail about why these are set, see Placing an R Lambda Runtime in a Container.
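The effect of the trailing \ can be seen outside Docker too. The sketch below uses a plain POSIX shell rather than the Docker builder (an assumption made so that it can be run anywhere), but the idea is the same: a backslash at the very end of a line joins that line to the next, so the multi-line text ends up as one string. The variable name and the message() call are placeholders, not part of the real Dockerfile.

```shell
#!/bin/sh
# Inside double quotes, a backslash directly before a newline is a line
# continuation: both characters are dropped and the lines are joined.
expr_text="options( \
warn = 2 \
); \
message('packages would be installed here')"

# Prints the expression on a single line.
printf '%s\n' "$expr_text"
```

If R is installed, the same string could be handed to Rscript with `Rscript -e "$expr_text"`, which is what the RUN instruction does with its own in-line script.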
For now, just know that RUN added two layers where R packages get installed.
In this section the RUN instruction is used again, alongside COPY:
RUN mkdir /R
COPY R/ /R
RUN chmod 755 -R /R
A directory called /R is made in the image, then the local folder of R code is copied into the image folder. COPY uses the same syntax as cp, i.e. source destination.
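Then, chmod 755 -R gets applied to the /R folder. In short, this gives the owner read, write, and execute permissions and everyone else read and execute permissions, which makes sure the files in the folder can be read and executed in the image.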
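The same three steps can be tried locally as a rough analogy. This sketch works in a throwaway directory instead of a real image, so the paths (project/R, image_R) and the placeholder file are assumptions for illustration only. It also writes the option before the mode (chmod -R 755), which is the more portable ordering.

```shell
#!/bin/sh
set -eu

# Work in a temporary directory so nothing on the real system is touched.
workdir="$(mktemp -d)"
mkdir -p "$workdir/project/R"
echo '# placeholder for the real runtime.R' > "$workdir/project/R/runtime.R"

# Analogous to: RUN mkdir /R
mkdir "$workdir/image_R"

# Analogous to: COPY R/ /R  (source first, destination second, like cp)
cp -R "$workdir/project/R/." "$workdir/image_R"

# Analogous to: RUN chmod 755 -R /R
chmod -R 755 "$workdir/image_R"

# The listing shows rwxr-xr-x for the copied file.
ls -l "$workdir/image_R"
```

Remove the temporary directory (the path stored in workdir) once you have finished looking at the result.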
The final instructions are ENTRYPOINT and CMD. At this point, all the R code, packages, and dependencies have been installed. All that’s left is to tell Lambda where to find the runtime interface client and handler function, both of which are new concepts explained below.
The runtime.R file contains two things necessary for Lambda to run: a handler function, handler(), and a call to the lambdr function lambdr::start_lambda().
An example runtime.R is given in the article Placing an R Lambda Runtime in a Container. For this introduction all you need to know is that ENTRYPOINT needs to execute Rscript on R/runtime.R so that lambdr::start_lambda() gets called.
CMD simply takes the name of the handler function. You can call the handler whatever you like, but as there is only ever one handler per Lambda, by convention we just call it handler. The CMD gets passed as an argument to lambdr::start_lambda().
You’re probably wondering “How do I use the Dockerfile!?”
We won’t get into that here.
Instead, the article Placing an R Lambda Runtime in a Container has sections about development versus deployment containers, and how to practically figure that stuff out for use with lambdr. We recommend reading the whole article.
Before moving on to the article, you may also want to look into the basics of Docker more generally. You can get a good feel for what using Docker looks like in practice by watching Docker Tutorial for Beginners by James Murphy. Or, if you want a hands-on tutorial there is the official Docker 101 Tutorial.
[1] If you’re no longer a beginner, but aren’t sure if you’re an intermediate or advanced R programmer, perhaps you are a “not-beginner”! For more thoughts on this topic see Meghan Harris’s blog post How I Became A “Not-Beginner” in R.