Deploying SageMaker Endpoints With Terraform


Infrastructure as Code With Terraform

Image from Unsplash by Krishna Pandey

Infrastructure as Code (IaC) is an important concept for optimizing your resources and infrastructure and taking them to production. IaC is an age-old DevOps/software practice with a number of key advantages: resources are maintained centrally via code, which in turn improves the speed and collaboration required to take your architecture to production.

This software best practice, like many others, also applies to your Machine Learning tooling and infrastructure. In today's article we'll take a look at how we can use the IaC tool Terraform to deploy a pre-trained SKLearn model on a SageMaker Endpoint for inference. We are going to explore how to create a reusable template that you can adjust as you need to update your resources/hardware. With Terraform we can move from having standalone notebooks and individual Python files scattered everywhere to capturing all our essential resources in a single template file.

Another option for Infrastructure as Code with SageMaker is CloudFormation. You can reference this article if that is the preferred tool for your use case. Note that Terraform is cloud provider agnostic: it spans different cloud providers, whereas CloudFormation is specific to AWS services.

NOTE: For those of you new to AWS, make sure you create an account at the following link if you want to follow along. Also make sure to have the AWS CLI installed to work with the example. This article also assumes basic knowledge of Terraform; take a look at this guide if you need a starting point, and reference the following instructions for installation. The article also assumes an intermediate understanding of SageMaker Deployment; I would suggest following this article to understand Deployment/Inference more in depth, as we will be using the same model here and mapping it over to Terraform.


As stated earlier, we won't really be focusing on the theory of model training and building. We're going to quickly train a sample SKLearn model on the built-in Boston Housing Dataset that the package provides. Note that load_boston was removed in scikit-learn 1.2, so this snippet requires an earlier version of the library.

import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import joblib

#Load data (load_boston requires scikit-learn < 1.2)
boston = datasets.load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['MEDV'] = boston.target

#Split Model
X = df.drop(['MEDV'], axis=1)
y = df['MEDV']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)

#Model Creation
lm = LinearRegression()
lm.fit(X_train, y_train)

#Serialize the fitted model
with open('model.joblib', 'wb') as f:
    joblib.dump(lm, f)

#Reload the artifact and run a sample prediction
with open('model.joblib', 'rb') as f:
    predictor = joblib.load(f)

print("Testing following input: ")
sampInput = [[0.09178, 0.0, 4.05, 0.0, 0.51, 6.416, 84.1, 2.6463, 5.0, 296.0, 16.6, 395.5, 9.04]]
print(sampInput)
print(predictor.predict(sampInput))

Here we quickly validate that the local model performs inference as expected. The script also emits the serialized model artifact that we will provide to SageMaker for deployment. Next we create a custom inference script that serves as an entry point for handling pre/post processing on SageMaker Endpoints.

import joblib
import os
import json

# Deserialize fitted model
def model_fn(model_dir):
    model = joblib.load(os.path.join(model_dir, "model.joblib"))
    return model

# request_body: The body of the request sent to the model.
# request_content_type: (string) specifies the format/variable type of the request
def input_fn(request_body, request_content_type):
    if request_content_type == 'application/json':
        request_body = json.loads(request_body)
        inpVar = request_body['Input']
        return inpVar
    else:
        raise ValueError("This model only supports application/json input")

# input_data: returned array from input_fn above
# model (sklearn model): returned model loaded from model_fn above
def predict_fn(input_data, model):
    return model.predict(input_data)

# prediction: the returned value from predict_fn above
# content_type: the content type the endpoint expects to be returned. Ex: JSON, string
def output_fn(prediction, content_type):
    res = int(prediction[0])
    respJSON = {'Output': res}
    return respJSON
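Before packaging, it can help to exercise the handlers locally. The sketch below assumes the serving container calls input_fn, predict_fn and output_fn in that order for each request (model_fn runs once, when the model is loaded). StubModel and invoke_locally are hypothetical helpers that stand in for the fitted model and the container, so the chain can be checked without scikit-learn or AWS.

```python
import json


# Simplified copies of the handlers from the inference script above.
def input_fn(request_body, request_content_type):
    if request_content_type == "application/json":
        return json.loads(request_body)["Input"]
    raise ValueError("This model only supports application/json input")


def predict_fn(input_data, model):
    return model.predict(input_data)


def output_fn(prediction, content_type):
    return {"Output": int(prediction[0])}


class StubModel:
    """Hypothetical stand-in for the fitted LinearRegression: sums each row's features."""

    def predict(self, rows):
        return [sum(row) for row in rows]


def invoke_locally(body, content_type, model):
    """Hypothetical helper: chain the handlers in the assumed per-request order."""
    data = input_fn(body, content_type)
    prediction = predict_fn(data, model)
    return output_fn(prediction, content_type)


if __name__ == "__main__":
    payload = json.dumps({"Input": [[1.0, 2.0, 3.5]]})
    print(invoke_locally(payload, "application/json", StubModel()))  # {'Output': 6}
```

Swapping StubModel for the predictor loaded from model.joblib runs the same chain against the real model.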

Next we wrap both the script and the model artifact into the tarball format that SageMaker expects. We then upload this model tarball to an S3 bucket, as that is the primary storage option for all artifacts that SageMaker works with.

import boto3
import subprocess
import sagemaker

client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")
boto_session = boto3.session.Session()
s3 = boto_session.resource('s3')
region = boto_session.region_name
sagemaker_session = sagemaker.Session()
role = "Replace with your SageMaker IAM Role"

#Construct tar file with model data + inference code
#(assumes the inference script above was saved as inference.py)
bashCommand = "tar -cvpzf model.tar.gz model.joblib inference.py"
subprocess.run(bashCommand.split(), check=True)

#Bucket for model artifacts
default_bucket = sagemaker_session.default_bucket()

#Upload tar.gz to bucket
model_artifacts = f"s3://{default_bucket}/model.tar.gz"
s3.meta.client.upload_file('model.tar.gz', default_bucket, 'model.tar.gz')
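The tar subprocess call can also be replaced with Python's standard tarfile module, which avoids depending on a tar binary being on the PATH. This is a sketch assuming the inference script is saved as inference.py (the filename is an assumption). build_model_tarball is a hypothetical helper that places each file at the archive root, since SageMaker extracts model.tar.gz into /opt/ml/model.

```python
import tarfile
from pathlib import Path


def build_model_tarball(files, output="model.tar.gz"):
    """Hypothetical helper: gzip-compress each file into the tarball at the archive root."""
    with tarfile.open(output, "w:gz") as tar:
        for path in files:
            # arcname drops any directory prefix so model.joblib sits at the root
            tar.add(path, arcname=Path(path).name)
    return output
```

Calling build_model_tarball(["model.joblib", "inference.py"]) produces the same model.tar.gz that is uploaded to S3 above.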

Terraform Variables

Inside our template file (.tf) we first need to define something known as a Terraform Variable. Input Variables specifically let you pass in values, similar to arguments for the functions/methods you define. Any values that you don't want to hardcode, but still want to give default values to, can be specified in the form of a variable. The variables we'll be defining for a Real-Time SageMaker Endpoint are listed below.

  • SageMaker IAM Role ARN: This is the Role associated with the SageMaker service; attach all policies necessary for the actions you'll take with the service. Note, you can also define and reference a Role within Terraform itself.
  • Container: The Deep Learning Container from AWS, or your own custom container that you have built to host your model.
  • Model Data: The pre-trained model artifacts that we uploaded to S3; these can also be the trained artifacts emitted from a SageMaker Training Job.
  • Instance Type: The hardware behind your real-time endpoint. You can also make the number of instances a variable if you would like.

For each variable you can define the type, the default value, and a description.

variable "sm-iam-role" {
  type        = string
  default     = "Add your SageMaker IAM Role ARN here"
  description = "The IAM Role for SageMaker Endpoint Deployment"
}

variable "container-image" {
  type        = string
  default     = ""
  description = "The container you are using for your SageMaker Model"
}

variable "model-data" {
  type        = string
  default     = "s3://sagemaker-us-east-1-474422712127/model.tar.gz"
  description = "The pre-trained model data/artifacts, replace this with your own artifacts or training job output."
}

variable "instance-type" {
  type        = string
  default     = "ml.m5.xlarge"
  description = "The instance behind the SageMaker Real-Time Endpoint"
}
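Defaults can be overridden without editing the template. Terraform automatically loads a file named terraform.tfvars.json from the working directory, so a small Python helper can generate one per environment. This is a minimal sketch: write_tfvars and the KNOWN_VARIABLES check are hypothetical, and the names simply match the real-time variables declared above.

```python
import json
from pathlib import Path

# Hypothetical allow-list: the variable names declared in the template above.
KNOWN_VARIABLES = {"sm-iam-role", "container-image", "model-data", "instance-type"}


def write_tfvars(values, directory="."):
    """Hypothetical helper: write overrides to terraform.tfvars.json."""
    unknown = set(values) - KNOWN_VARIABLES
    if unknown:
        raise ValueError(f"Undeclared Terraform variables: {sorted(unknown)}")
    path = Path(directory) / "terraform.tfvars.json"
    # A JSON variables file is a single object keyed by variable name.
    path.write_text(json.dumps(values, indent=2))
    return path
```

Rejecting undeclared names early gives a clearer error than waiting for Terraform to flag them at plan time.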

While we don't cover it in depth in this article, you can also define variables for other hosting options within SageMaker. For instance, with Serverless Inference you can define Memory Size and Concurrency as two variables that you need to set.

variable "memory-size" {
  type        = number
  default     = 4096
  description = "Memory size behind your Serverless Endpoint"
}

variable "concurrency" {
  type        = number
  default     = 2
  description = "Concurrent requests for Serverless Endpoint"
}
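In Terraform these two variables would feed a serverless_config block (memory_size_in_mb, max_concurrency) inside production_variants, in place of the instance settings. As an illustration, here is a sketch of the equivalent ProductionVariants entry for boto3's create_endpoint_config, with a validation step assuming the documented memory sizes of 1024 to 6144 MB in 1 GB increments; serverless_variant is a hypothetical helper.

```python
# Assumption: Serverless Inference accepts these memory sizes (1 GB steps).
VALID_MEMORY_SIZES_MB = {1024, 2048, 3072, 4096, 5120, 6144}


def serverless_variant(model_name, memory_size=4096, concurrency=2, variant_name="AllTraffic"):
    """Hypothetical helper: build one ProductionVariants entry for a serverless endpoint."""
    if memory_size not in VALID_MEMORY_SIZES_MB:
        raise ValueError(f"memory_size must be one of {sorted(VALID_MEMORY_SIZES_MB)}")
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # No InstanceType/InitialInstanceCount: serverless variants size themselves.
    return {
        "VariantName": variant_name,
        "ModelName": model_name,
        "ServerlessConfig": {"MemorySizeInMB": memory_size, "MaxConcurrency": concurrency},
    }
```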

Terraform Resources & Deployment

The most essential Terraform building block is a Resource. Within a Resource Block you define an infrastructure object. For our use case we have three SageMaker building blocks: a SageMaker Model, a SageMaker Endpoint Configuration, and a SageMaker Endpoint. Each of these is linked in sequence, and together they create our desired endpoint.

We will follow the Terraform Documentation for a SageMaker Model to start. First we define the resource itself, which has two labels: the Terraform resource type, and a second string that is the name you use to reference the resource later within the template. Another key part to notice here is how we reference a variable value, using the Terraform keyword var.

# SageMaker Model Object
resource "aws_sagemaker_model" "sagemaker_model" {
  name               = "sagemaker-model-sklearn"
  execution_role_arn = var.sm-iam-role

Next, within our SageMaker Model we define the container and model data by referencing the variables we defined earlier.

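  primary_container {
    image          = var.container-image
    mode           = "SingleModel"
    model_data_url = var.model-data
    environment = {
      # Assumed values: point the SKLearn container at the inference script
      # packaged in model.tar.gz (assumed to be named inference.py)
      "SAGEMAKER_PROGRAM"          = "inference.py"
      "SAGEMAKER_SUBMIT_DIRECTORY" = var.model-data
    }
  }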

Optionally, you can also provide tags that you define for the specific SageMaker object.

  tags = {
    Name = "sagemaker-model-terraform"
  }
}

We apply a similar format for our Endpoint Configuration; this is where we define our hardware.

# Create SageMaker endpoint configuration
resource "aws_sagemaker_endpoint_configuration" "sagemaker_endpoint_configuration" {
  name = "sagemaker-endpoint-configuration-sklearn"

  production_variants {
    initial_instance_count = 1
    instance_type          = var.instance-type
    model_name             = aws_sagemaker_model.sagemaker_model.name
    variant_name           = "AllTraffic"
  }

  tags = {
    Name = "sagemaker-endpoint-configuration-terraform"
  }
}

We then reference this object in our endpoint creation.

# Create SageMaker Real-Time Endpoint
resource "aws_sagemaker_endpoint" "sagemaker_endpoint" {
  name                 = "sagemaker-endpoint-sklearn"
  endpoint_config_name = aws_sagemaker_endpoint_configuration.sagemaker_endpoint_configuration.name

  tags = {
    Name = "sagemaker-endpoint-terraform"
  }
}
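Terraform works out the creation order from the references between the three resources. For comparison, here is a sketch of the same sequence issued directly with boto3, assuming sm_client is boto3.client("sagemaker"). The resource names mirror the template, and deploy_with_boto3 is a hypothetical helper. Unlike Terraform, this script keeps no state, so running it a second time would typically fail because the resources already exist.

```python
def deploy_with_boto3(sm_client, role_arn, image, model_data,
                      instance_type="ml.m5.xlarge", environment=None):
    """Hypothetical helper: create model, endpoint config, then endpoint, in order."""
    container = {"Image": image, "Mode": "SingleModel", "ModelDataUrl": model_data}
    if environment:
        container["Environment"] = environment
    # 1. SageMaker Model (aws_sagemaker_model)
    sm_client.create_model(
        ModelName="sagemaker-model-sklearn",
        ExecutionRoleArn=role_arn,
        PrimaryContainer=container,
    )
    # 2. Endpoint Configuration (aws_sagemaker_endpoint_configuration)
    sm_client.create_endpoint_config(
        EndpointConfigName="sagemaker-endpoint-configuration-sklearn",
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": "sagemaker-model-sklearn",
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }],
    )
    # 3. Endpoint (aws_sagemaker_endpoint); creation continues asynchronously
    return sm_client.create_endpoint(
        EndpointName="sagemaker-endpoint-sklearn",
        EndpointConfigName="sagemaker-endpoint-configuration-sklearn",
    )
```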


Before we deploy the template to provision our resources, make sure you have the AWS CLI configured with the following command.

aws configure

We can then initialize our Terraform project with the following command.

terraform init

For deployment, we then run another Terraform CLI command.

terraform apply
Resource Creation (Screenshot by Author)
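The init and apply steps can also be scripted, for example from a CI job. This is a minimal sketch; terraform_commands and run_terraform are hypothetical helper names. -var and -auto-approve are standard terraform apply flags, and the latter skips the interactive approval prompt, so use it with care.

```python
import shutil
import subprocess


def terraform_commands(var_overrides=None, auto_approve=False):
    """Hypothetical helper: build the init and apply argument lists."""
    apply = ["terraform", "apply"]
    for name, value in (var_overrides or {}).items():
        # -var overrides a variable's default for this run only
        apply += ["-var", f"{name}={value}"]
    if auto_approve:
        apply.append("-auto-approve")
    return [["terraform", "init"], apply]


def run_terraform(var_overrides=None, auto_approve=False, workdir="."):
    """Hypothetical helper: run each command in order, stopping at the first failure."""
    if shutil.which("terraform") is None:
        raise RuntimeError("terraform CLI not found on PATH")
    for cmd in terraform_commands(var_overrides, auto_approve):
        subprocess.run(cmd, cwd=workdir, check=True)
```

For example, run_terraform({"instance-type": "ml.m5.large"}) provisions the same endpoint on a smaller instance without touching the template.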

While the endpoint is being created, you can also check its status in the SageMaker Console.

Endpoint Creation in the SageMaker Console (Screenshot by Author)
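Besides the console, you can wait for the endpoint and send a test request from Python. This sketch assumes sm_client is boto3.client("sagemaker") and runtime_client is boto3.client("sagemaker-runtime") (the client and runtime objects created earlier fit), the endpoint name from the Terraform template, and the {"Input": ...} payload shape that input_fn expects. wait_and_predict is a hypothetical helper.

```python
import json


def wait_and_predict(sm_client, runtime_client, features,
                     endpoint_name="sagemaker-endpoint-sklearn"):
    """Hypothetical helper: block until InService, then send one JSON request."""
    # The built-in waiter polls DescribeEndpoint until the status is InService.
    sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"Input": [features]}),  # one row, as input_fn expects
    )
    return json.loads(response["Body"].read())
```

For example, wait_and_predict(client, runtime, sampInput[0]) reuses the sample row from the training script.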

Additional Resources & Conclusion

The entire code for the example can be found in the repository above. I hope this article was a good introduction to Terraform in general, as well as to its usage with SageMaker Inference. Infrastructure as Code is an essential practice that can't be ignored in the world of MLOps when scaling to production.

