Nomad and Consul
Nomad
- workload orchestrator (containers, Java, VMs, Windows apps, static binaries)
- single binary (Linux, ARM, macOS, Windows)
- unified workflow for deployments
- zero-downtime deployments (blue-green, canary, rolling releases)
- job types: batch, service, system
- easily integrates with Consul and Vault
- plugin system
- Container Storage Interface (CSI)
- Container Network Interface (CNI)
- uses HCL to define workloads
Nomad versions
- current stable: 0.12.5 (at the time of writing)
- installation
asdf plugin-add nomad
asdf install nomad 0.12.5
asdf local nomad 0.12.5
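A quick sanity check that the right binary is active:
$ nomad version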
CLI
nomad run # run/update a job definition
nomad stop # stop a job
nomad status # check a resource status
nomad agent # run nomad agent
nomad system -help # see help for a specific command
nomad -help # see global help
Architecture
- Job is a declared workload for Nomad; it defines a desired state
- Task Group is a set of tasks that must run together
- Driver is the tool used to run a workload (e.g. docker, exec, java)
- Task is the smallest unit of work in Nomad
- Client is a machine where Nomad can run tasks
- Allocation is a mapping between a task group of a job and a Client
- Evaluation is the mechanism by which Nomad makes scheduling decisions
- Servers are the brain of the cluster
- Regions and Datacenters are places where Nomad Clients and Servers are deployed
Architecture Overview
Single Region
Multiple Regions
Scheduling in Nomad
Job Specification
- written in HCL
- defines job, groups and tasks
- all possible options are listed below
Options
affinity
- soft placement preferences
artifact
- download an archive
check_restart
- define rules for when to restart an unhealthy task
constraint
- hard placement requirements
env
- set environment variables
ephemeral_disk
- ephemeral disk setup
group
- a group for a set of tasks
lifecycle
- define task dependencies
logs
- set log rotation policies
migrate
- define a strategy for migrating off draining nodes
network
- define network settings (ports, throughput, DNS)
periodic
- run a Nomad job like a cron
reschedule
- define a strategy to move all of a job's allocations to another node if any allocation becomes "failed"
resources
- describe the resource requirements for a task to execute
restart
- define a task's behaviour on task failure
service
- describe how tasks are registered in Consul
spread
- define how tasks should be spread across nodes
template
- simple template engine to render configuration files
update
- define a strategy for job updates (rolling upgrades, canary, blue-green)
volume
- allows a group to require a volume from the cluster
volume_mount
- mount a volume at a specific location inside a task
Hierarchy
job
\_ group
\_ task
Example
# This declares a job named "docs". There can be exactly one
# job declaration per job file.
job "docs" {
  # Specify this job should run in the region named "us". Regions
  # are defined by the Nomad servers' configuration.
  region = "us"

  # Spread the tasks in this job between us-west-1 and us-east-1.
  datacenters = ["us-west-1", "us-east-1"]

  # Run this job as a "service" type. Each job type has different
  # properties. See the documentation below for more examples.
  type = "service"

  # Specify this job to have rolling updates, two-at-a-time, with
  # 30 second intervals.
  update {
    stagger      = "30s"
    max_parallel = 2
  }

  # A group defines a series of tasks that should be co-located
  # on the same client (host). All tasks within a group will be
  # placed on the same host.
  group "webs" {
    # Specify the number of these tasks we want.
    count = 5

    network {
      # This requests a dynamic port named "http". This will
      # be something like "46283", but we refer to it via the
      # label "http".
      port "http" {}

      # This requests a static port on 443 on the host. This
      # will restrict this task to running once per host, since
      # there is only one port 443 on each host.
      port "https" {
        static = 443
      }
    }

    # The service block tells Nomad how to register this service
    # with Consul for service discovery and monitoring.
    service {
      # This tells Consul to monitor the service on the port
      # labelled "http". Since Nomad allocates high dynamic port
      # numbers, we use labels to refer to them.
      port = "http"

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }
    }

    # Create an individual task (unit of work). This particular
    # task utilizes a Docker container to front a web application.
    task "frontend" {
      # Specify the driver to be "docker". Nomad supports
      # multiple drivers.
      driver = "docker"

      # Configuration is specific to each driver.
      config {
        image = "hashicorp/web-frontend"
      }

      # It is possible to set environment variables which will be
      # available to the task when it runs.
      env {
        "DB_HOST" = "db01.example.com"
        "DB_USER" = "web"
        "DB_PASS" = "loremipsum"
      }

      # Specify the maximum resources required to run the task,
      # including CPU and memory.
      resources {
        cpu    = 500 # MHz
        memory = 128 # MB
      }
    }
  }
}
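Before submitting, the job file can be checked offline; a minimal session (the file name docs.nomad is an assumption):
$ nomad job validate docs.nomad   # syntax and semantic checks
$ nomad job plan docs.nomad       # dry run: shows what the scheduler would do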
Running Nomad
- dev mode (do not use dev mode in production)
- the same agent binary runs on servers and clients
- the UI runs on port 4646: http://localhost:4646
$ nomad agent -dev
Sample job
$ nomad job init
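nomad job init writes a fully commented sample job to example.nomad in the current directory; submit it with:
$ nomad run example.nomad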
Demo Examples
Static Port
job "http-echo" {
datacenters = ["dc1"]
type = "service" # default
group "echo" {
count = 1
task "server" {
driver = "docker"
config {
image = "hashicorp/http-echo:latest"
args = [
"-listen", ":8080",
"-text", "Hello there from 127.0.0.1:8080"
]
}
resources {
network {
mbits = 10
port "http" {
static = 8080
}
}
}
}
}
}
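Once this job is running, the echo server answers on the fixed port (a sketch; the file name static.nomad is an assumption):
$ nomad run static.nomad
$ curl http://127.0.0.1:8080
Hello there from 127.0.0.1:8080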
Dynamic Port
job "http-echo" {
datacenters = ["dc1"]
type = "service" # default
group "echo" {
count = 1
task "server" {
driver = "docker"
config {
image = "hashicorp/http-echo:latest"
args = [
"-listen", ":${NOMAD_PORT_http}",
"-text", "Hello there from 127.0.0.1:${NOMAD_PORT_http}"
]
}
resources {
network {
mbits = 10
port "http" { }
}
}
}
}
}
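With a dynamic port, Nomad picks a free high port and hands it to the task via ${NOMAD_PORT_http}. To find out which port was assigned, inspect the allocation (the alloc ID placeholder below is an example):
$ nomad job status http-echo      # lists the allocation IDs
$ nomad alloc status <alloc-id>   # the Addresses column shows e.g. http: 127.0.0.1:24183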
Service Discovery aka Consul
- adding more than one container (of a given image)
- challenge: each instance listens on a different dynamic port
- service discovery simplifies sending requests to the instances
Consul
- run Consul
asdf plugin-add consul
asdf install consul 1.8.4
asdf local consul 1.8.4
or download from https://www.consul.io/downloads
- or do it through Nomad
  - use raw_exec to run Consul
  - use artifact to download the binary and unpack it
- UI: http://127.0.0.1:8500/ui
Run Consul
job "consul" {
datacenters = ["dc1"]
group "consul" {
task "consul" {
driver = "raw_exec"
config {
command = "consul"
args = ["agent", "-dev"]
}
artifact {
source = "https://releases.hashicorp.com/consul/1.8.4/consul_1.8.4_linux_amd64.zip"
}
}
}
}
- visit http://127.0.0.1:8500/ui
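The agent can also be queried from the shell (assuming a consul binary on the PATH, e.g. from the asdf install above):
$ consul members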
Register service
job "http-echo" {
datacenters = ["dc1"]
type = "service" # default
group "echo" {
count = 3
task "server" {
driver = "docker"
config {
image = "hashicorp/http-echo:latest"
args = [
"-listen", ":${NOMAD_PORT_http}",
"-text", "Hello there from 127.0.0.1:${NOMAD_PORT_http}"
]
}
resources {
network {
mbits = 10
port "http" {}
}
}
service {
name = "http-echo"
tags = ["http-echo", "we", "need", "mode", "tags"]
port = "http"
check {
name = "http-echo port alive"
type = "http"
path = "/"
interval = "10s"
timeout = "2s"
}
}
}
}
}
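Each of the three allocations now shows up as a service instance in Consul; the catalog HTTP API lists them together with their dynamic ports:
$ curl -s http://127.0.0.1:8500/v1/catalog/service/http-echo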
Internal Load Balancing
- use Fabio deployed as a container
- Fabio acts as the internal load balancer
- routes table (Fabio UI): http://localhost:9998/routes
- run a checker (and keep it open all the time):
  watch -n 0.2 "curl -s http://localhost:9999/http-echo-dynamic"
Run Fabio
job "fabio" {
datacenters = ["dc1"]
group "fabio" {
task "fabio" {
driver = "docker"
config {
network_mode = "host"
image = "fabiolb/fabio:1.5.14-go1.15"
args = ["-proxy.strategy=rr"]
}
resources {
network {
mbits = 10
port "lb" {
static = 9998
}
port "ui" {
static = 9999
}
}
}
}
}
}
job "http-echo" {
datacenters = ["dc1"]
type = "service" # default
group "echo" {
count = 3
task "server" {
driver = "docker"
config {
image = "hashicorp/http-echo:latest"
args = [
"-listen", ":${NOMAD_PORT_http}",
"-text", "Hello there from 127.0.0.1:${NOMAD_PORT_http}"
]
}
resources {
network {
mbits = 100
port "http" {}
}
}
service {
name = "http-echo"
tags = [
"http-echo",
"urlprefix-/echo" # <--LOOK-HERE--
]
port = "http"
check {
name = "http-echo port alive"
type = "http"
path = "/"
interval = "10s"
timeout = "2s"
}
}
}
}
}
- run the checker
watch -n 0.2 "curl -s http://localhost:9999/echo"
Deployment and Job Versions
- Nomad servers (masters) keep a state of each job
- use plan / run (like Terraform does)
- or use terraform itself (with the Nomad provider)
$ nomad job plan job.nomad
..
...
Job Modify Index: 0
To submit the job with version verification run:
nomad job run -check-index 0 job.nomad
$ nomad job run -check-index 0 job.nomad
==> Monitoring evaluation "dc8266e6"
Evaluation triggered by job "http-echo"
Allocation "54ac20ff" created: node "25f73cb5", group "echo"
Evaluation within deployment: "543d74a0"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "dc8266e6" finished with status "complete"
Deployments
Canary
job "http-echo" {
datacenters = ["dc1"]
type = "service" # default
update {
canary = 1
max_parallel = 2
# auto_promote = 1
}
group "echo" {
count = 3
task "server" {
driver = "docker"
config {
image = "hashicorp/http-echo:latest"
args = [
"-listen", ":${NOMAD_PORT_http}",
"-text", "Hello there from 127.0.0.1:${NOMAD_PORT_http} - CANARY 1/2"
]
}
resources {
network {
mbits = 100
port "http" {}
}
}
service {
name = "http-echo"
tags = [
"http-echo",
"urlprefix-/echo"
]
port = "http"
check {
name = "http-echo port alive"
type = "http"
path = "/"
interval = "10s"
timeout = "2s"
}
}
}
}
}
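Canary allocations stay unpromoted until approved. Inspect the deployment and promote it once the canary looks healthy (the deployment ID comes from nomad status http-echo):
$ nomad deployment status <deployment-id>
$ nomad deployment promote <deployment-id>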
Environment variables
- visit http://127.0.0.1:8500/ui/dc1/kv
- add a key/value pair
  - key: dummy_key
  - value: dummy_value
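The same key can also be created from the CLI instead of the UI:
$ consul kv put dummy_key dummy_value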
job "http-echo" {
datacenters = ["dc1"]
type = "service" # default
update {
canary = 3
max_parallel = 3
# auto_promote = 1
}
group "echo" {
count = 3
task "server" {
driver = "docker"
template { # <--LOOK-HERE--
data = <<EOH
DUMMY_KEY="{{key "dummy_key"}}"
EOH
destination = "secrets/file.env"
env = true
}
config {
image = "hashicorp/http-echo:latest"
args = [
"-listen", ":${NOMAD_PORT_http}",
"-text", "Hello there from 127.0.0.1:${NOMAD_PORT_http} - CANARY 3/3 - ${DUMMY_KEY}" # <--LOOK-HERE--
]
}
resources {
network {
mbits = 100
port "http" {}
}
}
service {
name = "http-echo"
tags = [
"http-echo",
"urlprefix-/echo"
]
port = "http"
check {
name = "http-echo port alive"
type = "http"
path = "/"
interval = "10s"
timeout = "2s"
}
}
}
}
}
CLI
$ nomad status
No running jobs
$ nomad server members
Name Address Port Status Leader Protocol Build Datacenter Region
x1e.global 127.0.0.1 4648 alive true 2 0.12.5 dc1 global
$ nomad run job.nomad
==> Monitoring evaluation "dc8266e6"
Evaluation triggered by job "http-echo"
Allocation "54ac20ff" created: node "25f73cb5", group "echo"
Evaluation within deployment: "543d74a0"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "dc8266e6" finished with status "complete"
$ nomad status
ID Type Priority Status Submit Date
http-echo service 50 running 2020-10-16T08:01:06+02:00
$ nomad status http-echo
ID = http-echo
Name = http-echo
Submit Date = 2020-10-16T08:01:06+02:00
Type = service
Priority = 50
Datacenters = dc1
Namespace = default
Status = running
Periodic = false
Parameterized = false
Summary
Task Group Queued Starting Running Failed Complete Lost
echo 0 0 1 0 0 0
Latest Deployment
ID = 543d74a0
Status = running
Description = Deployment is running
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
echo 1 1 0 0 2020-10-16T08:11:06+02:00
Allocations
ID Node ID Task Group Version Desired Status Created Modified
54ac20ff 25f73cb5 echo 0 run running 48s ago 46s ago
$ nomad status 54ac20ff
ID = 54ac20ff-4390-9884-eb35-f691a7241b50
Eval ID = dc8266e6
Name = http-echo.echo[0]
Node ID = 25f73cb5
Node Name = x1e
Job ID = http-echo
Job Version = 0
Client Status = running
Client Description = Tasks are running
Desired Status = run
Desired Description = <none>
Created = 1m14s ago
Modified = 1m12s ago
Deployment ID = 543d74a0
Deployment Health = unset
Task "server" is "running"
Task Resources
CPU Memory Disk Addresses
0/100 MHz 288 KiB/300 MiB 300 MiB http: 127.0.0.1:24183
Task Events:
Started At = 2020-10-16T06:01:09Z
Finished At = N/A
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time Type Description
2020-10-16T08:01:09+02:00 Started Task started by client
2020-10-16T08:01:06+02:00 Driver Downloading image
2020-10-16T08:01:06+02:00 Task Setup Building Task Directory
2020-10-16T08:01:06+02:00 Received Task received by client
$ nomad alloc exec 54ac20ff bash
$
History and revert
$ nomad job history fabio
Version     = 4
Stable      = true
Submit Date = 2020-10-14T12:58:08+02:00

Version     = 3
Stable      = true
Submit Date = 2020-10-14T12:57:13+02:00

Version     = 2
Stable      = true
Submit Date = 2020-10-14T12:54:04+02:00

Version     = 1
Stable      = true
Submit Date = 2020-10-14T12:51:53+02:00

Version     = 0
Stable      = false
Submit Date = 2020-10-14T12:51:42+02:00
$ nomad job revert fabio 3
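The -p flag makes nomad job history show the diff between each version and its predecessor, which helps pick the version to revert to:
$ nomad job history -p fabio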
Schedule jobs
job "cron" {
datacenters = ["dc1"]
type = "batch"
periodic {
cron = "*/1 * * * * *"
}
group "cron" {
task "cron" {
driver = "raw_exec"
config {
command = "/tmp/cron"
}
}
}
}
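A periodic job's next launch time shows up in nomad status; an off-schedule run can also be forced by hand:
$ nomad job periodic force cron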