Docker volume monitoring with Ruby, Sensu and Uchiwa.

In this post I am going to demonstrate how to monitor your Docker volumes with Sensu.
I came across this problem when our Jenkins instances ran out of disk space and no jobs could be scheduled.
It is very useful to have something in place that shows, before it is too late, when a container gets too greedy and eats all the space on a volume.

We are going to look at the following things today:
1. Some Ruby scripting
2. Identifying disk and docker volume usage commands
3. Configuring Sensu server and client for monitoring
4. Making the script run as root
5. Running a simple Uchiwa dashboard

1. Some Ruby scripting
First we are going to write the script that checks each volume and reports when its usage is higher than the limit we configured.
Following Sensu best practice we will write it in Ruby. We could probably use bash as well, but that gets messy once we add more logic and lines.

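#!/usr/bin/env /opt/sensu/embedded/bin/ruby

# Usage: check_volume_size.rb MAX_SIZE_MB [CONTAINER_NAME_FILTER]
max_size = ARGV[0].to_i
container_name_filter = ARGV[1]
message = ""

# Size in KB of every entry under the Docker volumes directory, largest first
volumes = `du -sk /var/lib/docker/volumes/* | sort -rn`
volumes.each_line do |line|
  result = line.split(" ")
  vol_usage = result[0].to_i / 1024 # KB to whole MB (integer division)
  vol_name = result[1].gsub("/var/lib/docker/volumes/", "")

  if vol_usage > max_size
    # Ask Docker which running containers (matching the name filter) use this volume
    cont_name = `docker ps --filter=volume=#{vol_name} --filter=name=#{container_name_filter} --format {{.Names}}`
    # Volumes that no running container uses are skipped
    unless cont_name.empty?
      # split/join instead of delete!, which returns nil when nothing was removed
      message += "container: #{cont_name.split.join(", ")} volume exceeds max disk usage(#{max_size}MB): #{vol_usage}MB; \n"
    end
  end
end

# Sensu treats exit status 1 as a warning
unless message.empty?
  puts message
  exit 1
end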

2. Identifying disk and docker volume usage commands

The main command here is ‘du’, which does the job for us and shows how much space each directory is using.
Then we loop through the list and ask Docker which container is using the given volume. Bear in mind that a volume can still exist when its container is stopped or has been deleted, so we have to check that the container name isn’t empty.
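
The parsing step can be tried without touching Docker at all. The sketch below is only an illustration: ‘parse_du’ is a hypothetical helper that is not part of the script above, and the sample text is copied from the ‘du’ output shown further down in this post.

```ruby
VOLUMES_DIR = "/var/lib/docker/volumes/"

# Turn `du -sk` output into a hash of volume name => usage in whole MB.
# parse_du is a hypothetical helper used only for this illustration.
def parse_du(du_output)
  du_output.each_line.each_with_object({}) do |line, usage|
    size_kb, path = line.split(" ", 2)
    next if path.nil? # skip blank or malformed lines
    usage[path.strip.sub(VOLUMES_DIR, "")] = size_kb.to_i / 1024
  end
end

sample = <<~DU
  8640  /var/lib/docker/volumes/my-vol
  6488  /var/lib/docker/volumes/testme
  24  /var/lib/docker/volumes/metadata.db
DU

usage = parse_du(sample)
usage.each { |name, mb| puts "#{name}: #{mb}MB" }
```

Because of the integer division, anything below 1024KB (such as ‘metadata.db’) is reported as 0MB and never crosses a threshold of 1MB or more.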

In our example we have 2 volumes:

vagrant@sensuclient:~$ docker volume ls
DRIVER              VOLUME NAME
local               my-vol
local               testme
vagrant@sensuclient:~$ 

Let’s run two containers and attach each to one of them:

docker run --rm -d -v testme:/tmp --name  alpine1 -it alpine /bin/sh
docker run -d -v my-vol:/test --name nodeTest -p 8080:8085 --rm kayan/node-web-app nodeTest 8085

Both Docker images are publicly available, so you can test them in the same way.

vagrant@sensuclient:~$ docker ps
CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS              PORTS                    NAMES
f94a83a39e5e        kayan/node-web-app   "node server.js no..."   About an hour ago   Up About an hour    0.0.0.0:8080->8085/tcp   nodeTest
3857bbe70709        alpine               "/bin/sh"                About an hour ago   Up About an hour                             alpine1
vagrant@sensuclient:~$ 

Now if we run ‘du’ we get the following:

vagrant@sensuclient:~$ sudo du -sk  /var/lib/docker/volumes/* | sort -rn
8640  /var/lib/docker/volumes/my-vol
6488  /var/lib/docker/volumes/testme
24  /var/lib/docker/volumes/metadata.db
vagrant@sensuclient:~$ 

Obviously you need to generate some data in those volumes first.

Let’s grab one of the volumes and ask Docker who is to blame:

vagrant@sensuclient:~$ docker ps --filter=volume=my-vol  --format {{.Names}}
nodeTest
vagrant@sensuclient:~$

Once everything is running we can test our script. Let’s say I want to see all containers that are using more than 6MB:

vagrant@sensuclient:~$ sudo /opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check_volume_size.rb 6
container: nodeTest volume exceeds max disk usage(6MB): 8MB; 
vagrant@sensuclient:~$ 

Or only containers with ‘alpine’ in their name (or ‘Jenkins’ in our actual case):

vagrant@sensuclient:~$ sudo /opt/sensu/embedded/bin/ruby /etc/sensu/plugins/check_volume_size.rb 1 alpine
container: alpine1 volume exceeds max disk usage(1MB): 6MB; 
vagrant@sensuclient:~$ 
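
Sensu reads the exit status of a check: 0 is OK, 1 is a warning, 2 is critical, and anything else is treated as unknown. Our script only ever exits with 0 or 1. As a rough sketch, a variant with separate warning and critical thresholds might pick its status like this; ‘status_for’ and both threshold values are hypothetical and are not options of the script above.

```ruby
# Exit statuses as Sensu interprets them
OK = 0
WARNING = 1
CRITICAL = 2

# Pick an exit status for a volume usage in MB.
# warn_mb and crit_mb are assumed thresholds, used only for illustration.
def status_for(usage_mb, warn_mb, crit_mb)
  return CRITICAL if usage_mb > crit_mb
  return WARNING if usage_mb > warn_mb
  OK
end

puts status_for(8, 6, 10)   # above the warning threshold only
puts status_for(12, 6, 10)  # above the critical threshold
puts status_for(3, 6, 10)   # below both thresholds
```

A real check would then finish with ‘exit’ and the highest status seen across all volumes.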

3. Configuring Sensu server and client for monitoring

Once the script is working, it is time to set up Sensu.

First we need to configure our server with a new check:

vagrant@sensu:~$ cat /etc/sensu/conf.d/check_volume_size.json
{
  "checks": {
    "jenkins_volume_size": {
      "notification": "Jenkins volume size problem",
      "command": "sudo /etc/sensu/plugins/check_volume_size.rb 1 alpine true",
      "subscribers": ["clnt"],
      "interval": 10,
      "handlers": ["default"]
      }
  }
}
vagrant@sensu:~$ 

Sensu is going to run the check every 10 seconds, but where? That is where our client comes into play. We have a single server but may have multiple Docker (or any other) hosts, so we install and configure a client on every host we are interested in.
The check is published to the ‘clnt’ subscription (its ‘subscribers’), so each client has to list ‘clnt’ in its ‘subscriptions’:

vagrant@sensuclient:~$ cat /etc/sensu/conf.d/client.json 
{
  "client": {
    "name": "client1",
    "address": "localhost",
    "subscriptions": [ "clnt" ]
  }
}
vagrant@sensuclient:~$ 
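
To make the subscribers/subscriptions relationship concrete, here is a simplified model in Ruby. It is not how Sensu is implemented (the server publishes check requests over RabbitMQ); it only shows the matching rule. ‘receives_check?’ is a made-up helper and ‘client2’ is a made-up second host.

```ruby
require "json"

check = JSON.parse('{"subscribers": ["clnt"], "interval": 10}')
client = JSON.parse('{"name": "client1", "subscriptions": ["clnt"]}')
# Hypothetical second client that is subscribed to something else
other = JSON.parse('{"name": "client2", "subscriptions": ["web"]}')

# A client gets the check request when it shares at least one
# subscription name with the check's subscribers list.
def receives_check?(check, client)
  (check["subscribers"] & client["subscriptions"]).any?
end

[client, other].each do |c|
  puts "#{c["name"]}: #{receives_check?(check, c)}"
end
```

Here only ‘client1’ would run the check, because it is the only client subscribed to ‘clnt’.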

Once the client host is configured, the script also has to be present on the client host, at the path used in the check’s command on the server,
which in our case is ‘/etc/sensu/plugins/check_volume_size.rb’.

The other thing we need to configure on the client side is the connection to the server. Sensu communicates over RabbitMQ, which is installed on the server as well,
so the config looks like this:

vagrant@sensuclient:~$ cat /etc/sensu/conf.d/rabbitmq.json 
{
  "rabbitmq": {
    "host": "192.168.2.212",
    "port": 5672,
    "vhost": "/sensu",
    "user": "sensu",
    "password": "pass"
  }
}

On the server side, the API and RabbitMQ configs look like this:

vagrant@sensu:~$ cat /etc/sensu/conf.d/rabbitmq.json 
{
  "rabbitmq": {
    "host": "127.0.0.1",
    "port": 5672,
    "vhost": "/sensu",
    "user": "sensu",
    "password": "pass"
  }
}

vagrant@sensu:~$ cat /etc/sensu/conf.d/api.json 
{
  "api": {
    "host": "127.0.0.1",
    "port": 4567
  }
}

4. Making the script run as root

Please note that our script runs the following command:

du -sk  /var/lib/docker/volumes/*

so the script has to be run with sudo, because only root has access to that path:

vagrant@sensuclient:~$ ls -ld /var/lib/docker
drwx--x--x 11 root root 4096 Feb 16 19:13 /var/lib/docker

If we check the logs:

vagrant@sensuclient:~$ sudo tail -20  /var/log/sensu/sensu-client.log | grep alpine | jq .

we will see something like below:

{
  "check": {
    "issued": 1518810884,
    "name": "jenkins_volume_size",
    "handlers": [
      "default"
    ],
    "command": "sudo /etc/sensu/plugins/check_volume_size.rb 1 alpine true",
    "notification": "Jenkins volume size problem"
  },
  "message": "received check request",
  "level": "info",
  "timestamp": "2018-02-16T19:54:44.517078+0000"
}
{
  "payload": {
    "check": {
      "status": 1,
      "notification": "Jenkins volume size problem",
      "command": "sudo /etc/sensu/plugins/check_volume_size.rb 1 alpine true",
      "handlers": [
        "default"
      ],
      "name": "jenkins_volume_size",
      "issued": 1518810884,
      "executed": 1518810884,
      "duration": 0.007,
      "output": "sudo: no tty present and no askpass program specified\n"
    },
    "client": "client1"
  },
  "message": "publishing check result",
  "level": "info",
  "timestamp": "2018-02-16T19:54:44.524107+0000"
}

As you can see, the client received the check request from the server, but it can’t run the script:
‘sudo: no tty present and no askpass program specified’
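
If you prefer Ruby over ‘jq’ for digging through the log, something like the sketch below may help. It assumes, as the excerpt above suggests, that the client log holds one JSON object per line; ‘check_results’ is a hypothetical helper and the sample lines are shortened versions of the entries above.

```ruby
require "json"

# Extract the check payloads from sensu-client.log lines.
# Assumes one JSON object per line; anything else is skipped.
def check_results(lines)
  lines.each_with_object([]) do |line, results|
    begin
      entry = JSON.parse(line)
    rescue JSON::ParserError
      next # not valid JSON
    end
    next unless entry.is_a?(Hash)
    next unless entry["message"] == "publishing check result"
    check = entry.dig("payload", "check")
    results << check if check
  end
end

sample = [
  '{"check":{"name":"jenkins_volume_size"},"message":"received check request","level":"info"}',
  '{"payload":{"check":{"status":1,"name":"jenkins_volume_size","output":"sudo: no tty present and no askpass program specified\n"},"client":"client1"},"message":"publishing check result","level":"info"}'
]

check_results(sample).each do |check|
  puts "#{check["name"]} status=#{check["status"]} output=#{check["output"].strip}"
end
```

Against the real file you could pass ‘File.foreach("/var/log/sensu/sensu-client.log")’ instead of the sample array, run with enough privileges to read the log.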

You can either run ‘sudo visudo’ and edit the ‘/etc/sudoers’ file directly or, to keep things neat, follow best practice and put a specific rule
for our Ruby script in its own file under ‘/etc/sudoers.d’, which ‘/etc/sudoers’ includes:

vagrant@sensuclient:~$ sudo  grep sudoers.d  /etc/sudoers
# Please consider adding local content in /etc/sudoers.d/ instead of
#includedir /etc/sudoers.d
vagrant@sensuclient:~$ 

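Note that the leading ‘#’ in ‘#includedir’ is part of the directive rather than a comment marker, so the line is already active and should be left as it is. We just add a new file with the rule below: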

vagrant@sensuclient:~$ sudo cat /etc/sudoers.d/sensu
sensu ALL = NOPASSWD: /etc/sensu/plugins/check_volume_size.rb

So if we check the logs now, we should be seeing our checks working:

{
  "payload": {
    "check": {
      "status": 1,
      "notification": "Jenkins volume size problem",
      "command": "sudo /etc/sensu/plugins/check_volume_size.rb 1 alpine true",
      "handlers": [
        "default"
      ],
      "name": "jenkins_volume_size",
      "issued": 1518811204,
      "executed": 1518811204,
      "duration": 0.096,
      "output": "container: alpine1 volume exceeds max disk usage(1MB): 6MB; \n"
    },
    "client": "client1"
  },
  "message": "publishing check result",
  "level": "info",
  "timestamp": "2018-02-16T20:00:04.660111+0000"
}

Obviously we haven’t created the whole thing to just check the logs manually…

5. Running a simple Uchiwa dashboard

Remember the API section we configured on the server? That is where Uchiwa, an open source dashboard for Sensu,
is going to read notifications from. Let’s run it:

docker run -d -p 3000:3000 -v ~/uchiwa-confi:/config uchiwa/uchiwa

Its config is very simple, and I bind mounted it from the host:

➜  ~ cat ~/uchiwa-confi/config.json 

{
    "sensu": [
        {
            "name": "Sensu",
            "host": "192.168.2.212",
            "ssl": false,
            "port": 4567,
            "path": "",
            "timeout": 5000
        }
    ],
    "uchiwa": {
        "port": 3000,
        "stats": 10,
        "refresh": 5
    }
}

As you can see, it connects to the Sensu API on the server (192.168.2.212, port 4567) and listens on port 3000, the port we published with ‘docker run’, which is where we can access the dashboard.
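
Uchiwa gets its data from the Sensu API, and you can query the same API yourself. The sketch below assumes the Sensu 1.x API layout as I understand it, where GET /events returns a JSON array of events, each with a ‘client’ and a ‘check’ object; treat those field names as assumptions and check them against your Sensu version. ‘fetch_events’ and ‘summarize’ are hypothetical helpers, the sample event is hand-written, and the host and port are the ones from the config above.

```ruby
require "json"
require "net/http"
require "uri"

# Fetch current events from the Sensu API.
# Not called below, because it needs a reachable Sensu server.
def fetch_events(host = "192.168.2.212", port = 4567)
  body = Net::HTTP.get(URI("http://#{host}:#{port}/events"))
  JSON.parse(body)
end

# Build one summary line per event (field names are assumptions, see above).
def summarize(events)
  events.map do |event|
    client = event.dig("client", "name")
    check = event["check"] || {}
    "#{client}/#{check["name"]} status=#{check["status"]}: #{check["output"].to_s.strip}"
  end
end

# Hand-written sample in the assumed shape of an /events response
sample_events = JSON.parse(<<~'EVENTS')
  [{"client": {"name": "client1"},
    "check": {"name": "jenkins_volume_size", "status": 1,
              "output": "container: alpine1 volume exceeds max disk usage(1MB): 6MB; "}}]
EVENTS

puts summarize(sample_events)
```

With a live server, ‘puts summarize(fetch_events)’ would print whatever events the API currently reports.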

You can also use alternative dashboards such as Grafana, and Sensu offers an enterprise dashboard as well, if you have the $.
Please note that Sensu comes with lots of checks and plugins, so you can get lots of metrics out of the box or by writing a custom plugin, just
like the one we have just implemented.