The second M.E.L.I.G. MeetUp was about the famous ELK stack (Elasticsearch / Logstash / Kibana).

Again, we were kindly hosted by InnoQ (they have a podcast). Thanks for that...

M.E.L.I.G?

M.E.L.I.G. is about addressing all things related to infrastructure services around:

  • Metrics: Think graphite, InfluxDB, Cyanite, OpenTSDB and a like. Including middleware such as graphite-api, carbon-relay-ng
  • Events & Logs where events are business related logs you do not want to miss and generic log information that might be used for debugging purposes
  • Inventory services to get to know how the infrastructure looks like and looked like in the past
  • Glue like Docker, heka, fluentd, runC, storage backend and so on; basically stuff that does not fit in the other parts. :)

MELIG #2

In the course of the second MeetUp I introduced docker compose, tipped my toe into Consul, showed of how elasticsearch differs from other data-bases and gave an extensive introduction of a logstash-based log analytics stack. Towards the end we talk about how to combine metrics and logs to get the best of it.

Youtube [german]

Since we got no english speaker in the crowd, we stuck to german. The quality is not the best to begin with; I am working on a capturing setup to get a better quality. Unfortunately most of the consumer devices are meant to work with Windows - I have to find out what that 'Windows' refers to. :)

Do it Yourself

As promised I am going to write a blog post with step-by-step instructions to recreate what was done in the MeetUp to help you lot get the most out of it. So, what did we do last time around....?

Docker Compse 101 (video >0h2m0s)

To demonstrate how docker stacks are spun up easily I showed off how to fire up and link two containers.

First we start a server (to which we link after) by hand.

$ docker run -d --name server -p 80 qnib/terminal
66a9920d9cbcc3c24a20591b4e56f6064871d31829964856f3354d4dfea13038

By starting a second container and link in the first the environment will hold all environment variables created via Dockerfile (of qnib/terminal and his parents) and those which describe the exposed ports information

$ docker run -ti --link server:server ubuntu:15.10 bash
root@6316e1882dda:/# env|grep SERVER
SERVER_ENV_SYSLOG_SERVER=logstash.syslog.service.consul
SERVER_ENV_SYSLOG_PROTO=TCP
SERVER_PORT=tcp://172.17.0.19:80
SERVER_ENV_SYSLOG_PORT=5514
SERVER_PORT_80_TCP_PORT=80
SERVER_PORT_80_TCP_ADDR=172.17.0.19
SERVER_PORT_80_TCP=tcp://172.17.0.19:80
SERVER_PORT_80_TCP_PROTO=tcp
SERVER_NAME=/pensive_bhaskara/server
SERVER_ENV_BOOTSTRAP_CONSUL=false
SERVER_ENV_TERM=xterm
SERVER_ENV_RUN_SERVER=false

This environment variables can then be used to configure services within the container to know where to look for certain services of the server container.

One drawback is, that if the server container is decided to be restarted the link is broken. :( This could be addressed by using a service discovery mechanism like...

Consul (video >0h7m45s)

The swiss army knife of service checks is Consul. It combines

  • health checks,
  • exposes all information via RESTful API,
  • Key/Value store,
  • DNS interface to discover services.

I used an example which can be found underneath my orchestra repository.

$ git clone https://github.com/ChristianKniep/orchestra.git
Cloning into 'orchestra'...
remote: Counting objects: 891, done.
remote: Total 891 (delta 1), reused 1 (delta 1), pack-reused 889
Receiving objects: 100% (891/891), 4.67 MiB | 2.23 MiB/s, done.
Resolving deltas: 100% (381/381), done.
Checking connectivity... done.
$ cd orchestra/consul/

It consists of two containers...

$ docker-compose up -d
Creating consul_consul_1...
Creating consul_terminal_1..
$

... one holds the consul server, the other acts as a client with no services [yet].

When hooking a new bash to the client (think ssh into it) we can see the members of the cluster.

$ docker exec -ti consul_terminal_1 bash
[root@terminal /]# consul members
Node      Address           Status  Type    Build  Protocol  DC
terminal  172.17.0.22:8301  alive   client  0.5.2  2         dc1
consul    172.17.0.21:8301  alive   server  0.5.2  2         dc1
[root@terminal /]#

The WebUI (point your browser to <dockerhost>:8500, if you use your local machine its localhost:8500, with docker-machine it's most likely 192.168.99.100:8500) shows two nodes, one got one server (the consul service, that is).

The RESTful API also shows one service.

[root@terminal /]# /data/curl_services.sh
+ curl -s localhost:8500/v1/catalog/services
+ jq .
{
  "consul": []
}
[root@terminal /]#

Consul describes services in json blobs as the following.

{
  "services": [
    {
      "id": "isc0",
      "name": "isc0",
      "tags": [ "cloud", "ok" ],
      "port": 20150,
      "checks": [
        {
          "script": "echo 'drop out with EC:0'; exit 0",
          "interval": "5s"
        }
      ]
    },{
      "id": "isc1",
      "name": "isc1",
      "tags": [ "cloud", "warn" ],
      "port": 20151,
      "checks": [
        {
          "script": "echo 'drop out with EC:1'; exit 1",
          "interval": "5s"
        }
      ]
    },{
      "id": "isc2",
      "name": "isc2",
      "tags": [ "cloud", "crit" ],
      "port": 20152,
      "checks": [
        {
          "script": "echo 'drop out with EC:2' 1>&2; exit 2",
          "interval": "5s"
        }
      ]
    }
  ]
}

Similar to NAGIOS the return code of the check determines the state of the service.

  • EC0: OK, everything is fine
  • EC1: WARN, will still be served, but be careful
  • EC2+: CRIT, service will not be exposed by the DNS interface anymore.

By putting the blob into the consul directory and reload consul...

[root@terminal /]# cp /data/test.json /etc/consul.d/
[root@terminal /]# consul reload
Configuration reload triggered
[root@terminal /]#

three services are shown; the first is OK, the second warning and the third is critical.

The service is exposed via RESTful and DNS interface.

[root@terminal /]# /data/curl_service_isc0.sh
+ curl -s localhost:8500/v1/catalog/service/isc0
+ jq .
[
  {
    "ServicePort": 20150,
    "ServiceAddress": "",
    "ServiceTags": [
      "cloud",
      "ok"
    ],
    "ServiceName": "isc0",
    "ServiceID": "isc0",
    "Address": "172.17.0.22",
    "Node": "terminal"
  }
]
[root@terminal /]# dig +short SRV isc0.service.consul
1 1 20150 terminal.node.dc1.consul.
[root@terminal /]#

After we finished we have to kill the containers.

$ docker-compose kill; docker-compose rm --force
Killing consul_terminal_1... done
Killing consul_consul_1... done
Going to remove consul_terminal_1, consul_consul_1
Removing consul_terminal_1... done
Removing consul_consul_1... done
$

Elasticsearch (video >0h17m45s)

In this section I describe how Elasticsearch works... I think there are better blogpost out there to get this.

qnib/elk (video >0h29m30s)

I quickly showed my ELK container, which I described in another blog post: qnib/elk comes now with kibana4

Log Analytics by Example (video >0h34m30s)

The example I was aiming for uses a couple of nginx web server and a load balancer to motivate some logs to analyse with logstash.

Iteration #1

To fire it up, we use the branch melig#2.

orchestra (master) $ git checkout melig#2
orchestra (melig#2) $ cd nginx-elk
$ git tag
melig2.0
melig2.1
melig2.2
melig2.3
neic
$ git checkout melig2.0
*snip*
$ docker-compose up -d
Creating nginxelk_consul_1...
Creating nginxelk_elasticsearch_1...
Creating nginxelk_lb_1...
Creating nginxelk_www_1...
Creating nginxelk_kibana3_1...
Creating nginxelk_logstash_1...
$

The logstash config is pretty minimalistic...

$ cat logstash/etc/nginx.conf
input {
    syslog {
        port => 5514
        type => syslog
    }
    udp {
        port => 55514
        type => "udp"
        buffer_size => 8192
        codec => "json"
    }
}


output {
    elasticsearch {
        host => "elasticsearch.service.consul"
        protocol => http
    }
}

By hitting the loadbalancer at port <docker_host>:8000 a cat is presented. The load-balancer logs to syslog and thus to logstash.

Through Kibana (port <docker_host>:80) one discovers the nginx log.

{
  "_index": "logstash-2015.11.02",
  "_type": "udp",
  "_id": "AVDHGz8teMoSD8OGNG5j",
  "_score": null,
  "_source": {
    "@timestamp": "2015-11-02T07:31:02.470Z",
    "@message": " nginx 2015-11-02T08:31:02+01:00 192.168.99.1 \"GET /cat6.jpg HTTP/1.1\" body_bytes_sent:52772 upstream_status:200 172.17.0.10:80 upstream_response_time:0.001",
    "@fields": {
      "host": "loadbalancer",
      "syslog_facility": "user",
      "syslog_facility_code": "1",
      "syslog_severity": "info",
      "syslog_severity_code": "6",
      "program": "supervisord",
      "pid": "-"
    },
    "@version": "1",
    "type": "udp",
    "host": "172.17.0.9"
  },
  "sort": [
    1446449462470,
    1446449462470
  ]
}

Iteration #2

This json is pretty unrefined, therefore the next iteration uses a filter to reformat and tidy up the structure a bit.

$ git diff melig2.1
diff --git a/nginx-elk/logstash/etc/nginx.conf b/nginx-elk/logstash/etc/nginx.conf
index 0e8a3dc..ddb0bbb 100644
--- a/nginx-elk/logstash/etc/nginx.conf
+++ b/nginx-elk/logstash/etc/nginx.conf
@@ -11,29 +11,6 @@ input {
     }
 }

-filter {
-    if [@message] {
-        mutate {
-            add_field => { "message" => "%{@message}" }
-            add_field => { "timestamp" => "%{@timestamp}" }
-            add_field => { "version" => "%{@version}" }
-            add_field => { "pid" => "%{[@fields][pid]}" }
-            add_field => { "program" => "%{[@fields][program]}" }
-            add_field => { "syslog_facility" => "%{[@fields][syslog_facility]}" }
-            add_field => { "syslog_facility_code" => "%{[@fields][syslog_facility_code]}" }
-            add_field => { "syslog_severity" => "%{[@fields][syslog_severity]}" }
-            add_field => { "syslog_severity_code" => "%{[@fields][syslog_severity_code]}" }
-            replace => [ "host", "%{[@fields][host]}" ]
-            remove_field => [
-                "@message", "@version",
-                "[@fields][program]", "[@fields][host]", "[@fields][pid]",
-                "[@fields][syslog_facility]",  "[@fields][syslog_facility_code]",
-                "[@fields][syslog_severity]", "[@fields][syslog_severity_code]"
-            ]
-        }
-    }
-}
-

 output {
     elasticsearch {

By checking out the tag and removing the stub-file which triggers a logstash-reload, thus the change is applied.

$ git checkout melig2.1
Previous HEAD position was ea979c4... w/o filtering in logstash
HEAD is now at 67ecd08... add filter to bring order in logs
$ rm -f logstash/etc/remove_to_restart_logstash

After a bit a reloaded cat picture will generate a slightly different output in kibana.

{
  "_index": "logstash-2015.11.02",
  "_type": "udp",
  "_id": "AVDHIMPBeMoSD8OGNG6h",
  "_score": null,
  "_source": {
    "@timestamp": "2015-11-02T07:37:04.542Z",
    "@fields": {},
    "type": "udp",
    "host": "loadbalancer",
    "message": " nginx 2015-11-02T08:37:04+01:00 192.168.99.1 \"GET /cat6.jpg HTTP/1.1\" body_bytes_sent:52772 upstream_status:200 172.17.0.10:80 upstream_response_time:0.001",
    "timestamp": "2015-11-02T07:37:04.542Z",
    "version": "1",
    "pid": "-",
    "program": "supervisord",
    "syslog_facility": "user",
    "syslog_facility_code": "1",
    "syslog_severity": "info",
    "syslog_severity_code": "6"
  },
  "sort": [
    1446449824542,
    1446449824542
  ]
}

The fields dictionary is gone, the structure is more direct.

To spice things up, lets start more web servers.

$ docker-compose scale www=3
Creating and starting 2... done
Creating and starting 3... done
$

Iteration #3

There are still a lot of empty messages, furthermore the logging facility supervisord is misleading, since it's just the init system and the real program is nginx. The next tag addresses that.

$ git diff melig2.2
diff --git a/nginx-elk/logstash/etc/nginx.conf b/nginx-elk/logstash/etc/nginx.conf
index 52b9abd..0e8a3dc 100644
--- a/nginx-elk/logstash/etc/nginx.conf
+++ b/nginx-elk/logstash/etc/nginx.conf
@@ -32,21 +32,6 @@ filter {
             ]
         }
     }
-    if [message] == "" {
-        drop{}
-    }
-    if [program] == "kernel" {
-        drop{}
-    }
-    if [program] == "supervisord" {
-        grok {
-            patterns_dir => "/etc/grok/patterns/"
-            match => [ "message", "%{SD_PROG:program} %{GREEDYDATA:message}" ]
-            overwrite => [ "program", "message" ]
-            add_tag => [ "supervisord" ]
-            add_field => { "agent" => "supervisord" }
-        }
-    }
 }
$ git checkout melig2.2
Previous HEAD position was 67ecd08... add filter to bring order in logs
HEAD is now at ed741c9... drop msg and handle supervisor
$ rm -f logstash/etc/remove_to_restart_logstash
$

After reloading a cat (be aware of the browser cache), the log message has changed a bit.

{
  "_index": "logstash-2015.11.02",
  "_type": "udp",
  "_id": "AVDHJkokeMoSD8OGNG89",
  "_score": null,
  "_source": {
    "@timestamp": "2015-11-02T07:43:05.606Z",
    "@fields": {},
    "type": "udp",
    "host": "loadbalancer",
    "message": "2015-11-02T08:43:05+01:00 192.168.99.1 \"GET /cat11.jpg HTTP/1.1\" body_bytes_sent:54713 upstream_status:200 172.17.0.10:80 upstream_response_time:0.006",
    "timestamp": "2015-11-02T07:43:05.606Z",
    "version": "1",
    "pid": "-",
    "program": "nginx",
    "syslog_facility": "user",
    "syslog_facility_code": "1",
    "syslog_severity": "info",
    "syslog_severity_code": "6",
    "agent": "supervisord",
    "tags": [
      "supervisord"
    ]
  },
  "sort": [
    1446450185606,
    1446450185606
  ]
}

The program is more accurate and the log message itself got rid of the prefix nginx. Furthermore we got less messages, since we drop empty once.

Iteration #4

The last part extracts the information from the log itself.

$ git diff melig2.3
diff --git a/nginx-elk/logstash/etc/nginx.conf b/nginx-elk/logstash/etc/nginx.conf
index d4ee7d1..52b9abd 100644
--- a/nginx-elk/logstash/etc/nginx.conf
+++ b/nginx-elk/logstash/etc/nginx.conf
@@ -47,13 +47,6 @@ filter {
             add_field => { "agent" => "supervisord" }
         }
     }
-    if [program] == "nginx" {
-        grok {
-            match => [ "message", "%{TIMESTAMP_ISO8601:time} %{IP:client_ip} \"%{WORD:http_method} %{PATH:http_request} %{NOTSPACE:http_ver}\" body_bytes_sent:%{INT:body_bytes_sent} upstream_status:%{INT:http_upstream_status} %{IP:upstream_ip}:%{INT:upstream_port} upstream_response_time:%{INT:upstream_response_time}" ]
-            add_tag => [ "melig#2" ]
-        }
-
-    }
 }
 $ git checkout melig2.3
Previous HEAD position was ed741c9... drop msg and handle supervisor
HEAD is now at d3e14a8... handle nginxlog
$ rm -f logstash/etc/remove_to_restart_logstash
$

This leads to a rich json blob, with all information needed to decide what to do or configure kibana to show nice charts.

{
  "_index": "logstash-2015.11.02",
  "_type": "udp",
  "_id": "AVDHKOfqeMoSD8OGNG8_",
  "_score": null,
  "_source": {
    "@timestamp": "2015-11-02T07:45:57.666Z",
    "@fields": {},
    "type": "udp",
    "host": "loadbalancer",
    "message": "2015-11-02T08:45:57+01:00 192.168.99.1 \"GET /cat10.jpg HTTP/1.1\" body_bytes_sent:99948 upstream_status:200 172.17.0.14:80 upstream_response_time:0.007",
    "timestamp": "2015-11-02T07:45:57.666Z",
    "version": "1",
    "pid": "-",
    "program": "nginx",
    "syslog_facility": "user",
    "syslog_facility_code": "1",
    "syslog_severity": "info",
    "syslog_severity_code": "6",
    "agent": "supervisord",
    "tags": [
      "supervisord",
      "melig#2"
    ],
    "time": "2015-11-02T08:45:57+01:00",
    "client_ip": "192.168.99.1",
    "http_method": "GET",
    "http_request": "/cat10.jpg",
    "http_ver": "HTTP/1.1",
    "body_bytes_sent": "99948",
    "http_upstream_status": "200",
    "upstream_ip": "172.17.0.14",
    "upstream_port": "80",
    "upstream_response_time": "0"
  },
  "sort": [
    1446450357666,
    1446450357666
  ]
}

Post-Demo (video >1h0m)

After the demo I went a bit into how to bundle a metrics and a log backend. I keep this piece in the bucket as a different blog entry (/ MeetUp). :

So long...

Christian