
Logstash “Hello World” Example – Part 1 of the ELK Stack Series

Today, we will first introduce Logstash, an open source project created by Elastic, before we perform a little Logstash “Hello World”: we will show how to read data from the command line or from a file, transform the data, and send it back to the command line or to a file. In the appendix you will find notes on how to replace the timestamp with a custom timestamp read from the input message (e.g. the input file) and on Logstash CSV input performance.

For maximum interoperability with the host system (so that the installed Java version becomes irrelevant), Logstash will be run in a Docker-based container sandbox.

This is the first blog post of a series about the Elastic Stack (a.k.a. ELK stack).

What is Logstash?

Logstash can collect logging data from a multitude of sources, transform the data, and send the data to a multitude of “stashes”.
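
Conceptually, every Logstash configuration follows this collect–transform–stash structure: an input section, an optional filter section, and an output section. A minimal skeleton (the plugins inside the sections vary by use case):

# minimal Logstash configuration skeleton: events flow from inputs
# through (optional) filters to outputs
input {
  stdin { }      # collect: read events from the console
}
# filter { ... } # transform: grok, date, csv, ... (optional)
output {
  stdout { }     # stash: write events back to the console
}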

Elastic’s “favorite stash” is Elasticsearch, another open source project driven by Elastic. Together with Logstash and Kibana, Elasticsearch forms the so-called ELK stack:

  • Elasticsearch is for searching, analyzing, and storing your data
  • Logstash (and Beats) is for collecting and transforming data, from any source, in any format
  • Kibana is a portal for visualizing the data and navigating within the Elastic Stack

 

 

In the current blog post, we will restrict ourselves to simplified Hello World pipelines like the following:

and:

We will first read from and write to the command line, before we use log files as input source and output destination.

Tools used

  • Vagrant 1.8.6
  • Virtualbox 5.0.20
  • Docker 1.12.1
  • Logstash 5.0.1

Step 1: Install a Docker Host via Vagrant and Connect to the Host via SSH

We will run Logstash in a Docker container in order to allow for maximum interoperability. This way, we can always use the latest Logstash version without having to manage the Java version on the host: e.g. Logstash 1.4.x works with Java 7, while version 5.0.x currently requires Java 8.

If you are new to Docker, you might want to read this blog post.

Installing Docker on Windows and Mac can be a real challenge, but no worries: we will show an easy way here that is much quicker than the one described in Docker’s official documentation:

Prerequisites of this step:

  • I recommend direct access to the Internet: through a firewall is fine, but without an HTTP proxy. However, if you cannot get rid of your HTTP proxy, read this blog post.
  • Administration rights on your computer.

Steps to install a Docker Host VirtualBox VM:

1. Download and install VirtualBox (if the installation fails with the error message “Oracle VM VirtualBox x.x.x Setup Wizard ended prematurely”, see Appendix A of this blog post: VirtualBox Installation Workaround below)

2. Download and install Vagrant (requires a reboot)

3. Download the Vagrant box containing an Ubuntu-based Docker host and create a VirtualBox VM as follows:

(basesystem)# mkdir ubuntu-trusty64-docker ; cd ubuntu-trusty64-docker
(basesystem)# vagrant init williamyeh/ubuntu-trusty64-docker
(basesystem)# vagrant up
(basesystem)# vagrant ssh
(dockerhost)$

Now you are logged into the Docker host and we are ready for the next step: downloading the Logstash Docker image.

Note: I have experienced problems with the vi editor when running vagrant ssh in a Windows terminal. On Windows, consider following Appendix C of this blog post and using putty instead.

Step 2 (optional): Download Logstash Image

This extra download step is optional, since the Logstash Docker image will be downloaded automatically in step 3, if it is not already found on the system:

(dockerhost)$ sudo docker pull logstash
Unable to find image 'logstash:latest' locally
latest: Pulling from library/logstash

386a066cd84a: Already exists
75ea84187083: Already exists
3e2e387eb26a: Pull complete
eef540699244: Pull complete
1624a2f8d114: Pull complete
7018f4ec6e0a: Pull complete
6ca3bc2ad3b3: Pull complete
3829939e7052: Pull complete
1cf20bb3ce62: Pull complete
f737f281552e: Pull complete
f1b7aca72edd: Pull complete
fb821ca73c54: Pull complete
c1543e80c12a: Pull complete
566f64970d2a: Pull complete
de88d0e92195: Pull complete
Digest: sha256:048a18100f18cdec3a42ebaa42042d5ee5bb3acceacea027dee4ae3819039da7
Status: Downloaded newer image for logstash:latest

The version of the downloaded Logstash image can be checked with following command:

(dockerhost)$ sudo docker run -it --rm logstash --version
logstash 5.0.1

We are using version 5.0.1 currently.

Step 3: Run Logstash as a Translator from Command Line to Command Line

In this step, we will use Logstash to translate the command line standard input (STDIN) to command line standard output (STDOUT).

Once a Docker host is available, downloading, installing and running Logstash is as simple as typing the following command. If the image is already downloaded because Step 2 was performed before, the download part will be skipped:

(dockerhost)$ sudo docker run -it --rm logstash -e 'input { stdin { } } output { stdout { } }'

With the -e option, we tell Logstash to read from the command line input (STDIN) and to send all output to the command line output (STDOUT).

The output looks as follows:

Unable to find image 'logstash:latest' locally
latest: Pulling from library/logstash

386a066cd84a: Already exists
75ea84187083: Already exists
3e2e387eb26a: Pull complete
eef540699244: Pull complete
1624a2f8d114: Pull complete
7018f4ec6e0a: Pull complete
6ca3bc2ad3b3: Pull complete
3829939e7052: Pull complete
1cf20bb3ce62: Pull complete
f737f281552e: Pull complete
f1b7aca72edd: Pull complete
fb821ca73c54: Pull complete
c1543e80c12a: Pull complete
566f64970d2a: Pull complete
de88d0e92195: Pull complete
Digest: sha256:048a18100f18cdec3a42ebaa42042d5ee5bb3acceacea027dee4ae3819039da7
Status: Downloaded newer image for logstash:latest
Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
The stdin plugin is now waiting for input:
11:19:07.293 [[main]-pipeline-manager] INFO  logstash.pipeline - Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
11:19:07.334 [[main]-pipeline-manager] INFO  logstash.pipeline - Pipeline main started
11:19:07.447 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}

In the first part, the Logstash Docker image is downloaded from Docker Hub, if it is not already available locally. Then the logs of the Logstash startup follow, and the output stops and waits for your input. Now, if we type

hello logstash

we get an output similar to

2016-11-17T11:35:10.764Z 828389ba165b hello logstash

We can stop the container by typing <Ctrl>-D and we will get an output like

11:51:20.132 [LogStash::Runner] WARN  logstash.agent - stopping pipeline {:id=>"main"}

Now let us try another output format:

(dockerhost)$ sudo docker run -it --rm logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'
Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
The stdin plugin is now waiting for input:
11:48:05.746 [[main]-pipeline-manager] INFO logstash.pipeline - Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
11:48:05.760 [[main]-pipeline-manager] INFO logstash.pipeline - Pipeline main started
11:48:05.827 [Api Webserver] INFO logstash.agent - Successfully started Logstash API endpoint {:port=>9600}

You will need to wait ~8 sec before you can send your first log line to STDIN. Let us do that now and type:

hello logstash in ruby style

This will produce an output like

{
 "@timestamp" => 2016-11-17T11:50:24.571Z,
 "@version" => "1",
 "host" => "9cd979a20db4",
 "message" => "hello logstash in ruby style",
 "tags" => []
}
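
The rubydebug codec is just one of several stdout codecs. If you prefer machine-readable output, the json_lines codec (one JSON document per line) should work the same way; a sketch, not shown in the logs above:

(dockerhost)$ sudo docker run -it --rm logstash -e 'input { stdin { } } output { stdout { codec => json_lines } }'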

Step 4: Run Logstash as a Translator from File to File

In this example, we will use (log) files as input source and output destination:

For this, we will create a Logstash configuration file on the Docker host as follows:

#logstash.conf
input {
  file {
    path => "/app/input.log"
  }
}

output {
  file {
    path => "/app/output.log"
  }
}

To be able to read a file in the current directory on the Docker host, we need to map the current directory to a directory inside the Docker container using the -v switch. This time we also need to override the entrypoint, since we need access to the command line of the container itself: as shown in Appendix A below, we cannot just manipulate the mapped input file from the Docker host.

(dockerhost-terminal1)$ sudo docker run -it --rm --name logstash -v "$PWD":/app --entrypoint bash logstash

Then within the container we run logstash:

(container-terminal1)# logstash -f /app/logstash.conf

In a second terminal on the docker host, we need to run a second bash terminal within the container by issuing the command:

(dockerhost-terminal2)$ sudo docker exec -it logstash bash

Now, on the container command line, we prepare to see the output like follows:

(container-terminal2)# touch /app/output.log; tail -f /app/output.log

Now we need a third terminal, and connect to the container again. Then we send a “Hello Logstash” to the input file:

(dockerhost-terminal3)$ sudo docker exec -it logstash bash
(container-terminal3)# echo "Hello Logstash" >> /app/input.log

This will create following output on terminal 2:

{"path":"/app/input.log","@timestamp":"2016-11-17T19:53:02.728Z","@version":"1","host":"88a342b6b385","message":"Hello Logstash","tags":[]}

The output is in a format Elasticsearch understands.
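
If we later want to ship such events to Elasticsearch instead of a file, only the output section needs to change. A sketch, assuming an Elasticsearch instance is reachable under the hostname elasticsearch (we will use this kind of output in Appendix C):

output {
  elasticsearch {
    hosts => "elasticsearch"
    index => "hello"       # hypothetical index name for this example
  }
}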

In order to improve the readability of the output, we can specify a “plain” output codec in the configuration file:

#logstash.conf
input {
  file {
    path => "/app/input.log"
  }
}

output {
  file {
    path => "/app/output.log"
    codec => "plain"
  }
}

Note that a change of the Logstash configuration file requires the Logstash process to be restarted for the change to take effect; i.e. we can stop it with Ctrl-C and restart the Logstash process in terminal 1 with

(container-terminal1)# logstash -f /app/logstash.conf

Now again

(container-terminal-3)# echo "Hello Logstash" >> /app/input.log

in terminal 3. That will produce the following syslog-style output on terminal 2:

2016-11-17T20:10:39.861Z 88a342b6b385 Hello Logstash
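
Side note: Logstash 5.x can also watch the configuration file and reload it automatically, which avoids the manual Ctrl-C/restart cycle; a sketch (the exact flag may differ between versions):

(container-terminal1)# logstash -f /app/logstash.conf --config.reload.automatic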

Appendix A: Error Errno::EACCES: Permission denied if the logfile is changed on a mapped Volume

This error has been seen when running Logstash as a Docker container with a mapped folder and manipulating the input file from the Docker host:

(dockerhost)$ sudo docker run -it --rm -v "$PWD":/app logstash -f /app/logstash.conf
Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
19:15:59.927 [[main]-pipeline-manager] INFO logstash.pipeline - Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
19:15:59.940 [[main]-pipeline-manager] INFO logstash.pipeline - Pipeline main started
19:16:00.005 [Api Webserver] INFO logstash.agent - Successfully started Logstash API endpoint {:port=>9600}

If we now change the input file on the docker host in a second terminal like follows:

(dockerhost)$ echo "Hello Logstash" >> input.log

we receive following output on the first terminal:

19:22:47.732 [[main]>worker1] INFO logstash.outputs.file - Opening file {:path=>"/app/output.log"}
19:22:47.779 [LogStash::Runner] FATAL logstash.runner - An unexpected error occurred! {:error=>#<Errno::EACCES: Permission denied - /app/output.log>, :backtrace=>["org/jruby/RubyFile.java:370:in `initialize'", "org/jruby/RubyIO.java:871:in `new'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-4.0.1/lib/logstash/outputs/file.rb:280:in `open'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-4.0.1/lib/logstash/outputs/file.rb:132:in `multi_receive_encoded'", "org/jruby/RubyHash.java:1342:in `each'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-4.0.1/lib/logstash/outputs/file.rb:131:in `multi_receive_encoded'", "org/jruby/ext/thread/Mutex.java:149:in `synchronize'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-4.0.1/lib/logstash/outputs/file.rb:130:in `multi_receive_encoded'", "/usr/share/logstash/logstash-core/lib/logstash/outputs/base.rb:90:in `multi_receive'", "/usr/share/logstash/logstash-core/lib/logstash/output_delegator_strategies/shared.rb:12:in `multi_receive'", "/usr/share/logstash/logstash-core/lib/logstash/output_delegator.rb:42:in `multi_receive'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:297:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:296:in `output_batch'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:252:in `worker_loop'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:225:in `start_workers'"]}
(dockerhost)$

There is a problem with the synchronization of the input.log file from the Docker host to the container, causing the Docker container to stop. The workaround is to run the container with a bash entrypoint and manipulate the file from within the container, as shown in the step-by-step guide above.
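
Condensed, the working pattern from the step-by-step guide above looks like this:

# terminal 1: start the container with a bash entrypoint and run Logstash inside
(dockerhost-terminal1)$ sudo docker run -it --rm --name logstash -v "$PWD":/app --entrypoint bash logstash
(container-terminal1)# logstash -f /app/logstash.conf

# terminal 2: manipulate the input file from within the container, not from the host
(dockerhost-terminal2)$ sudo docker exec -it logstash bash
(container-terminal2)# echo "Hello Logstash" >> /app/input.log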

Appendix B: How to apply a custom Time Stamp

In a real customer project, I had the task to visualize the data of certain data dump files, which had their own time stamps in a custom format like follows:

2016-11-21|00:00:00|<other data>

Okay, you are right in thinking that this is a CSV with a pipe (|) separator, and that the CSV Logstash plugin should be applied. However, before doing so, we can take it as an example of how to replace the built-in Logstash timestamp variable called @timestamp. This is better than creating your own timestamp variable with a different name: the latter is also possible and works with normal Kibana visualizations, but it does not seem to work with Timelion for more complex visualizations. So let us do it the right way now:

We will create a simple Logstash configuration file to demonstrate the topic:

# logstash_custom_timestamp.conf
input {
  stdin { }
  file {
    path => "/app/input/*.*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

With that, we allow for STDIN input as well as for file input from any file you dump into the path /app/input/*. For testing, we have set start_position to "beginning", i.e. Logstash will always read the files from the beginning, even if it has already read part of them. In addition, by setting sincedb_path to "/dev/null", we make sure that Logstash forgets which files have already been processed. This way, we can restart Logstash and re-process any files in the folder.

Now let us find the time variable with a grok filter and replace the time variable with the date plugin:

filter {
  grok {
    match => {"message" => "(?<mydate>[1-9][0-9]{3}-[0-9]{2}-[0-9]{2}\|[0-9]{2}:[0-9]{2}:[0-9]{2})"}
  }

  date {
    match => ["mydate", "YYYY-MM-dd|HH:mm:ss", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss", "ISO8601"]
    target => "@timestamp"
  }

}

The grok filter allows us to define a new interim variable named mydate if the specified regular expression is found in the input message. In our case, we want to match something like 2016-11-21|00:00:00, i.e. one digit between 1 and 9 ([1-9]) and 3 digits between 0 and 9 ([0-9]{3}), then a dash (-), then two digits ([0-9]{2}), and so on.
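
As a side note, the same match can probably be expressed more compactly with grok’s built-in patterns (a sketch; the explicit regular expression above is what was actually tested):

grok {
  match => { "message" => "(?<mydate>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}\|%{TIME})" }
}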

Then we can use the date plugin to overwrite the built-in @timestamp with the mydate variable we have created with the grok filter. Within the date filter we match patterns like YYYY-MM-dd|HH:mm:ss in the mydate variable and push the result to the @timestamp variable.

Note that it is not possible to just use the replace directive: if we try to overwrite @timestamp with mydate that way, Logstash will complain that you cannot overwrite a time variable with a String variable.
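
For illustration only, this is the kind of shortcut that gets rejected (a sketch; Logstash refuses to assign a plain string to the built-in time field):

filter {
  mutate {
    # does NOT work: @timestamp is a Timestamp object, mydate is a string
    replace => { "@timestamp" => "%{mydate}" }
  }
}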

output {
  stdout { codec => rubydebug }
}

Now, let us start Logstash in a Docker container and test the configuration:

(dockerhost)$ sudo docker run -it --rm --name logstash -v "$PWD":/app --entrypoint bash logstash
(container)$ logstash -f /app/logstash_custom_timestamp.conf

And now, the container is waiting for input. We do not let it wait long and type in the line shown below in bold:

1966-10-23|12:00:00|birthday
{
 "mydate" => "1966-10-23|12:00:00",
 "@timestamp" => 1966-10-23T12:00:00.000Z,
 "@version" => "1",
 "host" => "02cec85c3aac",
 "message" => "1966-10-23|12:00:00|birthday",
 "tags" => []
}

Success: the built-in timestamp variable @timestamp has been updated with the date found in the input message.

Let us observe what happens with input messages that do not match:

this is a message that does not match
{
    "@timestamp" => 2016-12-04T10:19:17.501Z,
      "@version" => "1",
          "host" => "02cec85c3aac",
       "message" => "this is a message that does not match",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

We can see that the output is tagged with "_grokparsefailure" in this case and the timestamp is set to the current date and time, as expected.
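
If such unmatched lines should not be processed at all, they can be dropped based on that tag; a sketch using Logstash’s conditional syntax:

filter {
  # drop events whose grok pattern did not match
  if "_grokparsefailure" in [tags] {
    drop { }
  }
}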

Appendix C: Logstash CSV read Performance

In a real project, I had to read in many millions of lines from a large set of CSV files. I have experienced that it took quite a bit of time to read in the data, so I wanted to measure the input performance of Logstash to be able to estimate the time consumption.

Note: we will reduce the data volume by random sampling of the input. This will improve input and Elasticsearch performance with the trade-off that the data analysis becomes less accurate. However, if each data point still has more than 100 samples, the error is expected to be lower than a few percent, provided the input data has no “unhealthy” value distribution (e.g. many records with low values and only a few records with very large values).

Tools used:

  • Notebook with i7-6700HQ CPU and 64 GB RAM and Windows 10 Pro
  • VirtualBox 5.0.20 r106931
  • VirtualBox VM with Ubuntu 14.04, 4GB RAM and 2 vCPU
  • Docker installed 1.12.1, build 23cf638
  • 3 Docker containers running in interactive mode (the performance in detached mode might be higher, so we will measure a lower bound of the performance):
    • Logstash 5.0.1
    • Elasticsearch 5.0.1
    • Kibana 5.0.1
  • Data input files:
    • CSV files with 12,200 lines each
    • Sample data lines (note that the first line of each file will be dropped by Logstash):
DATUM|ZEIT|IPV4_SRC_ADDR|IPV4_DST_ADDR|ROUTER_IP|INTF_IN|INTF_OUT|TOS|FLAGS|IP_PROTOCOL_VERSION|PROTOCOL|L4_SRC_PORT|L4_DST_PORT|IN_PKTS|IN_BYTES|FLOWS
2016-11-23|15:58:10|9.1.7.231|164.25.118.50|9.0.253.1|2|0|0|0|4|17|49384|161|6|1602|1
2016-10-23|15:58:12|9.1.7.231|9.60.64.1|9.0.253.1|2|2|0|0|4|17|51523|161|1|78|1
...

Logstash configuration

# logstash_netflow_csv_to_elasticsearch.conf
input {
  stdin { }
  file {
    path => "/app/input/*.*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  ruby {
    # Sampling:
    code => "event.cancel if rand <= 0.90" # 10% sampling (i.e. cancel 90% of events) 
    #code => "event.cancel if rand <= 0.99" # 1% sampling (i.e. cancel 99% of events)
  } 

  # set timestamp: read from message 
  grok { 
    match => {"message" => "(?[1-9][0-9]{3}-[0-9]{2}-[0-9]{2}\|[0-9]{2}:[0-9]{2}:[0-9]{2})"}
  }

  # set timestamp: overwrite time stamp
  date {
    match => ["mydate", "YYYY-MM-dd|HH:mm:ss", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss", "ISO8601"]
    target => "@timestamp"
  }

  csv {

    columns => [
      "DATUM",
      "ZEIT",
      "IPV4_SRC_ADDR",
      "IPV4_DST_ADDR",
      "ROUTER_IP",
      "INTF_IN",
      "INTF_OUT",
      "TOS",
      "FLAGS",
      "IP_PROTOCOL_VERSION",
      "PROTOCOL",
      "L4_SRC_PORT",
      "L4_DST_PORT",
      "IN_PKTS",
      "IN_BYTES",
      "FLOWS"
    ]

    separator => "|"
    remove_field => ["mydate"]
  }

  if ([DATUM] == "DATUM") {
    drop { }
  }

}

output {
  stdout { codec => dots }

  elasticsearch {
    action => "index"
    index => "csv"
    hosts => "elasticsearch"
    document_type => "data"
    workers => 1
  }
}
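
To run this configuration like in the Hello World steps above, the Logstash container must be able to resolve the hostname elasticsearch. One way to achieve this (a sketch, assuming an Elasticsearch container named elasticsearch is already running on the same Docker host) is Docker’s --link option:

(dockerhost)$ sudo docker run -it --rm --name logstash --link elasticsearch:elasticsearch -v "$PWD":/app --entrypoint bash logstash
(container)$ logstash -f /app/logstash_netflow_csv_to_elasticsearch.conf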

Results without Elasticsearch output

As a baseline, we will first perform tests with the elasticsearch output commented out:

Test 1) 100% Sampling -> 3,400 lines/sec (i.e. 3,400 data sets/sec)
(10 files with 12,200 lines each in ~35 sec)

Test 2) 10% Sampling -> 6,100 lines/sec (i.e. 610 data sets/sec)
(10 files with 12,200 lines each in ~20 sec)

Test 3) 1% Sampling -> 8,100 lines/sec (i.e. 81 data sets/sec)
(10 files with 12,200 lines each in ~15 sec)

Results with Elasticsearch output

Now let us test the performance in case the data is sent to Elasticsearch:

Test 1) 100% Sampling -> 1,700 lines/sec (i.e. 1,700 data sets/sec)
(10 files with 12,200 lines each in ~70 sec)

Test 2) 10% Sampling -> 3,500 lines/sec (i.e. 350 data sets/sec)
(10 files with 12,200 lines each in ~35 sec)

Test 3) 1% Sampling -> 6,100 lines/sec (i.e. 61 data sets/sec)
(10 files with 12,200 lines each in ~20 sec)

As we can see, the input rate is roughly 2,000 lines/sec lower if the output is sent to Elasticsearch instead of only to the console (dots codec).

In case of output to Elasticsearch, we observe the following rates:

  • Sampling rate 1%: if only 1% of the data records are sent to the output, the input rate increases to 6,100 lines/sec (factor ~3.6 compared to a sampling rate of 100%).
  • Sampling rate 10%: if only 10% of the data records are sent to the output, one could expect an input rate increase by a factor of 10 compared to 100% sampling, if the output pipe were the bottleneck. This does not seem to be the case, since we observe an increase by a factor of 2 only (3,500 lines/sec).
  • Sampling rate 100%: if all input lines are sent to the output, we reach ~1,700 lines/sec.

The optimum sampling rate is determined by increasing the sampling rate until the required data accuracy is reached. The data accuracy can be checked by randomly sampling the same set of data several times and observing the variance of the output.
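
For rough planning, these measured rates can be used as a lower bound: at ~1,700 lines/sec (100% sampling, output to Elasticsearch), 10 million CSV lines take about 10,000,000 / 1,700 ≈ 5,900 seconds, i.e. roughly 1.6 hours; with 10% sampling (~3,500 lines/sec) the same input is processed in about 48 minutes.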

Summary

In this blog post we have created two simple Hello World examples:

  1. one for translation between command line input and command line output and
  2. a second one for translation from a file to a file.

In order to avoid any compatibility issues with the Java version on the host, we have run Logstash in a Docker container. This works fine if the input file is manipulated from within the container. As seen in Appendix A, we cannot manipulate the file on a mapped volume from the Docker host, though.


How to set up Docker Monitoring via cAdvisor, InfluxDB and Grafana

Have you ever tried to monitor a Docker solution? In this blog post, we will discuss three open source Docker monitoring alternatives, before we go through a step-by-step guide for a Docker monitoring stack that consists of the components Google cAdvisor as data source, InfluxDB as the database and Grafana for creating the graphs.

The post is built upon a blog post by Brian Christner. However, we will take a shortcut via a Docker Compose file created by Dale Kate-Murray and Ross Jimenez, which helps us to spin up the needed Docker containers within minutes (depending on your Internet speed).


Docker Monitoring Alternatives

Free Docker monitoring solutions are discussed in this YouTube video by Brian Christner:

  • Google cAdvisor (standalone): easy to use, no config needed
  • cAdvisor + InfluxDB + Grafana: flexible, adaptable (the one we will get hands-on experience with below)
  • Prometheus: all-in-one complete monitoring solution

He summarizes the capabilities of those solutions as follows:

@Brian Christner: I hope it is okay that I have copied this slide from your YouTube video?

Those are open source alternatives. Brian Christner points out that you might need more complete, enterprise-level solutions than the open source alternatives can offer, e.g. Datadog (offers a free service for up to five monitored hosts) or Sysdig (the latter also seems to be open source, though). See also this Rancher post, which compares seven Docker monitoring alternatives.

Step by Step guide: “Installing” cAdvisor + InfluxDB + Grafana

Here, we will go through a step-by-step guide on how to deploy a flexible Docker monitoring solution consisting of Google cAdvisor as data source, InfluxDB as the database and Grafana for creating the graphs. We will make use of a Docker Compose file Ross Jimenez has created and Brian Christner has included in his git repository.

Step 0: Prerequisites

We assume that the following prerequisites are met:

  • Docker is installed. A nice way to install an Ubuntu Docker host via Vagrant is described here (search for the term “Install a Docker Host”).
  • You have direct Internet access. If you need to cope with an HTTP proxy, see the official Docker instructions, or, if that does not work, you may try this blog post.

Step 1: Install docker-compose via Container:

On a Docker host, we will install docker-compose via a Docker container using the following script:

# detect whether sudo is needed:
sudo echo hello > /dev/null 2>&1 && SUDO=sudo

# download the docker-compose wrapper script, if docker-compose is not yet installed:
$SUDO docker-compose --version || \
 $SUDO docker --version && \
 curl -L https://github.com/docker/compose/releases/download/1.8.1/run.sh | $SUDO tee /usr/local/bin/docker-compose && \
 $SUDO chmod +x /usr/local/bin/docker-compose

You might prefer a native installation of docker compose. Please check out the official documentation in this case.
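
A quick sanity check that the wrapper is usable (the exact version and build string will differ on your system):

$ sudo docker-compose --version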

Step 2: Download Docker Compose File

Now we download Brian Christner’s docker monitoring git repository. The CircleCI tests of Brian’s repository are failing currently (as of 2016-10-25), but the software seems to work anyway.

git clone https://github.com/vegasbrianc/docker-monitoring && \
cd docker-monitoring

Step 3: Start Containers

Now let us start the containers via

$ docker-compose up
Starting dockermonitoringrepaired_influxdbData_1
Starting dockermonitoringrepaired_influxdb_1
Starting dockermonitoringrepaired_grafana_1
Starting dockermonitoringrepaired_cadvisor_1
Attaching to dockermonitoringrepaired_influxdbData_1, dockermonitoringrepaired_influxdb_1, dockermonitoringrepaired_cadvisor_1, dockermonitoringrepaired_grafana_1
dockermonitoringrepaired_influxdbData_1 exited with code 0
influxdb_1 | influxdb configuration:
influxdb_1 | ### Welcome to the InfluxDB configuration file.
influxdb_1 |
influxdb_1 | # Once every 24 hours InfluxDB will report anonymous data to m.influxdb.com
influxdb_1 | # The data includes raft id (random 8 bytes), os, arch, version, and metadata.
influxdb_1 | # We don't track ip addresses of servers reporting. This is only used
influxdb_1 | # to track the number of instances running and the versions, which
...
(truncated; see the full log in the Appendix)
...
influxdb_1 | [admin] 2016/10/25 16:48:44 Listening on HTTP: [::]:8083
influxdb_1 | [continuous_querier] 2016/10/25 16:48:44 Starting continuous query service
influxdb_1 | [httpd] 2016/10/25 16:48:44 Starting HTTP service
influxdb_1 | [httpd] 2016/10/25 16:48:44 Authentication enabled: false
influxdb_1 | [httpd] 2016/10/25 16:48:44 Listening on HTTP: [::]:8086
influxdb_1 | [retention] 2016/10/25 16:48:44 Starting retention policy enforcement service with check interval of 30m0s
influxdb_1 | [monitor] 2016/10/25 16:48:44 Storing statistics in database '_internal' retention policy 'monitor', at interval 10s
influxdb_1 | 2016/10/25 16:48:44 Sending anonymous usage statistics to m.influxdb.com
influxdb_1 | [run] 2016/10/25 16:48:44 Listening for signals

Note: if you see a continuous message “Waiting for confirmation of InfluxDB service startup”, you might hit a problem described in an Appendix below. Search for “Waiting for confirmation of InfluxDB service startup” on this page.

Step 4 (optional): In a different window on the Docker host, we can test the connection as follows:

$ curl --retry 10 --retry-delay 5 -v http://localhost:8083
* Rebuilt URL to: http://localhost:8083/
* Hostname was NOT found in DNS cache
* Trying ::1...
* Connected to localhost (::1) port 8083 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
...
</body>

</html>
* Connection #0 to host localhost left intact

Step 5: Connect to cAdvisor, InfluxDB, Grafana

Step 5.1 (optional): Connect to cAdvisor

Now let us connect to cAdvisor. For that, you need to find out which IP address your Docker host is using. In my case, I am using a Vagrant-based Docker host and I have added an additional line to the Vagrantfile:

config.vm.network "private_network", ip: "192.168.33.11"

The TCP port can be seen in the docker-compose.yml file: it is 8080. This allows me to connect to cAdvisor’s dashboard via http://192.168.33.11:8080/containers/:
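
If you just want to double-check the port mapping without opening the file in an editor, a quick grep in the cloned repository directory will do (the exact YAML layout depends on the repository version):

$ grep -n "8080" docker-compose.yml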

Step 5.2 (optional): connect to InfluxDB

InfluxDB is reachable via http://192.168.33.11:8083/:

Step 5.3 (required): Connect to Grafana

And Grafana can be reached via http://192.168.33.11:3000/: log in as admin with password admin, if you are prompted for it:

Okay, the dashboard is still empty.

Step 6: Add Data Sources to Grafana manually

Connect to Grafana (http://192.168.33.11:3000/ in my case)

Click on Data Sources -> Add new

and add following data:

Name: influxdb
Type: InfluxDB 0.9.x

Note: Be sure to check the default box! Otherwise, you will see random data created by Grafana below!

Http settings
Url: http://192.168.33.11:8086 (please adapt the IP address to your environment)
Access: proxy
Basic Auth: Enabled
User: admin
Password: admin

InfluxDB Details
Database: cadvisor
User: root
Password: root

Click Add -> Test Connection (should be successful) -> Save
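
Before or after saving, you can verify from the Docker host that the cadvisor database actually exists in InfluxDB (adapt the IP address to your environment; SHOW DATABASES is standard InfluxQL):

$ curl 'http://192.168.33.11:8086/query?q=SHOW+DATABASES'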

Step 7: Add New Dashboard to Grafana via json File

Connect to Grafana (http://192.168.33.11:3000/ in my case)

Click on …, then …, navigate to the cloned GitHub repository, click on the button below “Import File” and pick the file docker-monitoring-0.9.json:

As if by an invisible hand, we get a dashboard with information on Filesystem Usage, CPU Usage, Memory Usage and Network Usage of the Containers on the host.

Note: if the graphs look like the following

and the graphs change substantially when clicking …, then you most probably have forgotten to check the “default” box in step 6. In this case, you need to click on the title of the graph -> Edit -> choose influxdb as data source.

Step 8 (optional): CPU Stress Test

Since only the docker monitoring containers are running, the absolute numbers we see are quite low. Let us start a container that is stressing the CPU a little bit:

docker run -it petarmaric/docker.cpu-stress-test

The graphs of cAdvisor are reacting right away:

Let us wait a few minutes and refresh the Grafana graphs and put our focus on the CPU usage:

The data does not seem to be reliable. I have opened issue #10 for this. When looking at the InfluxDB data by specifying the following URL in a browser:

http://192.168.33.11:8086/query?pretty=true&db=cadvisor&q=SELECT%20%22value%22%20FROM%20%22cpu_usage_system%22

(you need to adapt the IP address), we get data that changes by a factor of 10,000 within milliseconds!

{
    "results": [
        {
            "series": [
                {
                    "name": "cpu_usage_system",
                    "columns": [
                        "time",
                        "value"
                    ],
                    "values": [
                        [
                            "2016-10-24T17:12:49.021559212Z",
                            910720000000
                        ],
                        [
                            "2016-10-24T17:12:49.032153994Z",
                            20000000
                        ],
                        [
                            "2016-10-24T17:12:49.033316234Z",
                            5080000000
                        ],

Summary

Following Brian Christner’s youtube video on Docker Monitoring, we have compared three open source Docker monitoring solutions

  1. Google cAdvisor (standalone): easy to use, no config needed
  2. cAdvisor + InfluxDB + Grafana: flexible, adaptable (the one we got hands-on experience with above)
  3. Prometheus: all-in-one complete monitoring solution

By using a pre-defined docker-compose file, solution 2 can be spun up in minutes (unless you are working in an NFS-synced Vagrant folder on Windows, which leads to a continuous ‘Waiting for confirmation of InfluxDB service startup’ message; see Appendix B below; that problem, which I have reported here, had caused quite a headache on my side).

After the data source is configured manually, a json file helps to create a nice Grafana dashboard within minutes. The dashboard shows graphs about File System Usage, CPU Usage, Memory Usage and Network Usage.

At the moment, there is a caveat that the data displayed is not trustworthy. This is being investigated in the framework of issue #10 of Brian Christner’s repository. I will report here when it is resolved.

Next Steps:

Appendix A: full startup log of successful ‘docker-compose up’

$ docker-compose up
Starting dockermonitoringrepaired_influxdbData_1
Starting dockermonitoringrepaired_influxdb_1
Starting dockermonitoringrepaired_grafana_1
Starting dockermonitoringrepaired_cadvisor_1
Attaching to dockermonitoringrepaired_influxdbData_1, dockermonitoringrepaired_influxdb_1, dockermonitoringrepaired_grafana_1, dockermonitoringrepaired_cadvisor_1
dockermonitoringrepaired_influxdbData_1 exited with code 0
influxdb_1 | influxdb configuration:
influxdb_1 | ### Welcome to the InfluxDB configuration file.
influxdb_1 |
influxdb_1 | # Once every 24 hours InfluxDB will report anonymous data to m.influxdb.com
influxdb_1 | # The data includes raft id (random 8 bytes), os, arch, version, and metadata.
influxdb_1 | # We don't track ip addresses of servers reporting. This is only used
influxdb_1 | # to track the number of instances running and the versions, which
influxdb_1 | # is very helpful for us.
influxdb_1 | # Change this option to true to disable reporting.
influxdb_1 | reporting-disabled = false
influxdb_1 |
influxdb_1 | # we'll try to get the hostname automatically, but if it the os returns something
influxdb_1 | # that isn't resolvable by other servers in the cluster, use this option to
influxdb_1 | # manually set the hostname
influxdb_1 | # hostname = "localhost"
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [meta]
influxdb_1 | ###
influxdb_1 | ### Controls the parameters for the Raft consensus group that stores metadata
influxdb_1 | ### about the InfluxDB cluster.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [meta]
influxdb_1 | # Where the metadata/raft database is stored
influxdb_1 | dir = "/data/meta"
influxdb_1 |
influxdb_1 | retention-autocreate = true
influxdb_1 |
influxdb_1 | # If log messages are printed for the meta service
influxdb_1 | logging-enabled = true
influxdb_1 | pprof-enabled = false
influxdb_1 |
influxdb_1 | # The default duration for leases.
influxdb_1 | lease-duration = "1m0s"
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [data]
influxdb_1 | ###
influxdb_1 | ### Controls where the actual shard data for InfluxDB lives and how it is
influxdb_1 | ### flushed from the WAL. "dir" may need to be changed to a suitable place
influxdb_1 | ### for your system, but the WAL settings are an advanced configuration. The
influxdb_1 | ### defaults should work for most systems.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [data]
influxdb_1 | # Controls if this node holds time series data shards in the cluster
influxdb_1 | enabled = true
influxdb_1 |
influxdb_1 | dir = "/data/data"
influxdb_1 |
influxdb_1 | # These are the WAL settings for the storage engine >= 0.9.3
influxdb_1 | wal-dir = "/data/wal"
influxdb_1 | wal-logging-enabled = true
influxdb_1 | data-logging-enabled = true
influxdb_1 |
influxdb_1 | # Whether queries should be logged before execution. Very useful for troubleshooting, but will
influxdb_1 | # log any sensitive data contained within a query.
influxdb_1 | # query-log-enabled = true
influxdb_1 |
influxdb_1 | # Settings for the TSM engine
influxdb_1 |
influxdb_1 | # CacheMaxMemorySize is the maximum size a shard's cache can
influxdb_1 | # reach before it starts rejecting writes.
influxdb_1 | # cache-max-memory-size = 524288000
influxdb_1 |
influxdb_1 | # CacheSnapshotMemorySize is the size at which the engine will
influxdb_1 | # snapshot the cache and write it to a TSM file, freeing up memory
influxdb_1 | # cache-snapshot-memory-size = 26214400
influxdb_1 |
influxdb_1 | # CacheSnapshotWriteColdDuration is the length of time at
influxdb_1 | # which the engine will snapshot the cache and write it to
influxdb_1 | # a new TSM file if the shard hasn't received writes or deletes
influxdb_1 | # cache-snapshot-write-cold-duration = "1h"
influxdb_1 |
influxdb_1 | # MinCompactionFileCount is the minimum number of TSM files
influxdb_1 | # that need to exist before a compaction cycle will run
influxdb_1 | # compact-min-file-count = 3
influxdb_1 |
influxdb_1 | # CompactFullWriteColdDuration is the duration at which the engine
influxdb_1 | # will compact all TSM files in a shard if it hasn't received a
influxdb_1 | # write or delete
influxdb_1 | # compact-full-write-cold-duration = "24h"
influxdb_1 |
influxdb_1 | # MaxPointsPerBlock is the maximum number of points in an encoded
grafana_1 | 2016/10/25 17:30:59 [I] Starting Grafana
grafana_1 | 2016/10/25 17:30:59 [I] Version: 2.6.0, Commit: v2.6.0, Build date: 2015-12-14 14:18:01 +0000 UTC
grafana_1 | 2016/10/25 17:30:59 [I] Configuration Info
grafana_1 | Config files:
grafana_1 | [0]: /usr/share/grafana/conf/defaults.ini
grafana_1 | [1]: /etc/grafana/grafana.ini
grafana_1 | Command lines overrides:
grafana_1 | [0]: default.paths.data=/var/lib/grafana
grafana_1 | [1]: default.paths.logs=/var/log/grafana
grafana_1 | Paths:
grafana_1 | home: /usr/share/grafana
grafana_1 | data: /var/lib/grafana
grafana_1 | logs: /var/log/grafana
grafana_1 |
grafana_1 | 2016/10/25 17:30:59 [I] Database: sqlite3
grafana_1 | 2016/10/25 17:30:59 [I] Migrator: Starting DB migration
grafana_1 | 2016/10/25 17:30:59 [I] Listen: http://0.0.0.0:3000
influxdb_1 | # block in a TSM file. Larger numbers may yield better compression
influxdb_1 | # but could incur a performance penalty when querying
influxdb_1 | # max-points-per-block = 1000
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [cluster]
influxdb_1 | ###
influxdb_1 | ### Controls non-Raft cluster behavior, which generally includes how data is
influxdb_1 | ### shared across shards.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [cluster]
influxdb_1 | shard-writer-timeout = "5s" # The time within which a remote shard must respond to a write request.
influxdb_1 | write-timeout = "10s" # The time within which a write request must complete on the cluster.
influxdb_1 | max-concurrent-queries = 0 # The maximum number of concurrent queries that can run. 0 to disable.
influxdb_1 | query-timeout = "0s" # The time within a query must complete before being killed automatically. 0s to disable.
influxdb_1 | max-select-point = 0 # The maximum number of points to scan in a query. 0 to disable.
influxdb_1 | max-select-series = 0 # The maximum number of series to select in a query. 0 to disable.
influxdb_1 | max-select-buckets = 0 # The maximum number of buckets to select in an aggregate query. 0 to disable.
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [retention]
influxdb_1 | ###
influxdb_1 | ### Controls the enforcement of retention policies for evicting old data.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [retention]
influxdb_1 | enabled = true
influxdb_1 | check-interval = "30m"
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [shard-precreation]
influxdb_1 | ###
influxdb_1 | ### Controls the precreation of shards, so they are available before data arrives.
influxdb_1 | ### Only shards that, after creation, will have both a start- and end-time in the
influxdb_1 | ### future, will ever be created. Shards are never precreated that would be wholly
influxdb_1 | ### or partially in the past.
influxdb_1 |
influxdb_1 | [shard-precreation]
influxdb_1 | enabled = true
influxdb_1 | check-interval = "10m"
influxdb_1 | advance-period = "30m"
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### Controls the system self-monitoring, statistics and diagnostics.
influxdb_1 | ###
influxdb_1 | ### The internal database for monitoring data is created automatically if
influxdb_1 | ### if it does not already exist. The target retention within this database
influxdb_1 | ### is called 'monitor' and is also created with a retention period of 7 days
influxdb_1 | ### and a replication factor of 1, if it does not exist. In all cases the
influxdb_1 | ### this retention policy is configured as the default for the database.
influxdb_1 |
influxdb_1 | [monitor]
influxdb_1 | store-enabled = true # Whether to record statistics internally.
influxdb_1 | store-database = "_internal" # The destination database for recorded statistics
influxdb_1 | store-interval = "10s" # The interval at which to record statistics
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [admin]
influxdb_1 | ###
influxdb_1 | ### Controls the availability of the built-in, web-based admin interface. If HTTPS is
influxdb_1 | ### enabled for the admin interface, HTTPS must also be enabled on the [http] service.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [admin]
influxdb_1 | enabled = true
influxdb_1 | bind-address = ":8083"
influxdb_1 | https-enabled = false
influxdb_1 | https-certificate = "/etc/ssl/influxdb.pem"
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [http]
influxdb_1 | ###
influxdb_1 | ### Controls how the HTTP endpoints are configured. These are the primary
influxdb_1 | ### mechanism for getting data into and out of InfluxDB.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [http]
influxdb_1 | enabled = true
influxdb_1 | bind-address = ":8086"
influxdb_1 | auth-enabled = false
influxdb_1 | log-enabled = true
influxdb_1 | write-tracing = false
influxdb_1 | pprof-enabled = false
influxdb_1 | https-enabled = false
influxdb_1 | https-certificate = "/etc/ssl/influxdb.pem"
influxdb_1 | max-row-limit = 10000
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [[graphite]]
influxdb_1 | ###
influxdb_1 | ### Controls one or many listeners for Graphite data.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [[graphite]]
influxdb_1 | enabled = false
influxdb_1 | database = "graphitedb"
influxdb_1 | bind-address = ":2003"
influxdb_1 | protocol = "tcp"
influxdb_1 | # consistency-level = "one"
influxdb_1 |
influxdb_1 | # These next lines control how batching works. You should have this enabled
influxdb_1 | # otherwise you could get dropped metrics or poor performance. Batching
influxdb_1 | # will buffer points in memory if you have many coming in.
influxdb_1 |
influxdb_1 | # batch-size = 5000 # will flush if this many points get buffered
influxdb_1 | # batch-pending = 10 # number of batches that may be pending in memory
influxdb_1 | # batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
influxdb_1 | # udp-read-buffer = 0 # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
influxdb_1 |
influxdb_1 | ### This string joins multiple matching 'measurement' values providing more control over the final measurement name.
influxdb_1 | # separator = "."
influxdb_1 |
influxdb_1 | ### Default tags that will be added to all metrics. These can be overridden at the template level
influxdb_1 | ### or by tags extracted from metric
influxdb_1 | # tags = ["region=us-east", "zone=1c"]
influxdb_1 |
influxdb_1 | ### Each template line requires a template pattern. It can have an optional
influxdb_1 | ### filter before the template and separated by spaces. It can also have optional extra
influxdb_1 | ### tags following the template. Multiple tags should be separated by commas and no spaces
influxdb_1 | ### similar to the line protocol format. There can be only one default template.
influxdb_1 | templates = [
influxdb_1 | # filter + template
influxdb_1 | #"*.app env.service.resource.measurement",
influxdb_1 | # filter + template + extra tag
influxdb_1 | #"stats.* .host.measurement* region=us-west,agent=sensu",
influxdb_1 | # default template. Ignore the first graphite component "servers"
influxdb_1 | "instance.profile.measurement*"
influxdb_1 | ]
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [collectd]
influxdb_1 | ###
influxdb_1 | ### Controls one or many listeners for collectd data.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [[collectd]]
influxdb_1 | enabled = false
influxdb_1 | # bind-address = ":25826"
influxdb_1 | # database = "collectd"
influxdb_1 | # typesdb = "/usr/share/collectd/types.db"
influxdb_1 | # retention-policy = ""
influxdb_1 |
influxdb_1 | # These next lines control how batching works. You should have this enabled
influxdb_1 | # otherwise you could get dropped metrics or poor performance. Batching
influxdb_1 | # will buffer points in memory if you have many coming in.
influxdb_1 |
influxdb_1 | # batch-size = 1000 # will flush if this many points get buffered
influxdb_1 | # batch-pending = 5 # number of batches that may be pending in memory
influxdb_1 | # batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
influxdb_1 | # read-buffer = 0 # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [opentsdb]
influxdb_1 | ###
influxdb_1 | ### Controls one or many listeners for OpenTSDB data.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [[opentsdb]]
influxdb_1 | enabled = false
influxdb_1 | # bind-address = ":4242"
influxdb_1 | # database = "opentsdb"
influxdb_1 | # retention-policy = ""
influxdb_1 | # consistency-level = "one"
influxdb_1 | # tls-enabled = false
influxdb_1 | # certificate= ""
influxdb_1 | # log-point-errors = true # Log an error for every malformed point.
influxdb_1 |
influxdb_1 | # These next lines control how batching works. You should have this enabled
influxdb_1 | # otherwise you could get dropped metrics or poor performance. Only points
influxdb_1 | # metrics received over the telnet protocol undergo batching.
influxdb_1 |
influxdb_1 | # batch-size = 1000 # will flush if this many points get buffered
cadvisor_1 | I1025 17:30:59.170040 1 storagedriver.go:42] Using backend storage type "influxdb"
cadvisor_1 | I1025 17:30:59.170881 1 storagedriver.go:44] Caching stats in memory for 2m0s
cadvisor_1 | I1025 17:30:59.171032 1 manager.go:131] cAdvisor running in container: "/docker/9839f9c5c9d674016006e4d4144f984ea91320686356235951f21f0b51306c47"
cadvisor_1 | I1025 17:30:59.194143 1 fs.go:107] Filesystem partitions: map[/dev/dm-0:{mountpoint:/rootfs major:252 minor:0 fsType: blockSize:0} /dev/sda1:{mountpoint:/rootfs/boot major:8 minor:1 fsType: blockSize:0}]
influxdb_1 | # batch-pending = 5 # number of batches that may be pending in memory
influxdb_1 | # batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [[udp]]
influxdb_1 | ###
influxdb_1 | ### Controls the listeners for InfluxDB line protocol data via UDP.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [[udp]]
influxdb_1 | enabled = false
influxdb_1 | bind-address = ":4444"
influxdb_1 | database = "udpdb"
influxdb_1 | # retention-policy = ""
influxdb_1 |
influxdb_1 | # These next lines control how batching works. You should have this enabled
influxdb_1 | # otherwise you could get dropped metrics or poor performance. Batching
influxdb_1 | # will buffer points in memory if you have many coming in.
influxdb_1 |
influxdb_1 | # batch-size = 1000 # will flush if this many points get buffered
influxdb_1 | # batch-pending = 5 # number of batches that may be pending in memory
influxdb_1 | # batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
influxdb_1 | # read-buffer = 0 # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
influxdb_1 |
influxdb_1 | # set the expected UDP payload size; lower values tend to yield better performance, default is max UDP size 65536
influxdb_1 | # udp-payload-size = 65536
influxdb_1 |
influxdb_1 | ###
influxdb_1 | ### [continuous_queries]
influxdb_1 | ###
influxdb_1 | ### Controls how continuous queries are run within InfluxDB.
influxdb_1 | ###
influxdb_1 |
influxdb_1 | [continuous_queries]
influxdb_1 | log-enabled = true
influxdb_1 | enabled = true
influxdb_1 | # run-interval = "1s" # interval for how often continuous queries will be checked if they need to run
influxdb_1 | => Starting InfluxDB ...
influxdb_1 | => About to create the following database: cadvisor
influxdb_1 | => Database had been created before, skipping ...
influxdb_1 | exec influxd -config=${CONFIG_FILE}
influxdb_1 |
influxdb_1 | 8888888 .d888 888 8888888b. 888888b.
influxdb_1 | 888 d88P" 888 888 "Y88b 888 "88b
influxdb_1 | 888 888 888 888 888 888 .88P
influxdb_1 | 888 88888b. 888888 888 888 888 888 888 888 888 8888888K.
influxdb_1 | 888 888 "88b 888 888 888 888 Y8bd8P' 888 888 888 "Y88b
influxdb_1 | 888 888 888 888 888 888 888 X88K 888 888 888 888
influxdb_1 | 888 888 888 888 888 Y88b 888 .d8""8b. 888 .d88P 888 d88P
influxdb_1 | 8888888 888 888 888 888 "Y88888 888 888 8888888P" 8888888P"
influxdb_1 |
cadvisor_1 | I1025 17:30:59.228909 1 machine.go:50] Couldn't collect info from any of the files in "/rootfs/etc/machine-id,/var/lib/dbus/machine-id"
cadvisor_1 | I1025 17:30:59.229080 1 manager.go:166] Machine: {NumCores:2 CpuFrequency:2592000 MemoryCapacity:1569599488 MachineID: SystemUUID:B63CB367-870F-4E48-917F-7E524C2C67A0 BootID:e225b37a-b8e6-466b-9f67-84b74df8e90c Filesystems:[{Device:/dev/dm-0 Capacity:41092214784} {Device:/dev/sda1 Capacity:246755328}] DiskMap:map[252:0:{Name:dm-0 Major:252 Minor:0 Size:41884319744 Scheduler:none} 252:1:{Name:dm-1 Major:252 Minor:1 Size:805306368 Scheduler:none} 8:0:{Name:sda Major:8 Minor:0 Size:42949672960 Scheduler:deadline}] NetworkDevices:[{Name:br-067c518abd1f MacAddress:02:42:78:41:c0:71 Speed:0 Mtu:1500} {Name:br-1c136984ac6d MacAddress:02:42:0c:dc:89:ac Speed:0 Mtu:1500} {Name:br-9b7560132352 MacAddress:02:42:5c:df:9a:43 Speed:0 Mtu:1500} {Name:eth0 MacAddress:08:00:27:c7:ba:b5 Speed:1000 Mtu:1500} {Name:eth1 MacAddress:08:00:27:51:9c:7e Speed:1000 Mtu:1500}] Topology:[{Id:0 Memory:1569599488 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:6291456 Type:Unified Level:3}]} {Id:1 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:6291456 Type:Unified Level:3}]}] Caches:[]}] CloudProvider:Unknown InstanceType:Unknown}
cadvisor_1 | I1025 17:30:59.229884 1 manager.go:172] Version: {KernelVersion:4.2.0-42-generic ContainerOsVersion:Alpine Linux v3.2 DockerVersion:1.12.1 CadvisorVersion:0.20.5 CadvisorRevision:9aa348f}
influxdb_1 | [run] 2016/10/25 17:30:58 InfluxDB starting, version 0.13.0, branch 0.13, commit e57fb88a051ee40fd9277094345fbd47bb4783ce
influxdb_1 | [run] 2016/10/25 17:30:58 Go version go1.6.2, GOMAXPROCS set to 2
influxdb_1 | [run] 2016/10/25 17:30:58 Using configuration at: /config/config.toml
influxdb_1 | [store] 2016/10/25 17:30:58 Using data dir: /data/data
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL starting with 10485760 segment size
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL writing to /data/wal/_internal/monitor/1
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL starting with 10485760 segment size
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL writing to /data/wal/cadvisor/default/2
influxdb_1 | [filestore] 2016/10/25 17:30:58 /data/data/_internal/monitor/1/000000001-000000001.tsm (#0) opened in 1.243404ms
influxdb_1 | [cacheloader] 2016/10/25 17:30:58 reading file /data/wal/_internal/monitor/1/_00001.wal, size 1777379
influxdb_1 | [filestore] 2016/10/25 17:30:58 /data/data/cadvisor/default/2/000000001-000000001.tsm (#0) opened in 1.725916ms
influxdb_1 | [cacheloader] 2016/10/25 17:30:58 reading file /data/wal/cadvisor/default/2/_00001.wal, size 4130244
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL starting with 10485760 segment size
influxdb_1 | [tsm1wal] 2016/10/25 17:30:58 tsm1 WAL writing to /data/wal/_internal/monitor/3
influxdb_1 | [cacheloader] 2016/10/25 17:30:58 reading file /data/wal/_internal/monitor/3/_00001.wal, size 1097258
cadvisor_1 | E1025 17:30:59.248299 1 manager.go:208] Docker container factory registration failed: docker found, but not using native exec driver.
cadvisor_1 | I1025 17:30:59.262682 1 factory.go:94] Registering Raw factory
cadvisor_1 | I1025 17:30:59.327660 1 manager.go:1000] Started watching for new ooms in manager
cadvisor_1 | W1025 17:30:59.327883 1 manager.go:239] Could not configure a source for OOM detection, disabling OOM events: exec: "journalctl": executable file not found in $PATH
cadvisor_1 | I1025 17:30:59.328250 1 manager.go:252] Starting recovery of all containers
cadvisor_1 | I1025 17:30:59.371456 1 manager.go:257] Recovery completed
cadvisor_1 | I1025 17:30:59.395792 1 cadvisor.go:106] Starting cAdvisor version: 0.20.5-9aa348f on port 8080
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/cadvisor/default/2/_00002.wal, size 2232957
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/_internal/monitor/3/_00002.wal, size 197651
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/_internal/monitor/3/_00003.wal, size 0
influxdb_1 | [shard] 2016/10/25 17:30:59 /data/data/_internal/monitor/3 database index loaded in 1.387775ms
influxdb_1 | [store] 2016/10/25 17:30:59 /data/data/_internal/monitor/3 opened in 865.976354ms
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/_internal/monitor/1/_00004.wal, size 0
influxdb_1 | [shard] 2016/10/25 17:30:59 /data/data/_internal/monitor/1 database index loaded in 3.29894ms
influxdb_1 | [store] 2016/10/25 17:30:59 /data/data/_internal/monitor/1 opened in 896.765569ms
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/cadvisor/default/2/_00003.wal, size 444696
influxdb_1 | [cacheloader] 2016/10/25 17:30:59 reading file /data/wal/cadvisor/default/2/_00004.wal, size 0
influxdb_1 | [shard] 2016/10/25 17:30:59 /data/data/cadvisor/default/2 database index loaded in 2.465579ms
influxdb_1 | [store] 2016/10/25 17:30:59 /data/data/cadvisor/default/2 opened in 981.523781ms
influxdb_1 | [subscriber] 2016/10/25 17:30:59 opened service
influxdb_1 | [monitor] 2016/10/25 17:30:59 Starting monitor system
influxdb_1 | [monitor] 2016/10/25 17:30:59 'build' registered for diagnostics monitoring
influxdb_1 | [monitor] 2016/10/25 17:30:59 'runtime' registered for diagnostics monitoring
influxdb_1 | [monitor] 2016/10/25 17:30:59 'network' registered for diagnostics monitoring
influxdb_1 | [monitor] 2016/10/25 17:30:59 'system' registered for diagnostics monitoring
influxdb_1 | [cluster] 2016/10/25 17:30:59 Starting cluster service
influxdb_1 | [shard-precreation] 2016/10/25 17:30:59 Starting precreation service with check interval of 10m0s, advance period of 30m0s
influxdb_1 | [snapshot] 2016/10/25 17:30:59 Starting snapshot service
influxdb_1 | [copier] 2016/10/25 17:30:59 Starting copier service
influxdb_1 | [admin] 2016/10/25 17:30:59 Starting admin service
influxdb_1 | [admin] 2016/10/25 17:30:59 Listening on HTTP: [::]:8083
influxdb_1 | [continuous_querier] 2016/10/25 17:30:59 Starting continuous query service
influxdb_1 | [httpd] 2016/10/25 17:30:59 Starting HTTP service
influxdb_1 | [httpd] 2016/10/25 17:30:59 Authentication enabled: false
influxdb_1 | [httpd] 2016/10/25 17:30:59 Listening on HTTP: [::]:8086
influxdb_1 | [retention] 2016/10/25 17:30:59 Starting retention policy enforcement service with check interval of 30m0s
influxdb_1 | [run] 2016/10/25 17:30:59 Listening for signals
influxdb_1 | [monitor] 2016/10/25 17:30:59 Storing statistics in database '_internal' retention policy 'monitor', at interval 10s
influxdb_1 | 2016/10/25 17:30:59 Sending anonymous usage statistics to m.influxdb.com


Appendix B: ‘Waiting for confirmation of InfluxDB service startup’

After issuing the command

docker-compose up

I had hit a problem described here, that was caused by using a Vagrant synced folder as working directory.

vagrant@openshift-installer /vagrant/Monitoring/docker-monitoring_master $ docker-compose up
Starting dockermonitoringmaster_influxdbData_1
Starting dockermonitoringmaster_influxdb_1
Starting dockermonitoringmaster_cadvisor_1
Starting dockermonitoringmaster_grafana_1
Attaching to dockermonitoringmaster_influxdbData_1, dockermonitoringmaster_influxdb_1, dockermonitoringmaster_grafana_1, dockermonitoringmaster_cadvisor_1
dockermonitoringmaster_influxdbData_1 exited with code 0
influxdb_1      | => Starting InfluxDB in background ...
influxdb_1      | => Waiting for confirmation of InfluxDB service startup ...
influxdb_1      |
influxdb_1      |  8888888           .d888 888                   8888888b.  888888b.
influxdb_1      |    888            d88P"  888                   888  "Y88b 888  "88b
influxdb_1      |    888            888    888                   888    888 888  .88P
influxdb_1      |    888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
influxdb_1      |    888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
influxdb_1      |    888   888  888 888    888 888  888   X88K   888    888 888    888
influxdb_1      |    888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
influxdb_1      |  8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"
influxdb_1      |
influxdb_1      | 2016/10/28 12:34:49 InfluxDB starting, version 0.9.6.1, branch 0.9.6, commit 6d3a8603cfdaf1a141779ed88b093dcc5c528e5e, built 2015-12-10T23:40:23+0000
influxdb_1      | 2016/10/28 12:34:49 Go version go1.4.2, GOMAXPROCS set to 2
influxdb_1      | 2016/10/28 12:34:49 Using configuration at: /config/config.toml
influxdb_1      | [metastore] 2016/10/28 12:34:49 Using data dir: /data/meta
influxdb_1      | [retention] 2016/10/28 12:34:49 retention policy enforcement terminating
influxdb_1      | [monitor] 2016/10/28 12:34:49 shutting down monitor system
influxdb_1      | [handoff] 2016/10/28 12:34:49 shutting down hh service
influxdb_1      | [subscriber] 2016/10/28 12:34:49 closed service
influxdb_1      | run: open server: open meta store: raft: new bolt store: invalid argument
grafana_1       | 2016/10/28 12:34:50 [I] Starting Grafana
grafana_1       | 2016/10/28 12:34:50 [I] Version: 2.6.0, Commit: v2.6.0, Build date: 2015-12-14 14:18:01 +0000 UTC
grafana_1       | 2016/10/28 12:34:50 [I] Configuration Info
grafana_1       | Config files:
grafana_1       |   [0]: /usr/share/grafana/conf/defaults.ini
grafana_1       |   [1]: /etc/grafana/grafana.ini
grafana_1       | Command lines overrides:
grafana_1       |   [0]: default.paths.data=/var/lib/grafana
grafana_1       |   [1]: default.paths.logs=/var/log/grafana
grafana_1       | Paths:
grafana_1       |   home: /usr/share/grafana
grafana_1       |   data: /var/lib/grafana
grafana_1       |   logs: /var/log/grafana
grafana_1       |
grafana_1       | 2016/10/28 12:34:50 [I] Database: sqlite3
grafana_1       | 2016/10/28 12:34:50 [I] Migrator: Starting DB migration
grafana_1       | 2016/10/28 12:34:50 [I] Listen: http://0.0.0.0:3000
cadvisor_1      | I1028 12:34:50.214917       1 storagedriver.go:42] Using backend storage type "influxdb"
cadvisor_1      | I1028 12:34:50.215243       1 storagedriver.go:44] Caching stats in memory for 2m0s
cadvisor_1      | I1028 12:34:50.215376       1 manager.go:131] cAdvisor running in container: "/docker/2da85f53aaf23024eb2016dc330b05634972252eea2f230831e3676ad3b6fa73"
cadvisor_1      | I1028 12:34:50.238721       1 fs.go:107] Filesystem partitions: map[/dev/dm-0:{mountpoint:/rootfs major:252 minor:0 fsType: blockSize:0} /dev/sda1:{mountpoint:/rootfs/boot major:8 minor:1 fsType: blockSize:0}]
cadvisor_1      | I1028 12:34:50.249690       1 machine.go:50] Couldn't collect info from any of the files in "/rootfs/etc/machine-id,/var/lib/dbus/machine-id"
cadvisor_1      | I1028 12:34:50.249806       1 manager.go:166] Machine: {NumCores:2 CpuFrequency:2592000 MemoryCapacity:1569599488 MachineID: SystemUUID:B63CB367-870F-4E48-917F-7E524C2C67A0 BootID:e225b37a-b8e6-466b-9f67-84b74df8e90c Filesystems:[{Device:/dev/dm-0 Capacity:41092214784} {Device:/dev/sda1 Capacity:246755328}] DiskMap:map[252:0:{Name:dm-0 Major:252 Minor:0 Size:41884319744 Scheduler:none} 252:1:{Name:dm-1 Major:252 Minor:1 Size:805306368 Scheduler:none} 8:0:{Name:sda Major:8 Minor:0 Size:42949672960 Scheduler:deadline}] NetworkDevices:[{Name:br-067c518abd1f MacAddress:02:42:78:41:c0:71 Speed:0 Mtu:1500} {Name:br-1c136984ac6d MacAddress:02:42:0c:dc:89:ac Speed:0 Mtu:1500} {Name:br-3b100a8c826a MacAddress:02:42:11:2c:a0:4c Speed:0 Mtu:1500} {Name:br-5573a4076799 MacAddress:02:42:97:14:9a:fc Speed:0 Mtu:1500} {Name:br-9b7560132352 MacAddress:02:42:5c:df:9a:43 Speed:0 Mtu:1500} {Name:eth0 MacAddress:08:00:27:c7:ba:b5 Speed:1000 Mtu:1500} {Name:eth1 MacAddress:08:00:27:51:9c:7e Speed:1000 Mtu:1500}] Topology:[{Id:0 Memory:1569599488 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:6291456 Type:Unified Level:3}]} {Id:1 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:6291456 Type:Unified Level:3}]}] Caches:[]}] CloudProvider:Unknown InstanceType:Unknown}
cadvisor_1      | I1028 12:34:50.251115       1 manager.go:172] Version: {KernelVersion:4.2.0-42-generic ContainerOsVersion:Alpine Linux v3.2 DockerVersion:1.12.1 CadvisorVersion:0.20.5 CadvisorRevision:9aa348f}
cadvisor_1      | E1028 12:34:50.273526       1 manager.go:208] Docker container factory registration failed: docker found, but not using native exec driver.
cadvisor_1      | I1028 12:34:50.279684       1 factory.go:94] Registering Raw factory
cadvisor_1      | I1028 12:34:50.316816       1 manager.go:1000] Started watching for new ooms in manager
cadvisor_1      | W1028 12:34:50.316960       1 manager.go:239] Could not configure a source for OOM detection, disabling OOM events: exec: "journalctl": executable file not found in $PATH
cadvisor_1      | I1028 12:34:50.317927       1 manager.go:252] Starting recovery of all containers
cadvisor_1      | I1028 12:34:50.336674       1 manager.go:257] Recovery completed
cadvisor_1      | I1028 12:34:50.352618       1 cadvisor.go:106] Starting cAdvisor version: 0.20.5-9aa348f on port 8080
influxdb_1      | => Waiting for confirmation of InfluxDB service startup ...
influxdb_1      | => Waiting for confirmation of InfluxDB service startup ...
influxdb_1      | => Waiting for confirmation of InfluxDB service startup ...
influxdb_1      | => Waiting for confirmation of InfluxDB service startup ...

To confirm the issue, you can try to connect to port 8083 from a second terminal window on the Docker host:

(docker host) $ curl --retry 10 --retry-delay 5 -v http://localhost:8083
* Rebuilt URL to: http://localhost:8083/
* Hostname was NOT found in DNS cache
*   Trying ::1...
* Connected to localhost (::1) port 8083 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8083
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

I.e. the port answers with a TCP RST.

Reason:

The root cause is an issue with Vagrant synced folders of type “vboxsf”.

Workaround 1a: do not use synced folders

The problem disappears if you clone the repository into a non-synced folder.
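
For illustration, here is a minimal sketch of this workaround; the repository URL is a hypothetical placeholder (it is not named in this post), and I assume git is available on the Docker host. Any folder outside /vagrant, e.g. the home directory, is not synced via vboxsf:

(docker host)$ cd ~                                   # home directory: not a vboxsf synced folder
(docker host)$ git clone <repository-URL> docker-monitoring_master
(docker host)$ cd docker-monitoring_master
(docker host)$ docker-compose up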

Workaround 1b: use synced folders of different type

The problem also disappears if you use a synced folder of a different type. I have tested this with a Vagrant synced folder of type “smb”:

  • add the line
     config.vm.synced_folder ".", "/vagrant", type: "smb"

    to the Vagrantfile inside the configure section

  • start CMD as Administrator, e.g. run
     runas.exe /savecred /user:Administrator "cmd"

    in a non-privileged CMD

  • run
    vagrant up

    in the privileged CMD session.

After that, you can SSH into the Docker host, clone the repository, and run docker-compose up as in step 1 without hitting the InfluxDB problem.

Workaround 2: upgrade InfluxDB to 0.13

The problem disappears even with vboxsf synced folders if we upgrade InfluxDB to 0.13:

I have found this InfluxDB 0.9 issue with the same symptoms. This is why I have tried the upgrade while still working within the Vagrant synced folder /vagrant.

Step 1: Upgrade InfluxDB

In the docker-compose.yml file replace

influxdb:
 image: tutum/influxdb:0.9

by

influxdb:
 image: tutum/influxdb:0.13
Step 2: remove ./data folder (important! Otherwise, the problem will persist!)
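
A minimal command-line sketch for this step, assuming it is run from the project directory that contains the ./data folder:

(docker host)$ docker-compose stop        # stop the containers that still use the old data
(docker host)$ sudo rm -rf ./data         # wipe the InfluxDB 0.9 data so that version 0.13 starts with a fresh store
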
Step 3: Try again:

$ docker-compose up

Starting dockermonitoringrepaired_influxdbData_1
Starting dockermonitoringrepaired_influxdb_1
Starting dockermonitoringrepaired_grafana_1
Starting dockermonitoringrepaired_cadvisor_1
...
influxdb_1 | [monitor] 2016/10/25 16:48:44 Storing statistics in database '_internal' retention policy 'monitor', at interval 10s
influxdb_1 | 2016/10/25 16:48:44 Sending anonymous usage statistics to m.influxdb.com
influxdb_1 | [run] 2016/10/25 16:48:44 Listening for signals
Step 4: CURL-Test

Now, the curl test is successful:

$ curl --retry 10 --retry-delay 5 -v http://localhost:8083
* Rebuilt URL to: http://localhost:8083/
* Hostname was NOT found in DNS cache
* Trying ::1...
* Connected to localhost (::1) port 8083 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
...
</body>

</html>
* Connection #0 to host localhost left intact

This test is successful now, too.

Appendix: Error: load error nokogiri/nokogiri LoadError Vagrant

This is an error I have encountered after installing Vagrant 1.8.1 on Windows 10 and …

 


Resolving Networking Problems (Performance Problems) of a WD My Cloud NAS System

Ever since I bought a Western Digital My Cloud system with 4 TB of backup space, I have had problems with it: after some hours, the system became unreachable over the network. Several firmware upgrades later, the problem had worsened, and the system was reachable for only 10 to 20 minutes after each power cycle.

In this little blog post, I summarize the measures I have taken to bring it back to life.

Symptoms

A deeper analysis with Wireshark showed that the system was still up and running, but it sent a TCP RST back when I tried to connect to the web interface. Backing up data was not possible anymore, either. Removing and re-attaching the power cable helped, but for no longer than 10 to 20 minutes; then I saw the same symptoms again.

Searching for a working solution via Google was in vain, so I thought I would have to throw away the system and buy something better. However, I gave it a last chance and did the following:

Step 1: Remove and re-attach the power cable

Step 2: Connect to the web interface (this is possible for ~10 to 20 minutes after powering up).

Step 3: Enable SSH root access

Step 4: Connect via SSH and change the default password to an individual password
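
As a sketch of steps 3 and 4, assuming the NAS is reachable under the hostname wdmycloud (replace it with your device’s hostname or IP address):

$ ssh root@wdmycloud          # hostname is an assumption; SSH access must have been enabled in the web interface first
WDMyCloud:~# passwd           # interactively change the default root password to an individual one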

Root Cause:

After 10 to 20 minutes, I saw the same symptoms again: the web interface was sending TCP Reset messages. However, the SSH connection was still up and running, which helped me figure out the problem.

top (hitting M sorts the entries by memory) showed that the system was overwhelmed with respect to memory: before I “repaired” my system, the total memory used was around 100% of the physical memory (only 256 MB!) plus another 50-70% of the swap (500 MB). The heavy swapping caused the system to become unresponsive, and the hard disks were audibly working all the time.
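
For reference, a sketch of the diagnosis commands meant here; top is interactive, and the ps one-liner is an alternative that I assume works on the Debian-based WD My Cloud firmware:

WDMyCloud:~# top                               # then press 'M' (Shift+m) to sort the processes by memory usage
WDMyCloud:~# ps aux --sort=-%mem | head -n 15  # alternative: list the 15 most memory-hungry processes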

CPU load was okay, so this was not the problem.

Output of top, sorted by Memory (after measures taken)

I could see that 10 apache2 processes were running, each occupying ~8% of the memory, summing up to ~80% of the available memory. As you can see above, switching off 7 of those 10 apache2 processes was the key to success; a solution I have not found anywhere in the forums discussing the exact same unreachability symptoms with WD My Cloud. However, I have found this StackOverflow question asking why there are so many Apache processes running. The answer: “Check your httpd.conf for MinSpareServers, MaxSpareServers and ServerLimit”.

Still, it was not easy to find the location where those variables are defined on WD My Cloud. They are not defined in /etc/apache2/apache2.conf, as seems to be the case for most other Apache2 servers. A search revealed the location:

WDMyCloud:~# grep -iR MaxSpareServers /etc 2>/dev/null
/etc/apache2/mods-available/mpm_prefork.conf:# MaxSpareServers: maximum number of server processes which are kept spare
/etc/apache2/mods-available/mpm_prefork.conf:    MaxSpareServers       10
/etc/apache2/mods-enabled/mpm_prefork.conf:# MaxSpareServers: maximum number of server processes which are kept spare
/etc/apache2/mods-enabled/mpm_prefork.conf:    MaxSpareServers       10

And here we find the confirmation: on WD My Cloud, I need to edit the values in /etc/apache2/mods-available/mpm_prefork.conf.

This is where I have changed the value from 10 to 3, as described in Step 5 below.

Resolution of the Problem

Step 5: Reduce the number of Apache2 Processes

As root, edit the file /etc/apache2/mods-available/mpm_prefork.conf.

Before the change, I have found following relevant content in the file:

    StartServers          2
    MinSpareServers       2
    MaxSpareServers       10
    MaxRequestWorkers     10
    MaxConnectionsPerChild  10000

I have changed this to:

    StartServers          2
    MinSpareServers       2
    MaxSpareServers       3
    MaxRequestWorkers     3
    ServerLimit           3
    MaxConnectionsPerChild  10000

I have read somewhere that ServerLimit should have the same value as MaxSpareServers, so I added it to the configuration (I have not tested without this entry, so it may have no effect).

Step 6: restart apache

Restart Apache Server with

service apache2 restart

Check the memory consumption e.g. with top. It should have improved by now.
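
A quick way to verify the effect (a sketch; the brackets in the grep pattern just keep grep from counting itself):

WDMyCloud:~# ps aux | grep -c '[a]pache2'     # count the running apache2 processes; should be ~3 now instead of 10
WDMyCloud:~# free -m                          # physical memory and swap usage should have dropped noticeably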

Step 7 (optional): Switch off additional services

You can switch off some additional services that you do not need on your WD My Cloud. In my case, I have chosen a quick and dirty way of doing this by renaming the following files:

mv /etc/init.d/twonky /etc/init.d/twonky.orig
mv /etc/init.d/wdmcserverd /etc/init.d/wdmcserverd.orig
mv /etc/init.d/wdnotifierd /etc/init.d/wdnotifierd.orig
mv /etc/init.d/wdphotodbmergerd /etc/init.d/wdphotodbmergerd.orig

However, I think this is not really needed, and furthermore it leads to error messages on the Windows backup client saying that a backup destination was not found. Anyway, I have kept it this way, because the backup works fine, even though this error message pops up from time to time…
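
If you ever want those services back, renaming the scripts to their original names (and rebooting) reverses this quick and dirty change:

mv /etc/init.d/twonky.orig /etc/init.d/twonky
mv /etc/init.d/wdmcserverd.orig /etc/init.d/wdmcserverd
mv /etc/init.d/wdnotifierd.orig /etc/init.d/wdnotifierd
mv /etc/init.d/wdphotodbmergerd.orig /etc/init.d/wdphotodbmergerd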

Step 8 (needed only if step 7 was performed before):

reboot

DONE!

Result

Since then, the system has been working silently (no more audible disk thrashing) and reliably. It has been up and running for more than 11 days now; continuous backup as well as the web interface are working fine, and the web portal’s responsiveness is much higher than ever before.

I hope this will help someone out there as well.

P.S.: I have included this information also in this forum post. The other forum posts discussing similar issues (e.g. here, here and here) were closed, so I could not add the information there.


Docker Java Performance Tests

In this blog we will show the surprising result that Docker on Ubuntu seems to outperform Docker on CoreOS by ~30%, when tested with a java templating web service.

But first let us discuss the reason why I started these tests in the first place: in my blog post Docker Web Performance Tests we had found the surprising result that a Rails web application on Docker on a CoreOS VM has about 50% better performance than Rails directly on Windows hardware. Will we find the same result for java?

java on Windows Hardware vs. Ubuntu VirtualBox VM (beware: apples vs. bananas!)

The answer is no: the chosen java application, an Apache Camel templating web service, has a factor of ~4 better performance on Windows hardware than on an Ubuntu VirtualBox VM (beware: apples vs. bananas!).

Now comparing apples with apples (all on VirtualBox VMs):

java on Ubuntu vs. Docker/Ubuntu vs. Docker/CoreOS vs. Docker/boot2docker (apples vs. apples)

The chosen Vagrant deploy-able Ubuntu image shows almost the same java performance on Docker as on native Ubuntu (only 5% performance degradation). With that good result, the Ubuntu Docker alternative has outperformed the other Docker alternatives by more than 30% and has impressed with java startup times that were more than 3 times as fast as on CoreOS.

Note: At the time of writing the blog post, this dockerization vs. virtualization vs. native comparison paper was unknown to me: it follows a more comprehensive scientific approach by testing many different performance aspects separately. I like it a lot: check it out!

This is only a small proof of concept (POC) of a hobby developer, and the POC can only give hints about which road to take. Even though the more scientific paper is now available, I am still fine with having invested the time for testing and publishing, since my test results are focused on the applications I develop (Rails and java Apache Camel) and the infrastructure alternatives I have on my laptop: native Windows vs. Ubuntu VirtualBox VM vs. CoreOS VirtualBox VM vs. boot2docker VirtualBox VM. The scientific paper was only testing Ubuntu 13.10, not the CoreOS I was interested in.

Document Versions

v1: original post
v2: updated CoreOS Vagrant image from v713.3.0 to v766.4.0 with no performance improvement
v3: giving CoreOS a last chance: updated to alpha v845.0.0 with no performance improvement
v4: changed introduction and re-organized Tldr; and added a link to a more scientific paper.

Tldr;

When comparing the java performance on a native Ubuntu VM with several Docker host VM alternatives (Ubuntu, CoreOS and boot2docker), we can see that the chosen Vagrant deploy-able Ubuntu image has almost the same performance on Docker as on native Ubuntu. With that, the Ubuntu Docker alternative has outperformed the other Docker alternatives by more than 30% and has impressed with java startup times that were more than 3 times as fast as those of the second best candidate, CoreOS.

As expected, java performance has turned out to be substantially (~4 times) higher on Windows hardware than on a VirtualBox Ubuntu VM with the same number of cores. Note that this last comparison is like comparing apples with bananas: Windows on hardware vs. Ubuntu Linux on a VirtualBox virtual machine with less DRAM.

Test Scenarios

In the current blog, we perform performance measurements of a java Apache Camel application for following scenarios:

  1. application run natively on Windows 7 SP1 on the host system (8 GB DRAM, with ~2.2 GB free before start of the application; 2 core CPU i5-2520M@2.5 GHz)
  2. application run within a VirtualBox Ubuntu 14.04 LTS (64 bit) VM with 1.5 GB DRAM and 1 vCPU on the same Windows host system
  3. application run within a docker container on the above Ubuntu VM (vagrant box version 1.8.1)
  4. application run within a docker container on a CoreOS (Vagrant CoreOS stable 717.3.0 and v766.4.0) Virtualbox VM with same resources as the Ubuntu VM (1.5 GB DRAM and 1 vCPU) run on the Windows system
  5. application run within a docker container on a boot2docker Virtualbox VM with same resources as in the old blog, i.e. (2 GB DRAM and 2 vCPU), run on the same Windows host.
    Note that boot2docker is deprecated by now, but since we have performed the Rails Docker Web Performance Tests on boot2docker, we keep it as a reference.

Considering the results we have seen with the Rails web application, we expect that 3. and 4. have higher performance than 1.

With 2. and 3. we can easily extract the difference that comes from the additional docker layer. For option 4, we expect the same or a higher performance than in 3, since CoreOS is optimized for running Docker containers.

As in the other performance test blog, we use Apache Bench as the measurement tool.

Test Results

Java Apache Camel Startup Time

  1. application run natively on Windows on the host system
    1. NEED TO TEST AGAIN WITH STDOUT REDIRECTION INTO FILE!!!
      2 cores: Apache Camel 2.12.2 (CamelContext: camel-1) started in 3.5 +-0.6 seconds
  2. application run within a VirtualBox Ubuntu VM with 1.5 GB DRAM and 1 vCPU
    1. 1 vCPU: Apache Camel 2.12.2 (CamelContext: camel-1) started in 5.0 seconds
    2. 2 vCPU: Apache Camel 2.12.2 (CamelContext: camel-1) started in 3.7 seconds
  3. application run within a docker container on the above Ubuntu VM
    1. 1 vCPU: Apache Camel 2.12.2 (CamelContext: camel-1) started in 5.2 seconds
    2. 2 vCPU: Apache Camel 2.12.2 (CamelContext: camel-1) started in 4.0 seconds
  4. application run within a docker container on a CoreOS VM with same resources as the Ubuntu VM
    1. 1 vCPU, v713.3.0: Apache Camel 2.12.2 (CamelContext: camel-1) started in 15.2 seconds
    2. 2 vCPU
      1. v713.3.0: Apache Camel 2.12.2 (CamelContext: camel-1) started in 13.4 seconds
      2. v766.4.0: Apache Camel 2.12.2 (CamelContext: camel-1) started in 14.5 +- 1.5 seconds
      3. v845.0.0: Apache Camel 2.12.2 (CamelContext: camel-1) started in [15.0, 13.2, 19.2,13.2] seconds (4 tests)
  5. application run within a docker container on a boot2docker VM with same resources as in the old blog, i.e. (2 GB DRAM and 2 vCPU)
    1. 2 vCPU: Apache Camel 2.12.2 (CamelContext: camel-1) started in 24.1 +- 0.5 seconds (4 tests)

Windows has the quickest Apache Camel startup (95 routes) with ~3.5 +- 0.6 sec.

Ubuntu, with or without Docker: with 2 vCPUs, the startup is only slightly slower. Only with 1 vCPU does the startup take ~40% more time on Ubuntu (both native and in Docker).

CoreOS: Apache Camel has a 3.8 and 3.4 times longer (!) startup time on CoreOS Docker than on Windows and on Ubuntu, respectively. CoreOS is more lightweight than the Ubuntu image, and is supposed to be optimized for Docker. Therefore, this is a surprising result.

boot2docker has the worst Apache Camel startup time: it takes almost 7 times longer (!!) than the same process on Windows. boot2docker is deprecated by now; this performance result is another reason not to use it anymore.

Java Apache Camel Throughput

Test script:

# test:
./ab -n 10000 -c 100 "http://<IP>:<Port>/ProvisioningEngine?action=Add%20Customer&CustomerName=ttt&offlineMode=offlineMode"

Shortly before we run the actual test, we train the application by issuing the same command with -n 100 and -c 100.
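
As a sketch, the warm-up call only differs in the number of requests (IP and port are placeholders, as in the test command above):

# warm-up ("training") run, issued shortly before the actual measurement:
./ab -n 100 -c 100 "http://<IP>:<Port>/ProvisioningEngine?action=Add%20Customer&CustomerName=ttt&offlineMode=offlineMode"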

  1. application run natively on Windows
    1. 2 i5-2520M CPU cores: Requests per second:    46.3 [#/sec] (mean)
  2. application run within a VirtualBox Ubuntu VM with 1.5 GB DRAM and 1 vCPU (on an i5 host)
    1. 1 vCPU: Requests per second:    14.3 [#/sec] (mean)
    2. 2 vCPU: Requests per second:    17.9 [#/sec] (mean) (i.e. ~25% higher than 1 vCPU)
  3. application run within a docker container on the above Ubuntu VM
    1. 1 vCPU: Requests per second:    12.5 [#/sec] (mean)
    2. 2 vCPU: Requests per second:    17.0 [#/sec] (mean) (i.e. ~35% higher than 1 vCPU)
  4. application run within a docker container on a CoreOS VM with same resources as the Ubuntu VM (1.5 GB DRAM and 1 or 2 vCPU)
    1. 1 vCPU, v713.3.0: Requests per second:    8.82 [#/sec] (mean)
    2. 2 vCPU
      1. v713.3.0: Requests per second:    13.0 [#/sec] (mean) (i.e. ~48% higher than 1 vCPU)
      2. v766.4.0: Requests per second:    11.8 [#/sec] (mean)
      3. v845.0.0: Requests per second:    13.3 [#/sec] (mean)
  5. application run within a docker container on a boot2docker VM with same resources as in the old blog, i.e. (2 GB DRAM and 2 vCPU)
    1. 1 vCPU: not tested
    2. 2 vCPU: We get the error “Test aborted after 10 failures”, a memory allocation failure, even though we have 2 GB DRAM instead of 1.5 GB!

Note that each request creates tons of output on the terminal (deep tracing). However, in all cases, I have redirected the stream into a log file (using the “>” operator).
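
A sketch of such a redirection; the start command of the Camel application is not shown in this post, so the java call below is a hypothetical placeholder:

# redirect stdout and stderr of the application into a log file, so that terminal output does not distort the measurement:
java -jar <your-camel-app>.jar > camel.log 2>&1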

Note also that the comparison between Windows and the Linux variants is not fair, since Windows was tested on hardware with 2 real CPU cores, while the Linux variants were tested on VirtualBox VMs with one or two vCPUs (Ubuntu, Docker on Ubuntu and CoreOS) or 2 vCPUs (boot2docker). However, if you have a Windows notebook and want to decide whether to perform your development and tests on native Windows, on a Linux VM, or on Docker, this is what you get:

  • Windows: the host system with 2 real CPU cores delivers a ~4 times higher throughput than any of the Linux VM variants on 1 vCPU (comparing apples with bananas, but still relevant for my decision)
  • Ubuntu vs Docker on Ubuntu: Native Linux has a ~5 to 15% higher throughput performance than Docker on Linux
  • CoreOS: Interestingly, the Ubuntu Docker VM image has a ~40% higher throughput performance than the optimized CoreOS image. Because of this negative result for CoreOS, I have updated the v713.3.0 image to the latest versions available: stable-v766.4.0 and alpha-v845.0.0. This had no substantial impact on the (still bad) performance.
  • boot2docker: The boot2docker image has memory allocation problems. Since boot2docker is deprecated, I did not investigate this further.

Summary

We have compared the performance of a java Apache Camel application on following platforms: on a Windows 7 laptop, on an Ubuntu Virtualbox image and in a Docker container, where Docker was tested on Ubuntu, CoreOS and the official, but deprecated boot2docker image.

Note that Windows has been tested on hardware, while the Linux variants have been tested on VirtualBox VMs. This is not a fair comparison between Windows and Linux, but it still helps me to decide whether I had better keep performing my java development directly on my Windows laptop, or whether I can move all java development to a VirtualBox Linux VM on my Windows laptop. For Rails, I had found the surprising result that the performance was better on the VMs than directly on the hardware. Not for java, though:

We have seen that the performance on Windows 7 without virtualization is ~4 times higher than on any of the VirtualBox VMs, which we attribute to the expected performance degradation effect of software virtualization. This is no comparison between java on Windows and java on Ubuntu; it is just a comparison of the options I have on a Windows laptop: work directly on Windows or work on the VMs. The CoreOS image has a ~30% lower performance than the Ubuntu image, which is surprising, since it is much more lightweight than the Ubuntu image.

The application’s startup times are quite good for Ubuntu and for Docker on Ubuntu. Surprisingly, both CoreOS and boot2docker fall back substantially with respect to startup times (factor 3 and 6 compared to Ubuntu, respectively).

The deprecated boot2docker image has major memory allocation problems and should not be used.

Recommendation

All in all, development is best performed directly on the Windows laptop, with Ubuntu or Docker on Ubuntu being a good alternative because of their low application startup times. CoreOS does not seem to be a good alternative, since in the development phase I often need to restart the application, which takes ~13 sec on CoreOS instead of ~3.5 to 4 sec on Windows hardware or on the Ubuntu VM’s Docker.

Performance tests are best done on Windows hardware (or on HW-virtualized VMs, which I have not performance-tested yet). However, because of the deployment advantages, I would like to deliver my application as a Docker image, so I need to perform tests on Docker as well. For those Docker tests, the ubuntu-trusty64-docker VirtualBox image of William Yeh has shown the best performance results.

If I need to test the behavior of the application on clustered Docker hosts, CoreOS and Kubernetes are the only cluster alternatives I have experience with, and I have not done more than a small installation POC in the case of Kubernetes. However, considering the low CoreOS performance, I guess I will need to investigate alternatives: e.g. Docker Swarm, Kubernetes, or others (ping me if you have suggestions).

My Path towards Continuous Integration of my Java Application

I am planning to do the following:

  • I will continue to develop my java application on native Windows and I will continue to push the code to Github often.
  • I have linked Github with TravisCI, so the code is automatically tested in the cloud. This way, I do not need to wait for test results: they are sent via email, and I can react quickly to any newly introduced bugs.
  • I will link TravisCI with Docker Hub, so a Docker image will be created automatically.
  • Locally, I will use the Docker on Ubuntu, if I need to troubleshoot the Docker build process, or I need to manually test, whether the Docker image really works.
  • When it comes to productive deployment including Docker host clustering, I have had good experiences with CoreOS, even though clustering behind an HTTP proxy is a challenge; see here. However, the performance is sub-optimal, and I need to evaluate which alternatives I have (e.g. test new CoreOS versions? Docker Swarm? Google Kubernetes?).

Watch out for my next blog post(s)…

;-))


Docker Performance Tests for Ruby on Rails

Abstract

In a series of performance tests, we could show that a (Rails) web application has 50% higher performance on Docker (installed on Windows/boot2docker) than the same application run natively on Windows, no matter whether the database and program code are located on a shared volume or not. However, if such a shared volume is auto-mounted to one of the Windows C:\Users folders, the performance drops by a factor of almost ten.

Preface

This is the 5th blog of my Dummy’s Diary on Docker (= DDDocker), which originally was hosted on LinkedIn. See the DDDocker table of contents to find more.

Introduction

What is the performance of a dockerized application compared to the same application running natively on Windows or Linux, or on a Windows or Linux VM? This is what I asked myself when starting with Docker. However, my Internet search on that subject did not give me the desired results. I cannot believe that there are no such articles on the Internet, and if you happen to find some, please point me to them. In any case, I have decided to perform some quick&dirty tests myself. I call them “quick&dirty” because I did not care to separate the measurement tool from the server: everything is performed on my Windows laptop. Also, I did not care to document the exact test setup (Rails version, laptop data sheet, etc.). For now, I have deemed it sufficient to get an idea of the trend rather than the absolute numbers.

I have received some unexpected results…

Test Scenarios

In DDDocker (4): Persistence via shared Volumes (sorry, this is a LinkedIn blog post, which has a limited lifetime; tell me if you need access) we have experienced a major decrease in the performance of a web server when the server’s code and database are located on a shared volume on a Windows host.

In the current blog, we perform performance measurements of a Rails web server for following scenarios:

  1. web server run natively on Windows
  2. web server run within a docker container with all source code and database in the container. The docker container runs on a VirtualBox VM with 2 GB DRAM (boot2docker image).
  3. web server run within a docker container with all source code and database in a shared volume. The docker container runs on a VirtualBox VM with 2 GB DRAM (boot2docker VM). The shared volume is located on the Linux VM (docker host, i.e. the boot2docker VM).
  4. web server run within a docker container with all source code and database in a shared volume. The docker container runs on a VirtualBox VM with 2 GB DRAM (boot2docker image). The shared volume is located on the C: partition on Windows and is mapped to the VM, and from there it is mapped to the docker container.

The quick and dirty tests are performed using Apache Bench tool “ab.exe” on my Windows Laptop, which is also hosting the Web Server and/or the boot2docker host. This is not a scientific setup, but we still will get meaningful results, since we are interested in relative performance numbers only.

1. Windows Scenario

In this scenario the rails application runs natively on Windows. All Ruby on Rails code as well as the local database reside on a local Windows folder.

2. Docker on Linux VM Scenario

In this scenario, Docker is installed on Windows as specified in the Docker documentation. A Linux VM runs on Windows and hosts the container. All Ruby on Rails code as well as the local database reside in a folder within the container.

3. Docker on Linux VM Scenario with shared Volume

The difference between this scenario and scenario 2 is that all Ruby on Rails code as well as the local database reside in a folder on the Linux VM and are mounted into the container as a shared volume.

4. Docker on Linux VM Scenario with shared Volume mounted on Windows

The difference between this scenario and scenario 3 is that all Ruby on Rails code as well as the local database reside in a Windows folder under C:\Users\. The rails executable within the container accesses a local folder, which maps to the Linux host’s folder via the shared volume feature, which in turn maps to the auto-mounted Windows folder C:\Users\<username>\<foldername>. Because of the extra hops, the lowest performance is expected here. We will see below that the performance impact for this scenario is dramatic.

Test Environment / Equipment

All tests will be run on a Windows 7 Dell i5 notebook with 8 GB of RAM. During the native Windows test, the docker VM will run in parallel, so the CPU and RAM resources available for the app are similar in all 4 tests.

As the performance test tool, Apache Bench (ab.exe) is used, which ships with XAMPP installer (xampp-win32-5.6.8-0-VC11-installer.exe).

We perform 1000 read requests with a concurrency factor of 100. Each request causes an SQL query and returns ~245 kB of text.

For the tests, I have intentionally chosen an operation that is quite slow: a GET command that searches and reads 240 kB text from an SQLite database on an (already overloaded) laptop.

The total performance numbers are not meaningful for other web applications, different databases etc. However, the relative numbers are meaningful for the purpose to compare the different setups.

Results

From the overhead point of view, I would have expected a performance relation like the following:

Expectation:

  1. native Windows performance is greater than
  2. Docker performance on Linux VM on Windows is greater than 
  3. Docker performance on Linux VM on Windows with shared volume is greater than
  4. Docker performance on Linux VM on Windows with shared volume auto-mounted on Windows

However, I have found the surprising result that native Windows performance is ~33% lower than Docker performance:

Finding 1: Rails on docker outperforms native Rails on Windows by 50%, even if the Docker Linux host is a VM, which is hosted on Windows.

In addition, we have seen that:

Finding 2: the performance of a dockerized (Rails) web application is not degraded by locating the database and (Rails) program code on a shared volume.

The next finding was already expected from our experience in the last blog:

Finding 3: the performance of a dockerized (Rails) web application drops dramatically (by factor of ~10), if the shared volume is auto-mounted on a C:\Users folder.

The last statement was expected from the DDDocker (4) blog. Here the results in detail:

This graph shows the number of GET requests per second the web portal can handle. Docker outperforms Windows by ~50%, no matter whether docker is run with local folders or with shared volumes pointing to the Linux host. The performance drops by a factor of ~9 if the shared folder is used to access an auto-mounted Windows folder.

The same picture is seen with respect to the throughput, since the transfer rate is proportional to the requests/sec rate:

A corresponding inverse picture is seen with respect to the average time an end user has to wait for each request: because of the overload situation with 100 parallel requests, each request takes 5.88, 3.89 and 34.59 (!) seconds for the application running on Windows, in a docker container within the VirtualBox VM, and in a docker container with all data located on a shared volume auto-mounted from Windows, respectively.

Since the shared volume is located on Windows, auto-mounted into the VM, and then mounted into the container, it is no surprise that the performance is lower. Still, a response time increase by a factor of 8.9 is an unexpectedly high penalty you have to take into consideration when working with shared volumes on Windows machines.

Appendix

Preparation of Scenario 3

The preparation of scenarios 1, 2 and 4 has been described in the last blog post already. Here is a description of how I have built scenario 3:

We need to copy the folder from Windows into the Linux VM. This way, the performance test also has relevance for docker users who run docker on Linux. Let us see:

(Linux VM)# sudo cp -R /c/Users/vo062111/dockersharedfolder2 /home/rails/

will copy the auto-mounted volume to /home/rails/dockersharedfolder2 inside the VM. Now let us start a container that maps to this folder instead of /c/Users/vo062111/dockersharedfolder2, which is the volume mounted from Windows.

docker@boot2docker:~$ JOB=$(docker run -d -p 8080:3000 -v /home/rails/dockersharedfolder2:/home/rails/ProvisioningEngine oveits/rails_provisioningengine:latest /bin/bash -c "cd /home/rails/ProvisioningEngine; rails s")

With that, we get rid of the auto-mount hop when we compare it to scenario 4.

Scenario 1: Test Protocol Rails Server on native Windows

bash-3.1$ ./ab -n 1000 -c 100 "http://127.0.0.1:3000/text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config"
 This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Completed 1000 requests
 Finished 1000 requests
 Server Software: WEBrick/1.3.1
 Server Hostname: 127.0.0.1
 Server Port: 3000
Document Path: /text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config
 Document Length: 244884 bytes
Concurrency Level: 100
 Time taken for tests: 58.774 seconds
 Complete requests: 1000
 Failed requests: 0
 Total transferred: 245374000 bytes
 HTML transferred: 244884000 bytes
 Requests per second: 17.01 [#/sec] (mean)
 Time per request: 5877.425 [ms] (mean)
 Time per request: 58.774 [ms] (mean, across all concurrent requests)
 Transfer rate: 4077.01 [Kbytes/sec] received
Connection Times (ms)
 min mean[+/-sd] median max
 Connect: 0 0 0.3 1 4
 Processing: 84 5630 1067.3 5723 12428
 Waiting: 79 5623 1067.7 5718 12424
 Total: 85 5630 1067.3 5723 12428
 ERROR: The median and mean for the initial connection time are more than twice the standard
 deviation apart. These results are NOT reliable.
Percentage of the requests served within a certain time (ms)
50% 5723
66% 6072
75% 6156
80% 6205
90% 6418
95% 6496
98% 6549
99% 6566
100% 12428 (longest request)
 bash-3.1$ ./ab -n 1000 -c 100 "http://127.0.0.1:3000/text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config"
 This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Completed 1000 requests
 Finished 1000 requests
 Server Software: WEBrick/1.3.1
 Server Hostname: 127.0.0.1
 Server Port: 3000
Document Path: /text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config
 Document Length: 244884 bytes
Concurrency Level: 100
 Time taken for tests: 58.761 seconds
 Complete requests: 1000
 Failed requests: 0
 Total transferred: 245374000 bytes
 HTML transferred: 244884000 bytes
 Requests per second: 17.02 [#/sec] (mean)
 Time per request: 5876.075 [ms] (mean)
 Time per request: 58.761 [ms] (mean, across all concurrent requests)
 Transfer rate: 4077.94 [Kbytes/sec] received
Connection Times (ms)
 min mean[+/-sd] median max
 Connect: 0 0 0.3 1 4
 Processing: 55 5590 1117.4 5779 11605
 Waiting: 51 5584 1117.5 5773 11601
 Total: 56 5590 1117.5 5780 11605
 ERROR: The median and mean for the initial connection time are more than twice the standard
 deviation apart. These results are NOT reliable.
Percentage of the requests served within a certain time (ms)
 50% 5780
 66% 5929
 75% 5999
 80% 6052
 90% 6173
 95% 6276
 98% 6411
 99% 6443
 100% 11605 (longest request)
 bash-3.1$

Server Logs

...
Started GET "/text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config" for 127.0.0.1 at 2015-07-09 08:57:43 +0200
 Processing by TextDocumentsController#index as TEXT
 Parameters: {"filter"=>{"systemType"=>"OSV_V7R1", "action"=>"Add Customer", "templateType"=>"config"}}
 TextDocument Load (0.5ms) SELECT "text_documents".* FROM "text_documents"
 Rendered text_documents/index.text.erb (0.5ms)
 Completed 200 OK in 18ms (Views: 15.0ms | ActiveRecord: 0.5ms)
...

Summary

  • 15.4 requests/sec and 6.5 sec/request at 100 concurrent requests and a total of 1000 requests with a ~245 kB answer per request.

Open Issue:

  • ab.exe tells us that the responses are non-2xx, but the server log shows 200 OK.

Scenario 2: Test Protocol Rails Server on Docker

bash-3.1$ ./ab -n 1000 -c 100 "http://192.168.56.101:8080/text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config"
 This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.56.101 (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Completed 1000 requests
 Finished 1000 requests
 Server Software: WEBrick/1.3.1
 Server Hostname: 192.168.56.101
 Server Port: 8080
Document Path: /text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config
 Document Length: 244884 bytes
Concurrency Level: 100
 Time taken for tests: 32.550 seconds
 Complete requests: 1000
 Failed requests: 0
 Total transferred: 245374000 bytes
 HTML transferred: 244884000 bytes
 Requests per second: 30.72 [#/sec] (mean)
 Time per request: 3254.951 [ms] (mean)
 Time per request: 32.550 [ms] (mean, across all concurrent requests)
 Transfer rate: 7361.80 [Kbytes/sec] received
Connection Times (ms)
 min mean[+/-sd] median max
 Connect: 1 4 3.1 3 28
 Processing: 66 3103 697.2 3186 6725
 Waiting: 22 3049 694.9 3135 6665
 Total: 73 3107 697.3 3190 6728
Percentage of the requests served within a certain time (ms)
 50% 3190
 66% 3245
 75% 3297
 80% 3349
 90% 3693
 95% 3783
 98% 3888
 99% 6004
 100% 6728 (longest request)
 bash-3.1$

 bash-3.1$ ./ab -n 1000 -c 100 "http://192.168.56.101:8080/text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config"
 This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.56.101 (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Completed 1000 requests
 Finished 1000 requests
 Server Software: WEBrick/1.3.1
 Server Hostname: 192.168.56.101
 Server Port: 8080
Document Path: /text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config
 Document Length: 244884 bytes
Concurrency Level: 100
 Time taken for tests: 45.330 seconds
 Complete requests: 1000
 Failed requests: 0
 Total transferred: 245374000 bytes
 HTML transferred: 244884000 bytes
 Requests per second: 22.06 [#/sec] (mean)
 Time per request: 4533.006 [ms] (mean)
 Time per request: 45.330 [ms] (mean, across all concurrent requests)
 Transfer rate: 5286.18 [Kbytes/sec] received
Connection Times (ms)
 min mean[+/-sd] median max
 Connect: 1 6 8.0 4 127
 Processing: 76 4290 1588.3 3969 14826
 Waiting: 29 4210 1587.7 3901 14647
 Total: 84 4296 1589.0 3976 14846
Percentage of the requests served within a certain time (ms)
 50% 3976
 66% 4130
 75% 4530
 80% 5083
 90% 5699
 95% 7520
 98% 9635
 99% 10546
 100% 14846 (longest request)
 bash-3.1$

Summary

  • 26 requests/sec and 3.9 sec/request at 100 concurrent requests and a total of 1000 requests with a ~245 kB answer per request.

Open Issue:

  • ab.exe tells us that the responses are non-2xx, but the server log shows 200 OK.

Scenario 3: Test Protocol Rails Server on Docker with Source Code and DB on a shared Volume residing on the Linux VM

Docker command:

docker@boot2docker:~$ JOB=$(docker run -d -p 8080:3000 -v /home/rails/dockersharedfolder2:/home/rails/ProvisioningEngine oveits/rails_provisioningengine:latest /bin/bash -c "cd /home/rails/ProvisioningEngine; rails s")

ab.exe log Test 1

bash-3.1$ ./ab -n 1000 -c 100 "http://192.168.56.101:8080/text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config"
 This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.56.101 (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Completed 1000 requests
 Finished 1000 requests
 Server Software: WEBrick/1.3.1
 Server Hostname: 192.168.56.101
 Server Port: 8080
Document Path: /text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config
 Document Length: 244884 bytes
Concurrency Level: 100
 Time taken for tests: 34.173 seconds
 Complete requests: 1000
 Failed requests: 0
 Total transferred: 245374000 bytes
 HTML transferred: 244884000 bytes
 Requests per second: 29.26 [#/sec] (mean)
 Time per request: 3417.283 [ms] (mean)
 Time per request: 34.173 [ms] (mean, across all concurrent requests)
 Transfer rate: 7012.09 [Kbytes/sec] received
Connection Times (ms)
 min mean[+/-sd] median max
 Connect: 1 4 4.4 3 65
 Processing: 103 3232 726.1 3303 7013
 Waiting: 58 3164 725.8 3235 6900
 Total: 107 3236 726.1 3308 7015
Percentage of the requests served within a certain time (ms)
 50% 3308
 66% 3448
 75% 3552
 80% 3619
 90% 3768
 95% 3948
 98% 4148
 99% 4346
 100% 7015 (longest request)

ab.exe log Test 2

bash-3.1$ ./ab -n 1000 -c 100 "http://192.168.56.101:8080/text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config"
 This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.56.101 (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Completed 1000 requests
 Finished 1000 requests
 Server Software: WEBrick/1.3.1
 Server Hostname: 192.168.56.101
 Server Port: 8080
Document Path: /text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config
 Document Length: 244884 bytes
Concurrency Level: 100
 Time taken for tests: 36.615 seconds
 Complete requests: 1000
 Failed requests: 0
 Total transferred: 245374000 bytes
 HTML transferred: 244884000 bytes
 Requests per second: 27.31 [#/sec] (mean)
 Time per request: 3661.482 [ms] (mean)
 Time per request: 36.615 [ms] (mean, across all concurrent requests)
 Transfer rate: 6544.43 [Kbytes/sec] received
Connection Times (ms)
 min mean[+/-sd] median max
 Connect: 1 4 4.1 3 30
 Processing: 83 3458 1077.6 3531 8051
 Waiting: 38 3378 1070.1 3452 7886
 Total: 84 3462 1077.7 3532 8055
Percentage of the requests served within a certain time (ms)
 50% 3532
 66% 3776
 75% 3895
 80% 3985
 90% 4430
 95% 4680
 98% 6623
 99% 7259
 100% 8055 (longest request)
 bash-3.1$

Scenario 4: Test Protocol Rails Server on Docker with Source Code and DB on a shared Volume residing on Windows

Docker command:

docker@boot2docker:~$ JOB=$(docker run -d -p 8080:3000 -v /c/Users/vo062111/dockersharedfolder2:/home/rails/ProvisioningEngine oveits/rails_provisioningengine:latest /bin/bash -c "cd /home/rails/ProvisioningEngine; rails s")

ab.exe log Test 1

bash-3.1$ ./ab -n 1000 -c 100 "http://192.168.56.101:8080/text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config"
 This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.56.101 (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Completed 1000 requests
 Finished 1000 requests
 Server Software: WEBrick/1.3.1
 Server Hostname: 192.168.56.101
 Server Port: 8080
Document Path: /text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config
 Document Length: 244884 bytes
Concurrency Level: 100
 Time taken for tests: 348.463 seconds
 Complete requests: 1000
 Failed requests: 0
 Total transferred: 245374000 bytes
 HTML transferred: 244884000 bytes
 Requests per second: 2.87 [#/sec] (mean)
 Time per request: 34846.268 [ms] (mean)
 Time per request: 348.463 [ms] (mean, across all concurrent requests)
 Transfer rate: 687.66 [Kbytes/sec] received
Connection Times (ms)
 min mean[+/-sd] median max
 Connect: 1 3 4.8 2 112
 Processing: 598 33224 6430.7 34965 57297
 Waiting: 468 33076 6427.7 34808 57177
 Total: 600 33227 6430.6 34966 57299
Percentage of the requests served within a certain time (ms)
 50% 34966
 66% 35521
 75% 35781
 80% 35961
 90% 37069
 95% 37879
 98% 38203
 99% 38340
 100% 57299 (longest request)

ab.exe log Test 2

bash-3.1$ ./ab -n 1000 -c 100 "http://192.168.56.101:8080/text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config"
 This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
 Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.56.101 (be patient)
 Completed 100 requests
 Completed 200 requests
 Completed 300 requests
 Completed 400 requests
 Completed 500 requests
 Completed 600 requests
 Completed 700 requests
 Completed 800 requests
 Completed 900 requests
 Completed 1000 requests
 Finished 1000 requests
 Server Software: WEBrick/1.3.1
 Server Hostname: 192.168.56.101
 Server Port: 8080
Document Path: /text_documents.txt/?filter[systemType]=OSV_V7R1&filter[action]=Add%20Customer&filter[templateType]=config
 Document Length: 244884 bytes
Concurrency Level: 100
 Time taken for tests: 343.399 seconds
 Complete requests: 1000
 Failed requests: 0
 Total transferred: 245374000 bytes
 HTML transferred: 244884000 bytes
 Requests per second: 2.91 [#/sec] (mean)
 Time per request: 34339.862 [ms] (mean)
 Time per request: 343.399 [ms] (mean, across all concurrent requests)
 Transfer rate: 697.80 [Kbytes/sec] received
Connection Times (ms)
 min mean[+/-sd] median max
 Connect: 0 2 2.2 2 23
 Processing: 639 32505 5954.2 33841 37156
 Waiting: 594 32356 5953.0 33699 37114
 Total: 643 32508 5954.2 33848 37157
Percentage of the requests served within a certain time (ms)
 50% 33848
 66% 34555
 75% 34970
 80% 35247
 90% 35872
 95% 36190
 98% 36382
 99% 36774
 100% 37157 (longest request)
 bash-3.1$

Summary

  • 2.9 requests/sec and 34.8 sec/request at 100 concurrent requests and a total of 1000 requests with a ~245 kB answer per request (i.e. 687 kB/sec)

Open Issue:

  • ab.exe tells us that the responses are non-2xx, but the server log shows 200 OK.