Mesos and Marathon REST API via cURL — A Hello World Example

In this hello-world style blog post, we will learn how to create load balanced Docker services in an open source DC/OS & Mesos environment. For that, we will perform Mesos and Marathon REST API calls using simple cURL commands.

First, we will install the DC/OS CLI and retrieve the API token. After playing around with several GET commands for Mesos, IAM, and Marathon, we will create a load-balanced Docker service via the Marathon REST API. Finally, we will show how to check the health of the Marathon service via API calls.

Step 1: Install DCOS CLI

On the bootstrap node or a master node of the DCOS cluster, start a centos container as follows:

docker run -it centos bash

On a GUI based machine, open a browser and head to <DCOS_MASTER_URL>. Log in to the DCOS UI:

Choose “Install CLI” and copy and paste the content into the centos container:

sudo echo hallo 2>/dev/null || alias sudo=$@
[ -d /usr/local/bin ] || sudo mkdir -p /usr/local/bin && 
curl https://downloads.dcos.io/binaries/cli/linux/x86-64/dcos-1.10/dcos -o dcos && 
sudo mv dcos /usr/local/bin && 
sudo chmod +x /usr/local/bin/dcos && 
dcos cluster setup http://94.130.187.229 && 
dcos

The first line makes sure that the sudo prefix is ignored where sudo is not available, as in the centos container. You should see something like the following:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14.0M  100 14.0M    0     0  11.8M      0  0:00:01  0:00:01 --:--:-- 11.8M
If your browser didn't open, please go to the following link:

    http://94.130.187.229/login?redirect_uri=urn:ietf:wg:oauth:2.0:oob

Enter OpenID Connect ID Token:

Step 2: Retrieve the OpenID Token

Head to the URL given in the output and log in again. DC/OS will present your OpenID Connect ID Token:

Click “Copy to Clipboard” and paste it into the open terminal after “Enter OpenID Connect ID Token:”. The output should look as follows:

Command line utility for the Mesosphere Datacenter Operating
System (DC/OS). The Mesosphere DC/OS is a distributed operating
system built around Apache Mesos. This utility provides tools
for easy management of a DC/OS installation.

Available DC/OS commands:

        auth            Authenticate to DC/OS cluster
        cluster         Manage your DC/OS clusters
        config          Manage the DC/OS configuration file
        experimental    Manage commands that are under development
        help            Display help information about DC/OS
        job             Deploy and manage jobs in DC/OS
        marathon        Deploy and manage applications to DC/OS
        node            View DC/OS node information
        package         Install and manage DC/OS software packages
        service         Manage DC/OS services
        task            Manage DC/OS tasks

Get detailed command description with 'dcos <command> --help'.

We can test the DC/OS CLI by entering the dcos node command, which should produce output similar to the following:

# dcos node
   HOSTNAME           IP                           ID                    TYPE
 195.201.17.1    195.201.17.1   f2966d51-12b2-43f4-8d7a-1e8fb39fe80d-S0  agent
195.201.27.175  195.201.27.175  311a96d6-b5fc-4939-b9ef-92a6d1e0ae1f-S0  agent
master.mesos.   94.130.187.229    311a96d6-b5fc-4939-b9ef-92a6d1e0ae1f   master (leader)

Step 3: Retrieve the API Token

The manual procedure above will set the API token automatically.

Note: the API token is valid for 5 days only, and you need to refresh it by repeating the manual steps 1 and 2. Automatic token retrieval via the IAM API requires the DC/OS Enterprise edition and is described here.
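
Because the token expires, it can be handy to look at its expiry time before running a longer series of API calls. The following is a minimal sketch, assuming the token is a JWT (header.payload.signature) whose base64url-encoded payload carries a numeric "exp" claim in Unix seconds, and that GNU coreutils base64 is available. The function names jwt_payload and jwt_exp are made up for this example.

```shell
#!/bin/bash
# Sketch: print the expiry time (Unix seconds) of a JWT-style token.
# Assumption: the token has the form header.payload.signature and the
# payload is base64url-encoded JSON containing a numeric "exp" claim.

# Decode the payload (second dot-separated field) of the token given as $1
jwt_payload() {
  local payload
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the '=' padding; restore it before decoding
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  printf '%s' "$payload" | base64 -d
}

# Extract the numeric "exp" claim from the decoded payload
jwt_exp() {
  jwt_payload "$1" | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

With the CLI configured as above, jwt_exp "$(dcos config show core.dcos_acs_token)" should print the expiry timestamp, which can be made readable with date -d @<timestamp> on GNU systems.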

Verify that the token is set:

# dcos config show core.dcos_acs_token
eyJhb...

Note: the first time I tried, I got an error message like the following:

Property 'core.dcos_acs_token' doesn't exist

I resolved the issue by re-authenticating with ‘dcos auth login’.

Step 4: Mesos and Marathon API Calls

Preparation: Install jq and less

We will use the jq program (JSON Queries) for prettifying JSON output and less to make the output of the curl commands easier to handle. Let us install those programs now:

yum install -y epel-release
yum install -y jq less

Step 4.1: A first Mesos REST API Call

Test the API with our first API call:

(container)# curl --header "Authorization: token=$(dcos config show core.dcos_acs_token)" \
    http://94.130.187.229/mesos/master/state.json | jq '.' | less
{
 "version": "1.4.2",
 "git_sha": "732c49e6e98ac720df3418d9d868a6dfe1b2c6b5",
 "build_date": "2017-12-22 12:23:23",
 "build_time": 1513945403,
 "build_user": "",
...

Here, we have piped the prettified output from jq into less. To avoid a jq error (“Usage: jq …”), we have run jq with the standard “do nothing” filter ‘.’ as found on StackOverflow: How to use `jq` in a shell pipeline.

In all commands, replace 94.130.187.229 with your master’s IP address.
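
Since every call repeats the master address and the Authorization header, both can be kept in one place. The following is a small sketch of such a wrapper. DCOS_MASTER, dcos_api_url and dcos_get are names invented for this example and are not part of the DC/OS CLI.

```shell
#!/bin/bash
# Sketch: small helpers so that the master address is set in one place.
# DCOS_MASTER, dcos_api_url and dcos_get are illustrative names only.

DCOS_MASTER="${DCOS_MASTER:-94.130.187.229}"   # replace with your master's address

# Build the full URL for an API path such as /mesos/master/state.json
dcos_api_url() {
  printf 'http://%s/%s\n' "$DCOS_MASTER" "${1#/}"
}

# Perform an authenticated GET using the token stored by the DC/OS CLI
dcos_get() {
  curl -s -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
       "$(dcos_api_url "$1")"
}
```

The first call of this step could then be written as dcos_get /mesos/master/state.json | jq '.' | less.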

Step 4.2: A first Marathon REST API Call

The following command will show all Marathon services:

(container)# curl -X GET -H 'Content-Type: application/json' \
    -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
    http://94.130.187.229/service/marathon/v2/apps/ | jq '.' | less
{
  "apps": [
    {
      "id": "/marathon-lb",
      "acceptedResourceRoles": [
        "slave_public"
      ],
...

You can see that I have already installed a Marathon Load Balancer. In a fresh DC/OS installation, you might get an empty apps list.

Step 4.3: A first IAM REST API Call

Show all users of the system (without jq, the output would show up in an uglified version as a one-liner):

(container)# curl -X GET -H 'Content-Type: application/json' \
    -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
    http://94.130.187.229/acs/api/v1/users | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   144  100   144    0     0  15425      0 --:--:-- --:--:-- --:--:-- 16000
{
  "array": [
    {
      "uid": "yourfirstuser@email.com",
      "description": "yourfirstuser@email.com"
    }
  ]
}

At a minimum, you will see a single list entry: the user that first logged in to DC/OS.

Step 5: Create a new Docker Service via Marathon REST API

Now let us create a new Marathon service. For that, we will

  • define a service template named app.json
  • send an HTTP PUT with the app.json in the body to create/update the service
  • review the results in the graphical user interface and
  • access the created service.

Step 5.0: Install the Marathon Load Balancer

As a prerequisite for the load-balanced service we intend to deploy, we need to install the Marathon load balancer as follows:

dcos package install marathon-lb

Step 5.1: Define a Service Template (app.json)

We define the service template named app.json as follows:

{
  "id": "/mynamespace/nginx-hello-world-service",
  "backoffFactor": 1.15,
  "backoffSeconds": 1,
  "container": {
    "portMappings": [
      {
        "containerPort": 80,
        "hostPort": 0,
        "labels": {
          "VIP_0": "/mynamespace/nginx-hello-world-service:80"
        },
        "protocol": "tcp",
        "servicePort": 80,
        "name": "mynamespace-nginx"
      }
    ],
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "nginxdemos/hello",
      "forcePullImage": false,
      "privileged": false,
      "parameters": []
    }
  },
  "cpus": 0.1,
  "disk": 0,
  "healthChecks": [
    {
      "gracePeriodSeconds": 15,
      "ignoreHttp1xx": false,
      "intervalSeconds": 3,
      "maxConsecutiveFailures": 2,
      "portIndex": 0,
      "timeoutSeconds": 2,
      "delaySeconds": 15,
      "protocol": "HTTP",
      "path": "/"
    }
  ],
  "instances": 3,
  "labels": {
    "HAPROXY_DEPLOYMENT_GROUP": "nginx-hostname",
    "HAPROXY_0_REDIRECT_TO_HTTPS": "false",
    "HAPROXY_GROUP": "external",
    "HAPROXY_DEPLOYMENT_ALT_PORT": "80",
    "HAPROXY_0_PATH": "/mynamespace/nginx",
    "HAPROXY_0_VHOST": "195.201.17.1"
  },
  "maxLaunchDelaySeconds": 3600,
  "mem": 100,
  "gpus": 0,
  "networks": [
    {
      "mode": "container/bridge"
    }
  ],
  "requirePorts": false,
  "upgradeStrategy": {
    "maximumOverCapacity": 1,
    "minimumHealthCapacity": 1
  },
  "killSelection": "YOUNGEST_FIRST",
  "unreachableStrategy": {
    "inactiveAfterSeconds": 0,
    "expungeAfterSeconds": 0
  },
  "fetch": [],
  "constraints": []
}

Note that you need to replace the value of HAPROXY_0_VHOST (195.201.17.1 above) with the IP address that matches your network.
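
Instead of editing the file by hand, the label can also be set with jq, which we have installed in step 4. This is only a sketch: set_vhost is a hypothetical helper name, and it assumes the label sits under .labels as in the template above.

```shell
#!/bin/bash
# Sketch: set the HAPROXY_0_VHOST label of a Marathon app definition.
# set_vhost is an illustrative name; it reads JSON on stdin and writes
# the modified JSON to stdout. All other fields are left untouched.

set_vhost() {
  jq --arg vhost "$1" '.labels.HAPROXY_0_VHOST = $vhost'
}
```

For example, set_vhost 203.0.113.10 < app.json > app.local.json would write a copy of the template with the new address. Both the address and the file name app.local.json are placeholders.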

Step 5.2: Create a Service

We can now create our first Marathon service (“app”) as follows:

# curl  -X PUT -H 'Content-Type: application/json' \
    -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
    -d '@app.json' \
    http://94.130.187.229/service/marathon/v2/apps/mynamespace/nginx-hello-world-service
{"version":"2018-03-27T17:32:20.243Z","deploymentId":"a1bab3a4-3cbf-40de-a67e-3a1c961d9ad9"}

Note that we have received an immediate response including the deployment ID. With that information, we can later check periodically whether the deployment has finished by asking the API about the status of the deployment.
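
Such a periodic check could look like the following sketch. It assumes that a running deployment is listed with its "id" under /service/marathon/v2/deployments and disappears from that list once it has finished. The names deployment_active and wait_for_deployment as well as the MASTER variable are chosen for this example only.

```shell
#!/bin/bash
# Sketch: wait until a Marathon deployment has disappeared from /v2/deployments.
# deployment_active and wait_for_deployment are illustrative names; MASTER is
# the master address used in the curl commands of this post.

MASTER="${MASTER:-94.130.187.229}"

# Returns 0 if the deployments JSON array ($1) contains the deployment ID ($2)
deployment_active() {
  printf '%s' "$1" | jq -e --arg id "$2" 'map(select(.id == $id)) | length > 0' >/dev/null
}

# Polls once per second; returns 0 when the deployment is gone,
# 1 if it is still listed after the timeout ($2, default 60 seconds)
wait_for_deployment() {
  local id=$1 timeout=${2:-60} deployments
  while [ "$timeout" -gt 0 ]; do
    # Only evaluate the answer if curl itself succeeded
    if deployments=$(curl -sf -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
        "http://${MASTER}/service/marathon/v2/deployments"); then
      deployment_active "$deployments" "$id" || return 0
    fi
    sleep 1
    timeout=$((timeout - 1))
  done
  return 1
}
```

With the response shown above, wait_for_deployment a1bab3a4-3cbf-40de-a67e-3a1c961d9ad9 120 would block until the deployment has finished or two minutes have passed.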

Step 5.3 (optional): Review the Service in the GUI

We can see that we have created a new namespace:

Within that namespace, we have created a new service:

And the service is running on three container instances, as we have defined it in the app.json file:

Step 5.4 (optional): Access the Service

The created service can be accessed via the public agent’s IP address and port 80 on the path /mynamespace/nginx:

When you reload the page several times, you will see that the load balancer is using a round-robin balancing strategy to distribute the load among the three containers.

Step 6: Print App Summary

Step 6.1: App Summary

Above, we have seen a list of containers of our service in the GUI. Now let us retrieve the same information via API:

# APP_INFO=$(curl  -X GET -H 'Content-Type: application/json' \
      -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
      http://94.130.187.229/service/marathon/v2/apps/mynamespace/nginx-hello-world-service)

The output is organized as follows:

# echo "$APP_INFO" | jq
{
  "app": {
    "id": "/mynamespace/nginx-hello-world-service",
    ...
    "container": {            <----- information about Docker image, volumes, port mappings etc.
      ...                     
    },
    "cpus": 0.1,              <----- information about resource reservation, health checks, number of instances
    ...
    "labels": {               <----- information about HAPROXY configuration
      "HAPROXY_DEPLOYMENT_GROUP": "nginx-hostname",
      ...
    },
    ...
    "tasksStaged": 0,
    "tasksRunning": 3,
    "tasksHealthy": 3,
    "tasksUnhealthy": 0,
    "deployments": [],
    "tasks": [
      {
        "ipAddresses": [
          {
            "ipAddress": "172.17.0.8",
            "protocol": "IPv4"
          }
        ],
        "stagedAt": "2018-03-27T18:14:53.139Z",
        "state": "TASK_RUNNING",
        "ports": [
          19891
        ],
        "startedAt": "2018-03-27T18:14:54.324Z",
        "version": "2018-03-27T18:14:53.082Z",
        "id": "mynamespace_nginx-hello-world-service.bf0082f9-31ea-11e8-833d-f24b754eb1a3",
        "appId": "/mynamespace/nginx-hello-world-service",
        "slaveId": "311a96d6-b5fc-4939-b9ef-92a6d1e0ae1f-S0",
        "host": "195.201.27.175",
        "healthCheckResults": [
          {
            "alive": true,
            "consecutiveFailures": 0,
            "firstSuccess": "2018-03-27T18:14:56.131Z",
            "lastFailure": null,
            "lastSuccess": "2018-04-01T15:03:29.708Z",
            "lastFailureCause": null,
            "instanceId": "mynamespace_nginx-hello-world-service.marathon-bf0082f9-31ea-11e8-833d-f24b754eb1a3"
          }
        ]
      },
      {
      ... <--------- second container ("task")
      },
      {
      ... <--------- third container ("task")
      }
    ]
  }
}

The output shows us information on the deployed service (“app”) like

  • app id,
  • Docker container information,
  • information on the resource reservations
  • labels that are used to configure the HA proxy load balancer
  • health information and information on the number of containers
  • information on each and every container (“task”).

As an example, we can go to the “tasks” section of the output and extract the information that we can reach the first container in the tasks list on http://195.201.27.175:19891 (“host” 195.201.27.175 and “port(s)” 19891):
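
The same extraction can be done with jq instead of reading the output by eye. The sketch below assumes the structure shown above (.app.tasks[] with a "host" field and a "ports" array) and prints the first port of each task. task_endpoints is a name made up for this example.

```shell
#!/bin/bash
# Sketch: list host:port for every task of a Marathon app.
# task_endpoints is an illustrative name; it reads the app JSON
# (as returned by /v2/apps/<app id>) on stdin.

task_endpoints() {
  jq -r '.app.tasks[] | "\(.host):\(.ports[0])"'
}
```

Running echo "$APP_INFO" | task_endpoints prints one line per container, so the first container can be picked with head -n 1.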

Step 6.2: Check Service Health

We can check the service health of a service (“app”) by checking the following conditions:

  1. There should be no active deployments:
### Example:
### "deployments": [],
###
$ [ "$(echo "$APP_INFO" | jq '.app.deployments')" == '[]' ] && echo "OK: no active deployments"
OK: no active deployments
  2. There should be no staged tasks:
### Example:
### "tasksStaged": 0,
###
$ [ "$(echo "$APP_INFO" | jq '.app.tasksStaged')" == '0' ] && echo "OK: no staged tasks"
OK: no staged tasks
  3. All running tasks should be healthy:
### Example:
### "tasksRunning": 3,
### "tasksHealthy": 3,
###
$ [ "$(echo "$APP_INFO" | jq '.app.tasksRunning')" == "$(echo "$APP_INFO" | jq '.app.tasksHealthy')" ] && echo "OK: all running tasks are healthy"
OK: all running tasks are healthy
  4. There should be no unhealthy tasks:
### Example:
### "tasksUnhealthy": 0,
###
$ [ "$(echo "$APP_INFO" | jq '.app.tasksUnhealthy')" == '0' ] && echo "OK: no unhealthy tasks"
OK: no unhealthy tasks

Combined, we can check for the app health as follows:

#!/bin/bash

MASTER=94.130.187.229

# Prints a description of the first failed check and returns 1;
# prints nothing and returns 0 if the app is healthy
function checkAppHealth {
   local APP_INFO=$1
   local DEPLOYMENTS TASKS_STAGED TASKS_RUNNING TASKS_HEALTHY TASKS_UNHEALTHY
   DEPLOYMENTS="$(echo "$APP_INFO" | jq -c '.app.deployments')"
   TASKS_STAGED="$(echo "$APP_INFO" | jq '.app.tasksStaged')"
   TASKS_RUNNING="$(echo "$APP_INFO" | jq '.app.tasksRunning')"
   TASKS_HEALTHY="$(echo "$APP_INFO" | jq '.app.tasksHealthy')"
   TASKS_UNHEALTHY="$(echo "$APP_INFO" | jq '.app.tasksUnhealthy')"
   if [ "$DEPLOYMENTS" != '[]' ]; then
      echo "Found active deployments for this app: $DEPLOYMENTS"
      return 1
   fi
   if [ "$TASKS_STAGED" != '0' ]; then
      echo "Found $TASKS_STAGED staged tasks for this app"
      return 1
   fi
   if [ "$TASKS_RUNNING" != "$TASKS_HEALTHY" ]; then
      echo "Not all running tasks ($TASKS_RUNNING) seem to be healthy ($TASKS_HEALTHY)"
      return 1
   fi
   if [ "$TASKS_UNHEALTHY" != '0' ]; then
      echo "Found $TASKS_UNHEALTHY unhealthy tasks"
      return 1
   fi
   return 0
}

APP_INFO="$(curl -s -X GET -H 'Content-Type: application/json' \
                 -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
                 http://${MASTER}/service/marathon/v2/apps/mynamespace/nginx-hello-world-service)"
# Quote the argument: the JSON may contain spaces
ERROR="$(checkAppHealth "$APP_INFO")"

if [ "$ERROR" == "" ]; then
   echo "Service status: healthy"
else
   echo "Service status: ERROR: $ERROR"
fi

Summary

In this blog post, we have learned how to

  • install the DCOS CLI
  • retrieve the REST API Token in an open source DC/OS environment
  • use the Marathon REST API to create a load-balanced Docker service that can be accessed from the Internet
  • check the service health of a Marathon service.

 

Appendix: Print List of Containers of a Marathon Service

Above, we have seen a list of containers of our service in the GUI. Now let us retrieve the same information via API.

# curl  -X GET -H 'Content-Type: application/json' \
      -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
      http://94.130.187.229/service/marathon/v2/apps/mynamespace/nginx-hello-world-service 2>/dev/null \
      | jq '.app.tasks'
[
  {
    "ipAddresses": [
      {
        "ipAddress": "172.17.0.8",
        "protocol": "IPv4"
      }
    ],
    "stagedAt": "2018-03-27T18:14:53.139Z",
    "state": "TASK_RUNNING",
    "ports": [
      19891
    ],
    "startedAt": "2018-03-27T18:14:54.324Z",
    "version": "2018-03-27T18:14:53.082Z",
    "id": "mynamespace_nginx-hello-world-service.bf0082f9-31ea-11e8-833d-f24b754eb1a3",
    "appId": "/mynamespace/nginx-hello-world-service",

 

# curl -X GET -H 'Content-Type: application/json' \
     -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
     http://94.130.187.229/service/marathon/v2/apps/mynamespace/nginx-hello-world-service\
     | jq '.app.tasksHealthy'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3531    0  3531    0     0   746k      0 --:--:-- --:--:-- --:--:--  862k
3

Here, we can see that there are 3 healthy “tasks”, i.e. Docker containers.

Appendix: Continuously print List of Marathon Deployments

With the following while loop, we can continuously print the list of deployments:

# while true; do 
  curl -X GET -H 'Content-Type: application/json' \
  -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
  http://94.130.187.229/service/marathon/v2/deployments; echo ""
  sleep 1; 
done

When we now go into the DC/OS GUI and restart the service, we will see something like the following:

[]
[]
[]
[]
[]
[{"id":"0252a58b-9ff5-4ed5-9169-90b3d2cd6ea0","version":"2018-03-27T18:14:53.082Z","affectedApps":["/mynamespace/nginx-hello-world-service"],"affectedPods":[],"steps":[{"actions":[{"action":"RestartApplication","app":"/mynamespace/nginx-hello-world-service"}]}],"currentActions":[{"action":"RestartApplication","app":"/mynamespace/nginx-hello-world-service","readinessCheckResults":[]}],"currentStep":1,"totalSteps":1}]
[{"id":"0252a58b-9ff5-4ed5-9169-90b3d2cd6ea0","version":"2018-03-27T18:14:53.082Z","affectedApps":["/mynamespace/nginx-hello-world-service"],"affectedPods":[],"steps":[{"actions":[{"action":"RestartApplication","app":"/mynamespace/nginx-hello-world-service"}]}],"currentActions":[{"action":"RestartApplication","app":"/mynamespace/nginx-hello-world-service","readinessCheckResults":[]}],"currentStep":1,"totalSteps":1}]
[]
[]

I.e., the list of deployments is empty most of the time, but when you restart the service, a deployment is running for longer than one second.

Appendix: Retrieve IP Address and Port of a specific Docker Container

We can use the following commands to retrieve the IP address and TCP port of a certain container (tested in a DC/OS Enterprise environment with a service account):

Step 1: Create a Configuration File DCOS_API.cfg

export DCOS_API_USER=MyDcosUserName
export DCOS_API_PASSWORD=MyDcosPassword
export DCOS_API_HOST=https://mesos-master.company.com
export PROXYOPTION="-x proxy:8080"

Step 2: Create DCOS_API

#!/bin/bash

source "$0.cfg"

MESOS_UID="$DCOS_API_USER"
MESOS_PWD="$DCOS_API_PASSWORD"
MESOS_MASTER=$DCOS_API_HOST
MARATHON_LOCATION=/base/url

if [ "$#" == "2" ]; then
  VERB=$1
  [ "$VERB" != "GET" ] && echo "Warning: only GET is supported currently. Using GET"
  RESOURCE=$2
else
  echo "usage: $0 GET resource"
  exit 1
fi

MARATHON_APP_ID=${MARATHON_LOCATION}/${APP_ID}

TOKEN=`curl $PROXYOPTION -s -k -D - -H 'Accept: application/json'  -H 'Content-Type: application/json' "$MESOS_MASTER/acs/api/v1/auth/login" --data "{\"password\":\"$MESOS_PWD\",\"uid\":\"$MESOS_UID\" }" | grep token | awk -F ":" '{print $2}' | awk -F "\"" '{print $2}'`


#                        -H 'Accept: application/json' \
curl $PROXYOPTION -s -k -H "Authorization: token=$TOKEN" \
                        -H 'Accept: text/plain' \
                        -H 'Content-Type: application/json' \
                        "$MESOS_MASTER/service/marathon/$RESOURCE"

Read the host + port of a single Container from the Marathon REST API

#!/bin/bash

MESOS_UID="$DCOS_API_USER"
MESOS_PWD="$DCOS_API_PASSWORD"
MESOS_MASTER=$DCOS_API_HOST
MARATHON_LOCATION=/base/url

if [ "$#" == "1" ]; then
  APP_ID=$1
else
  echo "APP_ID_MISSING"
  exit 1
fi

MARATHON_APP_ID=${MARATHON_LOCATION}/${APP_ID}

TOKEN=`curl $PROXYOPTION -s -k -D - -H 'Accept: application/json'  -H 'Content-Type: application/json' "$MESOS_MASTER/acs/api/v1/auth/login" --data "{\"password\":\"$MESOS_PWD\",\"uid\":\"$MESOS_UID\" }" | grep token | awk -F ":" '{print $2}' | awk -F "\"" '{print $2}'`

APPS=`curl $PROXYOPTION -s -H "Authorization: token=$TOKEN" -H 'Accept: application/json' -H 'Content-Type: application/json' -s -k "$MESOS_MASTER/service/marathon/v2/apps/$MARATHON_APP_ID"`

HOST=`echo "$APPS" | awk -F "\"tasks\"" '{print $2}' | awk -F "\"host\":\"" '{print $2}' | awk -F "\"" '{print $1}'`
PORT=`echo "$APPS" | awk -F "\"tasks\"" '{print $2}' | awk -F "\"ports\":" '{print $2}' | awk -F "\"" '{print $1}' | sed -e 's/\[\(.*\)\],/\1/g'`

[ "$HOST" == "" ] && HOST=NOT_FOUND
[ "$PORT" == "" ] && PORT=NOT_FOUND

echo "$HOST:$PORT"

if [ "$DEBUG" != "" ]; then
   echo "MESOS_MASTER=$MESOS_MASTER" >&2
   echo "DCOS_API_HOST=$DCOS_API_HOST" >&2
   echo "DCOS_API_USER=$DCOS_API_USER" >&2
   echo "TOKEN=$TOKEN" >&2
   echo "APPS=$APPS" >&2
fi

 

Getting Started with DC/OS on Vagrant

In the course of this Hello World style tutorial, we will explore DC/OS, a Data Center Operating System developed and open sourced by Mesosphere with the goal of hiding the complexity of data centers. We will

  • install DC/OS on your local PC or Notebook using Vagrant and VirtualBox,
  • deploy a “hello world” application with more than one instance,
  • load balance between the application instances
  • and make sure the service is reachable from the outside world.

See also part 2: A Step towards productive Docker: installing and testing DC/OS on AWS (starts from scratch and does not require you to have read or tested the current post).

DC/OS is a Data Center Operating System built upon Apache Mesos and Mesosphere Marathon, an open source container orchestration platform. Its goal is to hide the complexity of data centers when deploying applications: DC/OS performs the job of deploying your application on your data center hardware and automatically chooses the hardware servers to run your application on. It helps you scale your application according to your needs by adding or removing application instances at the push of a button. DC/OS also makes sure that your clients’ requests are load balanced and routed to your application instances: there is no need to manually re-configure the load balancer(s) if you add or destroy an instance of your application, since DC/OS takes care of this for you.

Note: If you want to get started with Marathon and Mesos first, you might be interested in this blog post, especially if the resource requirements of this blog post exceed what you have at hand: for the DC/OS tutorial you will need 10 GB of RAM, while in the Marathon/Mesos tutorial, 4 GB are sufficient.

Target

What I want to do in this session:

  • Install DC/OS on the local machine using Vagrant+VirtualBox
  • Explore the networking and load balancing capabilities of DC/OS

Tools and Versions used

  • Vagrant 1.8.6
  • Virtualbox 5.0.20 r106931
  • for Windows: GNU bash, version 4.3.42(5)-release (x86_64-pc-msys)
  • DCOS 1.8.8

Prerequisites

  • 10 GB free DRAM
  • tested with 4 virtual CPUs (Quad Core CPU)
  • Git is installed

Step 1: Install Vagrant and VirtualBox

Step 1.1: Install VirtualBox

Download and install VirtualBox. I am running version 5.0.20 r106931.

If the installation fails with the error message “Setup Wizard ended prematurely”, see Appendix A: VirtualBox Installation Workaround below.

Step 1.2: Install Vagrant

Download and install Vagrant (requires a reboot).

Step 2: Download Vagrant Box

We are following the Readme on https://github.com/dcos/dcos-vagrant:

Since this might be a long-running task (especially if you are sitting in a hotel with a low-speed Internet connection, as I am at the moment), we best start by downloading DC/OS first:

(base system)$ vagrant box add https://downloads.dcos.io/dcos-vagrant/metadata.json
==> box: Loading metadata for box 'https://downloads.dcos.io/dcos-vagrant/metadata.json'
==> box: Adding box 'mesosphere/dcos-centos-virtualbox' (v0.8.0) for provider: virtualbox
 box: Downloading: https://downloads.dcos.io/dcos-vagrant/dcos-centos-virtualbox-0.8.0.box
 box: Progress: 100% (Rate: 132k/s, Estimated time remaining: --:--:--)
 box: Calculating and comparing box checksum...
==> box: Successfully added box 'mesosphere/dcos-centos-virtualbox' (v0.8.0) for 'virtualbox'!

Step 3: Clone DCOS-Vagrant Repo

In another window, we clone the dcos-vagrant git repo:

(base system)$ git clone https://github.com/dcos/dcos-vagrant
Cloning into 'dcos-vagrant'...
remote: Counting objects: 2171, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 2171 (delta 0), reused 0 (delta 0), pack-reused 2167
Receiving objects: 100% (2171/2171), 14.98 MiB | 123.00 KiB/s, done.
Resolving deltas: 100% (1297/1297), done.
Checking connectivity... done.
(base system)$ cd dcos-vagrant

VagrantConfig.yaml shows:

m1:
 ip: 192.168.65.90
 cpus: 2
 memory: 1024
 type: master
a1:
 ip: 192.168.65.111
 cpus: 4
 memory: 6144
 memory-reserved: 512
 type: agent-private
p1:
 ip: 192.168.65.60
 cpus: 2
 memory: 1536
 memory-reserved: 512
 type: agent-public
 aliases:
 - spring.acme.org
 - oinker.acme.org
boot:
 ip: 192.168.65.50
 cpus: 2
 memory: 1024
 type: boot

m1 is the DC/OS master. Private containers will run on a1, while the load balancer containers are public and will run on p1.
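
The memory values above add up to 1024 + 6144 + 1536 + 1024 = 9728 MB, which is roughly 9.5 GB and explains the 10 GB prerequisite. If you adapt the file, the sum can be recomputed with a small awk sketch like the one below. It assumes the flat layout shown above with one "memory: <number>" line per machine, and total_vm_memory_mb is a name chosen for this example.

```shell
#!/bin/bash
# Sketch: sum the "memory:" values (in MB) of a VagrantConfig.yaml-style file.
# total_vm_memory_mb is an illustrative name. It assumes the flat layout shown
# above; "memory-reserved:" lines are not counted because the first field must
# be exactly "memory:". Reads the given file(s), or stdin if none is given.

total_vm_memory_mb() {
  awk '$1 == "memory:" { sum += $2 } END { print sum + 0 }' "$@"
}
```

Inside the dcos-vagrant directory, total_vm_memory_mb VagrantConfig.yaml prints the total in MB.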

Step 4: Install Vagrant Hostmanager Plugin

Installation of the Vagrant Hostmanager Plugin is required. I had tried without it because I did not think that it would work on Windows. However, vagrant up will not succeed if the plugin is not installed: the presence of the plugin is checked before booting up the Vagrant box.

(base system)$ vagrant plugin install vagrant-hostmanager
Installing the 'vagrant-hostmanager' plugin. This can take a few minutes...
Installed the plugin 'vagrant-hostmanager (1.8.5)'!

Note: Some version updates later (VirtualBox 5.1.28 r117968 (Qt5.6.2)), I found out that the VirtualBox Guest Additions are also needed in order to avoid the error message sbin/mount.vboxsf: mounting failed with the error: No such device.
For that, I needed to run the command
vagrant plugin install vagrant-vbguest.

However, it still did not work. I could vagrant ssh to the box, and I found in /var/log/vboxadd-install.log that the installer did not find the kernel headers during installation of the VirtualBox guest additions. yum install kernel-headers reported that kernel-headers-3.10.0-693.5.2.el7.x86_64 was already installed. However, ls /usr/src/kernels/ showed that there is a directory named 3.10.0-327.36.1.el7.x86_64 instead of 3.10.0-327.el7.x86_64. I then ran sudo ln -s 3.10.0-327.36.1.el7.x86_64 3.10.0-327.el7.x86_64 within the directory /usr/src/kernels/, and I could do a vagrant up with no problems. I guess un-installing and re-installing the headers would work as well.

All this did not work, but I found that the build link was wrong (the hint was found here):

I fixed the link with cd /lib/modules/3.10.0-327.el7.x86_64; sudo mv build build.broken; sudo ln -s /usr/src/kernels/3.10.0-327.36.1.el7.x86_64 build
then cd /opt/VBoxGuestAdditions-*/init; sudo ./vboxadd setup

But it still did not work! I gave up and will try installing DC/OS on AWS instead. Stay tuned.

Step 5: Boot DC/OS

Below I have set DCOS_VERSION in order to get the exact same results the next time I perform the test. If you do not set the environment variable, the latest stable version will be used when you boot up the VirtualBox VM:

(base system)$ export DCOS_VERSION=1.8.8
(base system)$ vagrant up
Vagrant Patch Loaded: GuestLinux network_interfaces (1.8.6)
Validating Plugins...
Validating User Config...
Downloading DC/OS 1.8.8 Installer...
Source: https://downloads.dcos.io/dcos/stable/commit/602edc1b4da9364297d166d4857fc8ed7b0b65ca/dcos_generate_config.sh
Destination: installers/dcos/dcos_generate_config-1.8.8.sh
Progress: 16% (Rate: 1242k/s, Estimated time remaining: 0:09:16)

The speed of the hotel Internet seems to be better now, this late in the night…

(base system)$ vagrant up
Vagrant Patch Loaded: GuestLinux network_interfaces (1.8.6)
Validating Plugins...
Validating User Config...
Downloading DC/OS 1.8.8 Installer...
Source: https://downloads.dcos.io/dcos/stable/commit/602edc1b4da9364297d166d4857fc8ed7b0b65ca/dcos_generate_config.sh
Destination: installers/dcos/dcos_generate_config-1.8.8.sh
Progress: 100% (Rate: 1612k/s, Estimated time remaining: --:--:--)
Validating Installer Checksum...
Using DC/OS Installer: installers/dcos/dcos_generate_config-1.8.8.sh
Using DC/OS Config: etc/config-1.8.yaml
Validating Machine Config...
Configuring VirtualBox Host-Only Network...
Bringing machine 'm1' up with 'virtualbox' provider...
Bringing machine 'a1' up with 'virtualbox' provider...
Bringing machine 'p1' up with 'virtualbox' provider...
Bringing machine 'boot' up with 'virtualbox' provider...
==> m1: Importing base box 'mesosphere/dcos-centos-virtualbox'...
==> m1: Matching MAC address for NAT networking...
==> m1: Checking if box 'mesosphere/dcos-centos-virtualbox' is up to date...
==> m1: Setting the name of the VM: m1.dcos
==> m1: Fixed port collision for 22 => 2222. Now on port 2201.
==> m1: Clearing any previously set network interfaces...
==> m1: Preparing network interfaces based on configuration...
    m1: Adapter 1: nat
    m1: Adapter 2: hostonly
==> m1: Forwarding ports...
    m1: 22 (guest) => 2201 (host) (adapter 1)
==> m1: Running 'pre-boot' VM customizations...
==> m1: Booting VM...
==> m1: Waiting for machine to boot. This may take a few minutes...
    m1: SSH address: 127.0.0.1:2201
    m1: SSH username: vagrant
    m1: SSH auth method: private key
    m1: Warning: Remote connection disconnect. Retrying...
    m1: Warning: Remote connection disconnect. Retrying...
    m1: Warning: Remote connection disconnect. Retrying...
    m1: Warning: Remote connection disconnect. Retrying...
    m1: Warning: Remote connection disconnect. Retrying...
    m1: Warning: Remote connection disconnect. Retrying...
    m1: Warning: Remote connection disconnect. Retrying...
    m1: Warning: Remote connection disconnect. Retrying...
    m1: Warning: Remote connection disconnect. Retrying...
==> m1: Machine booted and ready!
==> m1: Checking for guest additions in VM...
==> m1: Setting hostname...
==> m1: Configuring and enabling network interfaces...
==> m1: Mounting shared folders...
    m1: /vagrant => D:/veits/Vagrant/ubuntu-trusty64-docker_2017-02/dcos-vagrant
==> m1: Updating /etc/hosts file on active guest machines...
==> m1: Updating /etc/hosts file on host machine (password may be required)...
==> m1: Running provisioner: shell...
    m1: Running: inline script
==> m1: Running provisioner: dcos_ssh...
    host: Generating new keys...
==> m1: Inserting generated public key within guest...
==> m1: Configuring vagrant to connect using generated private key...
==> m1: Removing insecure key from the guest, if it's present...
==> m1: Running provisioner: shell...
    m1: Running: script: Certificate Authorities
==> m1: >>> Installing Certificate Authorities
==> m1: Running provisioner: shell...
    m1: Running: script: Install Probe
==> m1: Probe already installed: /usr/local/sbin/probe
==> m1: Running provisioner: shell...
    m1: Running: script: Install jq
==> m1: jq already installed: /usr/local/sbin/jq
==> m1: Running provisioner: shell...
    m1: Running: script: Install DC/OS Postflight
==> m1: >>> Installing DC/OS Postflight: /usr/local/sbin/dcos-postflight
==> a1: Importing base box 'mesosphere/dcos-centos-virtualbox'...
==> a1: Matching MAC address for NAT networking...
==> a1: Checking if box 'mesosphere/dcos-centos-virtualbox' is up to date...
==> a1: Setting the name of the VM: a1.dcos
==> a1: Fixed port collision for 22 => 2222. Now on port 2202.
==> a1: Clearing any previously set network interfaces...
==> a1: Preparing network interfaces based on configuration...
    a1: Adapter 1: nat
    a1: Adapter 2: hostonly
==> a1: Forwarding ports...
    a1: 22 (guest) => 2202 (host) (adapter 1)
==> a1: Running 'pre-boot' VM customizations...
==> a1: Booting VM...
==> a1: Waiting for machine to boot. This may take a few minutes...
    a1: SSH address: 127.0.0.1:2202
    a1: SSH username: vagrant
    a1: SSH auth method: private key
    a1: Warning: Remote connection disconnect. Retrying...
    a1: Warning: Remote connection disconnect. Retrying...
    a1: Warning: Remote connection disconnect. Retrying...
    a1: Warning: Remote connection disconnect. Retrying...
    a1: Warning: Remote connection disconnect. Retrying...
    a1: Warning: Remote connection disconnect. Retrying...
    a1: Warning: Remote connection disconnect. Retrying...
    a1: Warning: Remote connection disconnect. Retrying...
    a1: Warning: Remote connection disconnect. Retrying...
    a1: Warning: Remote connection disconnect. Retrying...
==> a1: Machine booted and ready!
==> a1: Checking for guest additions in VM...
==> a1: Setting hostname...
==> a1: Configuring and enabling network interfaces...
==> a1: Mounting shared folders...
    a1: /vagrant => D:/veits/Vagrant/ubuntu-trusty64-docker_2017-02/dcos-vagrant
==> a1: Updating /etc/hosts file on active guest machines...
==> a1: Updating /etc/hosts file on host machine (password may be required)...
==> a1: Running provisioner: shell...
    a1: Running: inline script
==> a1: Running provisioner: dcos_ssh...
    host: Found existing keys
==> a1: Inserting generated public key within guest...
==> a1: Configuring vagrant to connect using generated private key...
==> a1: Removing insecure key from the guest, if it's present...
==> a1: Running provisioner: shell...
    a1: Running: script: Certificate Authorities
==> a1: >>> Installing Certificate Authorities
==> a1: Running provisioner: shell...
    a1: Running: script: Install Probe
==> a1: Probe already installed: /usr/local/sbin/probe
==> a1: Running provisioner: shell...
    a1: Running: script: Install jq
==> a1: jq already installed: /usr/local/sbin/jq
==> a1: Running provisioner: shell...
    a1: Running: script: Install DC/OS Postflight
==> a1: >>> Installing DC/OS Postflight: /usr/local/sbin/dcos-postflight
==> a1: Running provisioner: shell...
    a1: Running: script: Install Mesos Memory Modifier
==> a1: >>> Installing Mesos Memory Modifier: /usr/local/sbin/mesos-memory
==> a1: Running provisioner: shell...
    a1: Running: script: DC/OS Agent-private
==> a1: Skipping DC/OS private agent install (boot machine will provision in parallel)
==> p1: Importing base box 'mesosphere/dcos-centos-virtualbox'...
==> p1: Matching MAC address for NAT networking...
==> p1: Checking if box 'mesosphere/dcos-centos-virtualbox' is up to date...
==> p1: Setting the name of the VM: p1.dcos
==> p1: Fixed port collision for 22 => 2222. Now on port 2203.
==> p1: Clearing any previously set network interfaces...
==> p1: Preparing network interfaces based on configuration...
    p1: Adapter 1: nat
    p1: Adapter 2: hostonly
==> p1: Forwarding ports...
    p1: 22 (guest) => 2203 (host) (adapter 1)
==> p1: Running 'pre-boot' VM customizations...
==> p1: Booting VM...
==> p1: Waiting for machine to boot. This may take a few minutes...
    p1: SSH address: 127.0.0.1:2203
    p1: SSH username: vagrant
    p1: SSH auth method: private key
    p1: Warning: Remote connection disconnect. Retrying...
    p1: Warning: Remote connection disconnect. Retrying...
    p1: Warning: Remote connection disconnect. Retrying...
    p1: Warning: Remote connection disconnect. Retrying...
    p1: Warning: Remote connection disconnect. Retrying...
    p1: Warning: Remote connection disconnect. Retrying...
    p1: Warning: Remote connection disconnect. Retrying...
    p1: Warning: Remote connection disconnect. Retrying...
    p1: Warning: Remote connection disconnect. Retrying...
    p1: Warning: Remote connection disconnect. Retrying...
==> p1: Machine booted and ready!
==> p1: Checking for guest additions in VM...
==> p1: Setting hostname...
==> p1: Configuring and enabling network interfaces...
==> p1: Mounting shared folders...
    p1: /vagrant => D:/veits/Vagrant/ubuntu-trusty64-docker_2017-02/dcos-vagrant
==> p1: Updating /etc/hosts file on active guest machines...
==> p1: Updating /etc/hosts file on host machine (password may be required)...
==> p1: Running provisioner: shell...
    p1: Running: inline script
==> p1: Running provisioner: dcos_ssh...
    host: Found existing keys
==> p1: Inserting generated public key within guest...
==> p1: Configuring vagrant to connect using generated private key...
==> p1: Removing insecure key from the guest, if it's present...
==> p1: Running provisioner: shell...
    p1: Running: script: Certificate Authorities
==> p1: >>> Installing Certificate Authorities
==> p1: Running provisioner: shell...
    p1: Running: script: Install Probe
==> p1: Probe already installed: /usr/local/sbin/probe
==> p1: Running provisioner: shell...
    p1: Running: script: Install jq
==> p1: jq already installed: /usr/local/sbin/jq
==> p1: Running provisioner: shell...
    p1: Running: script: Install DC/OS Postflight
==> p1: >>> Installing DC/OS Postflight: /usr/local/sbin/dcos-postflight
==> p1: Running provisioner: shell...
    p1: Running: script: Install Mesos Memory Modifier
==> p1: >>> Installing Mesos Memory Modifier: /usr/local/sbin/mesos-memory
==> p1: Running provisioner: shell...
    p1: Running: script: DC/OS Agent-public
==> p1: Skipping DC/OS public agent install (boot machine will provision in parallel)
==> boot: Importing base box 'mesosphere/dcos-centos-virtualbox'...
==> boot: Matching MAC address for NAT networking...
==> boot: Checking if box 'mesosphere/dcos-centos-virtualbox' is up to date...
==> boot: Setting the name of the VM: boot.dcos
==> boot: Fixed port collision for 22 => 2222. Now on port 2204.
==> boot: Clearing any previously set network interfaces...
==> boot: Preparing network interfaces based on configuration...
    boot: Adapter 1: nat
    boot: Adapter 2: hostonly
==> boot: Forwarding ports...
    boot: 22 (guest) => 2204 (host) (adapter 1)
==> boot: Running 'pre-boot' VM customizations...
==> boot: Booting VM...
==> boot: Waiting for machine to boot. This may take a few minutes...
    boot: SSH address: 127.0.0.1:2204
    boot: SSH username: vagrant
    boot: SSH auth method: private key
    boot: Warning: Remote connection disconnect. Retrying...
    boot: Warning: Remote connection disconnect. Retrying...
    boot: Warning: Remote connection disconnect. Retrying...
    boot: Warning: Remote connection disconnect. Retrying...
    boot: Warning: Remote connection disconnect. Retrying...
    boot: Warning: Remote connection disconnect. Retrying...
    boot: Warning: Remote connection disconnect. Retrying...
    boot: Warning: Remote connection disconnect. Retrying...
    boot: Warning: Remote connection disconnect. Retrying...
    boot: Warning: Remote connection disconnect. Retrying...
==> boot: Machine booted and ready!
==> boot: Checking for guest additions in VM...
==> boot: Setting hostname...
==> boot: Configuring and enabling network interfaces...
==> boot: Mounting shared folders...
    boot: /vagrant => D:/veits/Vagrant/ubuntu-trusty64-docker_2017-02/dcos-vagrant
==> boot: Updating /etc/hosts file on active guest machines...
==> boot: Updating /etc/hosts file on host machine (password may be required)...
==> boot: Running provisioner: shell...
    boot: Running: inline script
==> boot: Running provisioner: dcos_ssh...
    host: Found existing keys
==> boot: Inserting generated public key within guest...
==> boot: Configuring vagrant to connect using generated private key...
==> boot: Removing insecure key from the guest, if it's present...
==> boot: Running provisioner: shell...
    boot: Running: script: Certificate Authorities
==> boot: >>> Installing Certificate Authorities
==> boot: Running provisioner: shell...
    boot: Running: script: Install Probe
==> boot: Probe already installed: /usr/local/sbin/probe
==> boot: Running provisioner: shell...
    boot: Running: script: Install jq
==> boot: jq already installed: /usr/local/sbin/jq
==> boot: Running provisioner: shell...
    boot: Running: script: Install DC/OS Postflight
==> boot: >>> Installing DC/OS Postflight: /usr/local/sbin/dcos-postflight
==> boot: Running provisioner: shell...
    boot: Running: script: DC/OS Boot
==> boot: Error: No such image or container: zookeeper-boot
==> boot: >>> Starting zookeeper (for exhibitor bootstrap and quorum)
==> boot: a58a678182b4c60df5fd4e1a0b86407456a33c75f4289c7fd7b0ce761afed567
==> boot: Error: No such image or container: nginx-boot
==> boot: >>> Starting nginx (for distributing bootstrap artifacts to cluster)
==> boot: c4bceea034f4d7488ae5ddd6ed708640a56064b191cd3d640a3311a58c5dcb5b
==> boot: >>> Downloading dcos_generate_config.sh (for building bootstrap image for system)
==> boot:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
==> boot:                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 22  723M   22  160M    0     0   171M      0  0:00:04 --:--:--  0:00:04  171M
 41  723M   41  300M    0     0   155M      0  0:00:04  0:00:01  0:00:03  139M
 65  723M   65  471M    0     0   160M      0  0:00:04  0:00:02  0:00:02  155M
 88  723M   88  642M    0     0   163M      0  0:00:04  0:00:03  0:00:01  160M
100  723M  100  723M    0     0   164M      0  0:00:04  0:00:04 --:--:--  163M
==> boot: Running provisioner: dcos_install...
==> boot: Reading etc/config-1.8.yaml
==> boot: Analyzing machines
==> boot: Generating Configuration: ~/dcos/genconf/config.yaml
==> boot: sudo: cat << EOF > ~/dcos/genconf/config.yaml
==> boot:       ---
==> boot:       master_list:
==> boot:       - 192.168.65.90
==> boot:       agent_list:
==> boot:       - 192.168.65.111
==> boot:       - 192.168.65.60
==> boot:       cluster_name: dcos-vagrant
==> boot:       bootstrap_url: http://192.168.65.50
==> boot:       exhibitor_storage_backend: static
==> boot:       master_discovery: static
==> boot:       resolvers:
==> boot:       - 10.0.2.3
==> boot:       superuser_username: admin
==> boot:       superuser_password_hash: "\$6\$rounds=656000\$123o/Qz.InhbkbsO\$kn5IkpWm5CplEorQo7jG/27LkyDgWrml36lLxDtckZkCxu22uihAJ4DOJVVnNbsz/Y5MCK3B1InquE6E7Jmh30"
==> boot:       ssh_port: 22
==> boot:       ssh_user: vagrant
==> boot:       check_time: false
==> boot:       exhibitor_zk_hosts: 192.168.65.50:2181
==> boot:
==> boot:       EOF
==> boot:
==> boot: Generating IP Detection Script: ~/dcos/genconf/ip-detect
==> boot: sudo: cat << 'EOF' > ~/dcos/genconf/ip-detect
==> boot:       #!/usr/bin/env bash
==> boot:       set -o errexit
==> boot:       set -o nounset
==> boot:       set -o pipefail
==> boot:       echo $(/usr/sbin/ip route show to match 192.168.65.90 | grep -Eo '[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}' | tail -1)
==> boot:
==> boot:       EOF
==> boot:
==> boot: Importing Private SSH Key: ~/dcos/genconf/ssh_key
==> boot: sudo: cp /vagrant/.vagrant/dcos/private_key_vagrant ~/dcos/genconf/ssh_key
==> boot:
==> boot: Generating DC/OS Installer Files: ~/dcos/genconf/serve/
==> boot: sudo: cd ~/dcos && bash ~/dcos/dcos_generate_config.sh --genconf && cp -rpv ~/dcos/genconf/serve/* /var/tmp/dcos/ && echo ok > /var/tmp/dcos/ready
==> boot:
==> boot:       Extracting image from this script and loading into docker daemon, this step can take a few minutes
==> boot:       dcos-genconf.602edc1b4da9364297-5df43052907c021eeb.tar
==> boot:       ====> EXECUTING CONFIGURATION GENERATION
==> boot:       Generating configuration files...
==> boot:       Final arguments:{
==> boot:         "adminrouter_auth_enabled":"true",
==> boot:         "bootstrap_id":"5df43052907c021eeb5de145419a3da1898c58a5",
==> boot:         "bootstrap_tmp_dir":"tmp",
==> boot:         "bootstrap_url":"http://192.168.65.50",
==> boot:         "check_time":"false",
==> boot:         "cluster_docker_credentials":"{}",
==> boot:         "cluster_docker_credentials_dcos_owned":"false",
==> boot:         "cluster_docker_credentials_enabled":"false",
==> boot:         "cluster_docker_credentials_write_to_etc":"false",
==> boot:         "cluster_docker_registry_enabled":"false",
==> boot:         "cluster_docker_registry_url":"",
==> boot:         "cluster_name":"dcos-vagrant",
==> boot:         "cluster_packages":"[\"dcos-config--setup_4869fa95533aed5aad36093272289e6bd389b458\", \"dcos-metadata--setup_4869fa95533aed5aad36093272289e6bd389b458\"]",
==> boot:         "config_id":"4869fa95533aed5aad36093272289e6bd389b458",
==> boot:         "config_yaml":"      \"agent_list\": |-\n        [\"192.168.65.111\", \"192.168.65.60\"]\n      \"bootstrap_url\": |-\n        http://192.168.65.50\n      \"check_time\": |-\n        false\n      \"cluster_name\": |-\n        dcos-vagrant\n      \"exhibitor_storage_backend\": |-\n        static\n      \"exhibitor_zk_hosts\": |-\n        192.168.65.50:2181\n      \"master_discovery\": |-\n        static\n      \"master_list\": |-\n        [\"192.168.65.90\"]\n      \"provider\": |-\n        onprem\n      \"resolvers\": |-\n        [\"10.0.2.3\"]\n      \"ssh_port\": |-\n        22\n      \"ssh_user\": |-\n        vagrant\n      \"superuser_password_hash\": |-\n        $6$rounds=656000$123o/Qz.InhbkbsO$kn5IkpWm5CplEorQo7jG/27LkyDgWrml36lLxDtckZkCxu22uihAJ4DOJVVnNbsz/Y5MCK3B1InquE6E7Jmh30\n      \"superuser_username\": |-\n        admin\n",
==> boot:         "curly_pound":"{#",
==> boot:         "custom_auth":"false",
==> boot:         "dcos_gen_resolvconf_search_str":"",
==> boot:         "dcos_image_commit":"602edc1b4da9364297d166d4857fc8ed7b0b65ca",
==> boot:         "dcos_overlay_config_attempts":"4",
==> boot:         "dcos_overlay_enable":"true",
==> boot:         "dcos_overlay_mtu":"1420",
==> boot:         "dcos_overlay_network":"{\"vtep_subnet\": \"44.128.0.0/20\", \"overlays\": [{\"prefix\": 24, \"name\": \"dcos\", \"subnet\": \"9.0.0.0/8\"}], \"vtep_mac_oui\": \"70:B3:D5:00:00:00\"}",
==> boot:         "dcos_remove_dockercfg_enable":"false",
==> boot:         "dcos_version":"1.8.8",
==> boot:         "dns_search":"",
==> boot:         "docker_remove_delay":"1hrs",
==> boot:         "docker_stop_timeout":"20secs",
==> boot:         "exhibitor_static_ensemble":"1:192.168.65.90",
==> boot:         "exhibitor_storage_backend":"static",
==> boot:         "expanded_config":"\"DO NOT USE THIS AS AN ARGUMENT TO OTHER ARGUMENTS. IT IS TEMPORARY\"",
==> boot:         "gc_delay":"2days",
==> boot:         "ip_detect_contents":"'#!/usr/bin/env bash\n\n  set -o errexit\n\n  set -o nounset\n\n  set -o pipefail\n\n  echo $(/usr/sbin/ip route show to match 192.168.65.90 | grep -Eo ''[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}''\n  | tail -1)\n\n\n  '\n",
==> boot:         "ip_detect_filename":"genconf/ip-detect",
==> boot:         "ip_detect_public_contents":"'#!/usr/bin/env bash\n\n  set -o errexit\n\n  set -o nounset\n\n  set -o pipefail\n\n  echo $(/usr/sbin/ip route show to match 192.168.65.90 | grep -Eo ''[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}''\n  | tail -1)\n\n\n  '\n",
==> boot:         "master_discovery":"static",
==> boot:         "master_dns_bindall":"true",
==> boot:         "master_list":"[\"192.168.65.90\"]",
==> boot:         "master_quorum":"1",
==> boot:         "mesos_container_logger":"org_apache_mesos_LogrotateContainerLogger",
==> boot:         "mesos_dns_ip_sources":"[\"host\", \"netinfo\"]",
==> boot:         "mesos_dns_resolvers_str":"\"resolvers\": [\"10.0.2.3\"]",
==> boot:         "mesos_hooks":"",
==> boot:         "mesos_isolation":"cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime,docker/volume",
==> boot:         "mesos_log_directory_max_files":"162",
==> boot:         "mesos_log_retention_count":"137",
==> boot:         "mesos_log_retention_mb":"4000",
==> boot:         "minuteman_forward_metrics":"false",
==> boot:         "minuteman_max_named_ip":"11.255.255.255",
==> boot:         "minuteman_max_named_ip_erltuple":"{11,255,255,255}",
==> boot:         "minuteman_min_named_ip":"11.0.0.0",
==> boot:         "minuteman_min_named_ip_erltuple":"{11,0,0,0}",
==> boot:         "num_masters":"1",
==> boot:         "oauth_auth_host":"https://dcos.auth0.com",
==> boot:         "oauth_auth_redirector":"https://auth.dcos.io",
==> boot:         "oauth_available":"true",
==> boot:         "oauth_client_id":"3yF5TOSzdlI45Q1xspxzeoGBe9fNxm9m",
==> boot:         "oauth_enabled":"true",
==> boot:         "oauth_issuer_url":"https://dcos.auth0.com/",
==> boot:         "package_names":"[\n  \"dcos-config\",\n  \"dcos-metadata\"\n]",
==> boot:         "provider":"onprem",
==> boot:         "resolvers":"[\"10.0.2.3\"]",
==> boot:         "resolvers_str":"10.0.2.3",
==> boot:         "rexray_config":"{\"rexray\": {\"modules\": {\"default-docker\": {\"disabled\": true}, \"default-admin\": {\"host\": \"tcp://127.0.0.1:61003\"}}, \"loglevel\": \"info\"}}",
==> boot:         "rexray_config_contents":"\"rexray:\\n  loglevel: info\\n  modules:\\n    default-admin:\\n      host: tcp://127.0.0.1:61003\\n\\\n  \\    default-docker:\\n      disabled: true\\n\"\n",
==> boot:         "rexray_config_preset":"",
==> boot:         "telemetry_enabled":"true",
==> boot:         "template_filenames":"[\n  \"dcos-config.yaml\",\n  \"cloud-config.yaml\",\n  \"dcos-metadata.yaml\",\n  \"dcos-services.yaml\"\n]",
==> boot:         "ui_banner":"false",
==> boot:         "ui_banner_background_color":"#1E232F",
==> boot:         "ui_banner_dismissible":"null",
==> boot:         "ui_banner_footer_content":"null",
==> boot:         "ui_banner_foreground_color":"#FFFFFF",
==> boot:         "ui_banner_header_content":"null",
==> boot:         "ui_banner_header_title":"null",
==> boot:         "ui_banner_image_path":"null",
==> boot:         "ui_branding":"false",
==> boot:         "ui_external_links":"false",
==> boot:         "use_mesos_hooks":"false",
==> boot:         "use_proxy":"false",
==> boot:         "user_arguments":"{\n  \"agent_list\":\"[\\\"192.168.65.111\\\", \\\"192.168.65.60\\\"]\",\n  \"bootstrap_url\":\"http://192.168.65.50\",\n  \"check_time\":\"false\",\n  \"cluster_name\":\"dcos-vagrant\",\n  \"exhibitor_storage_backend\":\"static\",\n  \"exhibitor_zk_hosts\":\"192.168.65.50:2181\",\n  \"master_discovery\":\"static\",\n  \"master_list\":\"[\\\"192.168.65.90\\\"]\",\n  \"provider\":\"onprem\",\n  \"resolvers\":\"[\\\"10.0.2.3\\\"]\",\n  \"ssh_port\":\"22\",\n  \"ssh_user\":\"vagrant\",\n  \"superuser_password_hash\":\"$6$rounds=656000$123o/Qz.InhbkbsO$kn5IkpWm5CplEorQo7jG/27LkyDgWrml36lLxDtckZkCxu22uihAJ4DOJVVnNbsz/Y5MCK3B1InquE6E7Jmh30\",\n  \"superuser_username\":\"admin\"\n}",
==> boot:         "weights":""
==> boot:       }
==> boot:       Generating configuration files...
==> boot:       Final arguments:{
==> boot:         "adminrouter_auth_enabled":"true",
==> boot:         "bootstrap_id":"5df43052907c021eeb5de145419a3da1898c58a5",
==> boot:         "bootstrap_tmp_dir":"tmp",
==> boot:         "bootstrap_url":"http://192.168.65.50",
==> boot:         "check_time":"false",
==> boot:         "cluster_docker_credentials":"{}",
==> boot:         "cluster_docker_credentials_dcos_owned":"false",
==> boot:         "cluster_docker_credentials_enabled":"false",
==> boot:         "cluster_docker_credentials_write_to_etc":"false",
==> boot:         "cluster_docker_registry_enabled":"false",
==> boot:         "cluster_docker_registry_url":"",
==> boot:         "cluster_name":"dcos-vagrant",
==> boot:         "cluster_packages":"[\"dcos-config--setup_4869fa95533aed5aad36093272289e6bd389b458\", \"dcos-metadata--setup_4869fa95533aed5aad36093272289e6bd389b458\"]",
==> boot:         "config_id":"4869fa95533aed5aad36093272289e6bd389b458",
==> boot:         "config_yaml":"      \"agent_list\": |-\n        [\"192.168.65.111\", \"192.168.65.60\"]\n      \"bootstrap_url\": |-\n        http://192.168.65.50\n      \"check_time\": |-\n        false\n      \"cluster_name\": |-\n        dcos-vagrant\n      \"exhibitor_storage_backend\": |-\n        static\n      \"exhibitor_zk_hosts\": |-\n        192.168.65.50:2181\n      \"master_discovery\": |-\n        static\n      \"master_list\": |-\n        [\"192.168.65.90\"]\n      \"provider\": |-\n        onprem\n      \"resolvers\": |-\n        [\"10.0.2.3\"]\n      \"ssh_port\": |-\n        22\n      \"ssh_user\": |-\n        vagrant\n      \"superuser_password_hash\": |-\n        $6$rounds=656000$123o/Qz.InhbkbsO$kn5IkpWm5CplEorQo7jG/27LkyDgWrml36lLxDtckZkCxu22uihAJ4DOJVVnNbsz/Y5MCK3B1InquE6E7Jmh30\n      \"superuser_username\": |-\n        admin\n",
==> boot:         "curly_pound":"{#",
==> boot:         "custom_auth":"false",
==> boot:         "dcos_gen_resolvconf_search_str":"",
==> boot:         "dcos_image_commit":"602edc1b4da9364297d166d4857fc8ed7b0b65ca",
==> boot:         "dcos_overlay_config_attempts":"4",
==> boot:         "dcos_overlay_enable":"true",
==> boot:         "dcos_overlay_mtu":"1420",
==> boot:         "dcos_overlay_network":"{\"vtep_subnet\": \"44.128.0.0/20\", \"overlays\": [{\"prefix\": 24, \"name\": \"dcos\", \"subnet\": \"9.0.0.0/8\"}], \"vtep_mac_oui\": \"70:B3:D5:00:00:00\"}",
==> boot:         "dcos_remove_dockercfg_enable":"false",
==> boot:         "dcos_version":"1.8.8",
==> boot:         "dns_search":"",
==> boot:         "docker_remove_delay":"1hrs",
==> boot:         "docker_stop_timeout":"20secs",
==> boot:         "exhibitor_static_ensemble":"1:192.168.65.90",
==> boot:         "exhibitor_storage_backend":"static",
==> boot:         "expanded_config":"\"DO NOT USE THIS AS AN ARGUMENT TO OTHER ARGUMENTS. IT IS TEMPORARY\"",
==> boot:         "gc_delay":"2days",
==> boot:         "ip_detect_contents":"'#!/usr/bin/env bash\n\n  set -o errexit\n\n  set -o nounset\n\n  set -o pipefail\n\n  echo $(/usr/sbin/ip route show to match 192.168.65.90 | grep -Eo ''[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}''\n  | tail -1)\n\n\n  '\n",
==> boot:         "ip_detect_filename":"genconf/ip-detect",
==> boot:         "ip_detect_public_contents":"'#!/usr/bin/env bash\n\n  set -o errexit\n\n  set -o nounset\n\n  set -o pipefail\n\n  echo $(/usr/sbin/ip route show to match 192.168.65.90 | grep -Eo ''[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}''\n  | tail -1)\n\n\n  '\n",
==> boot:         "master_discovery":"static",
==> boot:         "master_dns_bindall":"true",
==> boot:         "master_list":"[\"192.168.65.90\"]",
==> boot:         "master_quorum":"1",
==> boot:         "mesos_container_logger":"org_apache_mesos_LogrotateContainerLogger",
==> boot:         "mesos_dns_ip_sources":"[\"host\", \"netinfo\"]",
==> boot:         "mesos_dns_resolvers_str":"\"resolvers\": [\"10.0.2.3\"]",
==> boot:         "mesos_hooks":"",
==> boot:         "mesos_isolation":"cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime,docker/volume",
==> boot:         "mesos_log_directory_max_files":"162",
==> boot:         "mesos_log_retention_count":"137",
==> boot:         "mesos_log_retention_mb":"4000",
==> boot:         "minuteman_forward_metrics":"false",
==> boot:         "minuteman_max_named_ip":"11.255.255.255",
==> boot:         "minuteman_max_named_ip_erltuple":"{11,255,255,255}",
==> boot:         "minuteman_min_named_ip":"11.0.0.0",
==> boot:         "minuteman_min_named_ip_erltuple":"{11,0,0,0}",
==> boot:         "num_masters":"1",
==> boot:         "oauth_auth_host":"https://dcos.auth0.com",
==> boot:         "oauth_auth_redirector":"https://auth.dcos.io",
==> boot:         "oauth_available":"true",
==> boot:         "oauth_client_id":"3yF5TOSzdlI45Q1xspxzeoGBe9fNxm9m",
==> boot:         "oauth_enabled":"true",
==> boot:         "oauth_issuer_url":"https://dcos.auth0.com/",
==> boot:         "package_names":"[\n  \"dcos-config\",\n  \"dcos-metadata\"\n]",
==> boot:         "provider":"onprem",
==> boot:         "resolvers":"[\"10.0.2.3\"]",
==> boot:         "resolvers_str":"10.0.2.3",
==> boot:         "rexray_config":"{\"rexray\": {\"modules\": {\"default-docker\": {\"disabled\": true}, \"default-admin\": {\"host\": \"tcp://127.0.0.1:61003\"}}, \"loglevel\": \"info\"}}",
==> boot:         "rexray_config_contents":"\"rexray:\\n  loglevel: info\\n  modules:\\n    default-admin:\\n      host: tcp://127.0.0.1:61003\\n\\\n  \\    default-docker:\\n      disabled: true\\n\"\n",
==> boot:         "rexray_config_preset":"",
==> boot:         "telemetry_enabled":"true",
==> boot:         "template_filenames":"[\n  \"dcos-config.yaml\",\n  \"cloud-config.yaml\",\n  \"dcos-metadata.yaml\",\n  \"dcos-services.yaml\"\n]",
==> boot:         "ui_banner":"false",
==> boot:         "ui_banner_background_color":"#1E232F",
==> boot:         "ui_banner_dismissible":"null",
==> boot:         "ui_banner_footer_content":"null",
==> boot:         "ui_banner_foreground_color":"#FFFFFF",
==> boot:         "ui_banner_header_content":"null",
==> boot:         "ui_banner_header_title":"null",
==> boot:         "ui_banner_image_path":"null",
==> boot:         "ui_branding":"false",
==> boot:         "ui_external_links":"false",
==> boot:         "use_mesos_hooks":"false",
==> boot:         "use_proxy":"false",
==> boot:         "user_arguments":"{\n  \"agent_list\":\"[\\\"192.168.65.111\\\", \\\"192.168.65.60\\\"]\",\n  \"bootstrap_url\":\"http://192.168.65.50\",\n  \"check_time\":\"false\",\n  \"cluster_name\":\"dcos-vagrant\",\n  \"exhibitor_storage_backend\":\"static\",\n  \"exhibitor_zk_hosts\":\"192.168.65.50:2181\",\n  \"master_discovery\":\"static\",\n  \"master_list\":\"[\\\"192.168.65.90\\\"]\",\n  \"provider\":\"onprem\",\n  \"resolvers\":\"[\\\"10.0.2.3\\\"]\",\n  \"ssh_port\":\"22\",\n  \"ssh_user\":\"vagrant\",\n  \"superuser_password_hash\":\"$6$rounds=656000$123o/Qz.InhbkbsO$kn5IkpWm5CplEorQo7jG/27LkyDgWrml36lLxDtckZkCxu22uihAJ4DOJVVnNbsz/Y5MCK3B1InquE6E7Jmh30\",\n  \"superuser_username\":\"admin\"\n}",
==> boot:         "weights":""
==> boot:       }
==> boot:       Package filename: packages/dcos-config/dcos-config--setup_4869fa95533aed5aad36093272289e6bd389b458.tar.xz
==> boot:       Package filename: packages/dcos-metadata/dcos-metadata--setup_4869fa95533aed5aad36093272289e6bd389b458.tar.xz
==> boot:       Generating Bash configuration files for DC/OS
==> boot:       ‘/root/dcos/genconf/serve/bootstrap’ -> ‘/var/tmp/dcos/bootstrap’
==> boot:       ‘/root/dcos/genconf/serve/bootstrap/5df43052907c021eeb5de145419a3da1898c58a5.bootstrap.tar.xz’ -> ‘/var/tmp/dcos/bootstrap/5df43052907c021eeb5de145419a3da1898c58a5.bootstrap.tar.xz’
==> boot:       ‘/root/dcos/genconf/serve/bootstrap/5df43052907c021eeb5de145419a3da1898c58a5.active.json’ -> ‘/var/tmp/dcos/bootstrap/5df43052907c021eeb5de145419a3da1898c58a5.active.json’
==> boot:       ‘/root/dcos/genconf/serve/bootstrap.latest’ -> ‘/var/tmp/dcos/bootstrap.latest’
==> boot:       ‘/root/dcos/genconf/serve/cluster-package-info.json’ -> ‘/var/tmp/dcos/cluster-package-info.json’
==> boot:       ‘/root/dcos/genconf/serve/dcos_install.sh’ -> ‘/var/tmp/dcos/dcos_install.sh’
==> boot:       ‘/root/dcos/genconf/serve/packages’ -> ‘/var/tmp/dcos/packages’
==> boot:       ‘/root/dcos/genconf/serve/packages/dcos-metadata’ -> ‘/var/tmp/dcos/packages/dcos-metadata’
==> boot:       ‘/root/dcos/genconf/serve/packages/dcos-metadata/dcos-metadata--setup_4869fa95533aed5aad36093272289e6bd389b458.tar.xz’ -> ‘/var/tmp/dcos/packages/dcos-metadata/dcos-metadata--setup_4869fa95533aed5aad36093272289e6bd389b458.tar.xz’
==> boot:       ‘/root/dcos/genconf/serve/packages/dcos-config’ -> ‘/var/tmp/dcos/packages/dcos-config’
==> boot:       ‘/root/dcos/genconf/serve/packages/dcos-config/dcos-config--setup_4869fa95533aed5aad36093272289e6bd389b458.tar.xz’ -> ‘/var/tmp/dcos/packages/dcos-config/dcos-config--setup_4869fa95533aed5aad36093272289e6bd389b458.tar.xz’
==> m1: Installing DC/OS (master)
==> m1: sudo: bash -ceu "curl --fail --location --silent --show-error --verbose http://boot.dcos/dcos_install.sh | bash -s -- master"
==> m1:
==> m1:       * About to connect() to boot.dcos port 80 (#0)
==> m1:       *   Trying 192.168.65.50...
==> m1:       * Connected to boot.dcos (192.168.65.50) port 80 (#0)
==> m1:       > GET /dcos_install.sh HTTP/1.1
==> m1:       > User-Agent: curl/7.29.0
==> m1:       > Host: boot.dcos
==> m1:       > Accept: */*
==> m1:       >
==> m1:       < HTTP/1.1 200 OK
==> m1:       < Server: nginx/1.11.4
==> m1:       < Date: Tue, 07 Mar 2017 22:46:20 GMT
==> m1:       < Content-Type: application/octet-stream
==> m1:       < Content-Length: 15293
==> m1:       < Last-Modified: Tue, 07 Mar 2017 22:46:11 GMT
==> m1:       < Connection: keep-alive
==> m1:       < ETag: "58bf3833-3bbd"
==> m1:       < Accept-Ranges: bytes
==> m1:       <
==> m1:       { [data not shown]
==> m1:       * Connection #0 to host boot.dcos left intact
==> m1:       Starting DC/OS Install Process
==> m1:       Running preflight checks
==> m1:       Checking if DC/OS is already installed:
==> m1:       PASS (Not installed)
==> m1:       PASS Is SELinux disabled?
==> m1:       Checking if docker is installed and in PATH:
==> m1:       PASS
==> m1:       Checking docker version requirement (>= 1.6):
==> m1:       PASS (1.11.2)
==> m1:       Checking if curl is installed and in PATH:
==> m1:       PASS
==> m1:       Checking if bash is installed and in PATH:
==> m1:       PASS
==> m1:       Checking if ping is installed and in PATH:
==> m1:       PASS
==> m1:       Checking if tar is installed and in PATH:
==> m1:       PASS
==> m1:       Checking if xz is installed and in PATH:
==> m1:       PASS
==> m1:       Checking if unzip is installed and in PATH:
==> m1:       PASS
==> m1:       Checking if ipset is installed and in PATH:
==> m1:       PASS
==> m1:       Checking if systemd-notify is installed and in PATH:
==> m1:       PASS
==> m1:       Checking if systemd is installed and in PATH:
==> m1:       PASS
==> m1:       Checking systemd version requirement (>= 200):
==> m1:       PASS (219)
==> m1:       Checking if group 'nogroup' exists:
==> m1:       PASS
==> m1:       Checking if port 53 (required by spartan) is in use:
==> m1:       PASS
==> m1:       Checking if port 80 (required by adminrouter) is in use:
==> m1:       PASS
==> m1:       Checking if port 443 (required by adminrouter) is in use:
==> m1:       PASS
==> m1:       Checking if port 1050 (required by 3dt) is in use:
==> m1:       PASS
==> m1:       Checking if port 2181 (required by zookeeper) is in use:
==> m1:       PASS
==> m1:       Checking if port 5050 (required by mesos-master) is in use:
==> m1:       PASS
==> m1:       Checking if port 7070 (required by cosmos) is in use:
==> m1:       PASS
==> m1:       Checking if port 8080 (required by marathon) is in use:
==> m1:       PASS
==> m1:       Checking if port 8101 (required by dcos-oauth) is in use:
==> m1:       PASS
==> m1:       Checking if port 8123 (required by mesos-dns) is in use:
==> m1:       PASS
==> m1:       Checking if port 8181 (required by exhibitor) is in use:
==> m1:       PASS
==> m1:       Checking if port 9000 (required by metronome) is in use:
==> m1:       PASS
==> m1:       Checking if port 9942 (required by metronome) is in use:
==> m1:       PASS
==> m1:       Checking if port 9990 (required by cosmos) is in use:
==> m1:       PASS
==> m1:       Checking if port 15055 (required by dcos-history) is in use:
==> m1:       PASS
==> m1:       Checking if port 33107 (required by navstar) is in use:
==> m1:       PASS
==> m1:       Checking if port 36771 (required by marathon) is in use:
==> m1:       PASS
==> m1:       Checking if port 41281 (required by zookeeper) is in use:
==> m1:       PASS
==> m1:       Checking if port 42819 (required by spartan) is in use:
==> m1:       PASS
==> m1:       Checking if port 43911 (required by minuteman) is in use:
==> m1:       PASS
==> m1:       Checking if port 46839 (required by metronome) is in use:
==> m1:       PASS
==> m1:       Checking if port 61053 (required by mesos-dns) is in use:
==> m1:       PASS
==> m1:       Checking if port 61420 (required by epmd) is in use:
==> m1:       PASS
==> m1:       Checking if port 61421 (required by minuteman) is in use:
==> m1:       PASS
==> m1:       Checking if port 62053 (required by spartan) is in use:
==> m1:       PASS
==> m1:       Checking if port 62080 (required by navstar) is in use:
==> m1:       PASS
==> m1:       Checking Docker is configured with a production storage driver:
==> m1:       WARNING: bridge-nf-call-iptables is disabled
==> m1:       WARNING: bridge-nf-call-ip6tables is disabled
==> m1:       PASS (overlay)
==> m1:       Creating directories under /etc/mesosphere
==> m1:       Creating role file for master
==> m1:       Configuring DC/OS
==> m1:       Setting and starting DC/OS
==> m1:       Created symlink from /etc/systemd/system/multi-user.target.wants/dcos-setup.service to /etc/systemd/system/dcos-setup.service.
==> a1: Installing DC/OS (agent)
==> p1: Installing DC/OS (agent-public)
==> a1: sudo: bash -ceu "curl --fail --location --silent --show-error --verbose http://boot.dcos/dcos_install.sh | bash -s -- slave"
==> p1: sudo: bash -ceu "curl --fail --location --silent --show-error --verbose http://boot.dcos/dcos_install.sh | bash -s -- slave_public"
==> a1:
==> p1:
==> a1:       * About to connect() to boot.dcos port 80 (#0)
==> p1:       * About to connect() to boot.dcos port 80 (#0)
==> a1:       *   Trying 192.168.65.50...
==> p1:       *   Trying 192.168.65.50...
==> a1:       * Connected to boot.dcos (192.168.65.50) port 80 (#0)
==> p1:       * Connected to boot.dcos (192.168.65.50) port 80 (#0)
==> p1:       > GET /dcos_install.sh HTTP/1.1
==> p1:       > User-Agent: curl/7.29.0
==> p1:       > Host: boot.dcos
==> p1:       > Accept: */*
==> p1:       >
==> a1:       > GET /dcos_install.sh HTTP/1.1
==> a1:       > User-Agent: curl/7.29.0
==> a1:       > Host: boot.dcos
==> a1:       > Accept: */*
==> a1:       >
==> p1:       < HTTP/1.1 200 OK
==> p1:       < Server: nginx/1.11.4
==> p1:       < Date: Tue, 07 Mar 2017 22:48:31 GMT
==> p1:       < Content-Type: application/octet-stream
==> p1:       < Content-Length: 15293
==> p1:       < Last-Modified: Tue, 07 Mar 2017 22:46:11 GMT
==> p1:       < Connection: keep-alive
==> p1:       < ETag: "58bf3833-3bbd"
==> p1:       < Accept-Ranges: bytes
==> p1:       <
==> p1:       { [data not shown]
==> a1:       < HTTP/1.1 200 OK
==> a1:       < Server: nginx/1.11.4
==> a1:       < Date: Tue, 07 Mar 2017 22:48:31 GMT
==> a1:       < Content-Type: application/octet-stream
==> a1:       < Content-Length: 15293
==> a1:       < Last-Modified: Tue, 07 Mar 2017 22:46:11 GMT
==> a1:       < Connection: keep-alive
==> a1:       < ETag: "58bf3833-3bbd"
==> a1:       < Accept-Ranges: bytes
==> a1:       <
==> a1:       { [data not shown]
==> p1:       * Connection #0 to host boot.dcos left intact
==> a1:       * Connection #0 to host boot.dcos left intact
==> p1:       Starting DC/OS Install Process
==> p1:       Running preflight checks
==> p1:       Checking if DC/OS is already installed: PASS (Not installed)
==> a1:       Starting DC/OS Install Process
==> a1:       Running preflight checks
==> a1:       Checking if DC/OS is already installed: PASS (Not installed)
==> a1:       PASS Is SELinux disabled?
==> p1:       PASS Is SELinux disabled?
==> p1:       Checking if docker is installed and in PATH:
==> p1:       PASS
==> p1:       Checking docker version requirement (>= 1.6):
==> p1:       PASS (1.11.2)
==> p1:       Checking if curl is installed and in PATH:
==> p1:       PASS
==> p1:       Checking if bash is installed and in PATH:
==> a1:       Checking if docker is installed and in PATH:
==> p1:       PASS
==> p1:       Checking if ping is installed and in PATH:
==> a1:       PASS
==> a1:       Checking docker version requirement (>= 1.6):
==> p1:       PASS
==> p1:       Checking if tar is installed and in PATH:
==> a1:       PASS (1.11.2)
==> p1:       PASS
==> a1:       Checking if curl is installed and in PATH:
==> p1:       Checking if xz is installed and in PATH:
==> a1:       PASS
==> p1:       PASS
==> p1:       Checking if unzip is installed and in PATH:
==> a1:       Checking if bash is installed and in PATH:
==> a1:       PASS
==> p1:       PASS
==> p1:       Checking if ipset is installed and in PATH:
==> p1:       PASS
==> p1:       Checking if systemd-notify is installed and in PATH:
==> a1:       Checking if ping is installed and in PATH:
==> p1:       PASS
==> a1:       PASS
==> a1:       Checking if tar is installed and in PATH:
==> p1:       Checking if systemd is installed and in PATH:
==> a1:       PASS
==> p1:       PASS
==> p1:       Checking systemd version requirement (>= 200):
==> a1:       Checking if xz is installed and in PATH:
==> p1:       PASS (219)
==> p1:       Checking if group 'nogroup' exists:
==> p1:       PASS
==> p1:       Checking if port 53 (required by spartan) is in use:
==> a1:       PASS
==> a1:       Checking if unzip is installed and in PATH:
==> p1:       PASS
==> p1:       Checking if port 5051 (required by mesos-agent) is in use:
==> a1:       PASS
==> p1:       PASS
==> p1:       Checking if port 34451 (required by navstar) is in use:
==> a1:       Checking if ipset is installed and in PATH:
==> p1:       PASS
==> p1:       Checking if port 39851 (required by spartan) is in use:
==> a1:       PASS
==> p1:       PASS
==> p1:       Checking if port 43995 (required by minuteman) is in use:
==> a1:       Checking if systemd-notify is installed and in PATH:
==> p1:       PASS
==> p1:       Checking if port 61001 (required by agent-adminrouter) is in use:
==> a1:       PASS
==> p1:       PASS
==> p1:       Checking if port 61420 (required by epmd) is in use:
==> a1:       Checking if systemd is installed and in PATH:
==> p1:       PASS
==> p1:       Checking if port 61421 (required by minuteman) is in use:
==> p1:       PASS
==> p1:       Checking if port 62053 (required by spartan) is in use:
==> a1:       PASS
==> a1:       Checking systemd version requirement (>= 200):
==> a1:       PASS (219)
==> a1:       Checking if group 'nogroup' exists:
==> p1:       PASS
==> p1:       Checking if port 62080 (required by navstar) is in use:
==> a1:       PASS
==> a1:       Checking if port 53 (required by spartan) is in use:
==> p1:       PASS
==> p1:       Checking Docker is configured with a production storage driver:
==> a1:       PASS
==> a1:       Checking if port 5051 (required by mesos-agent) is in use:
==> p1:       WARNING: bridge-nf-call-iptables is disabled
==> p1:       WARNING: bridge-nf-call-ip6tables is disabled
==> a1:       PASS
==> a1:       Checking if port 34451 (required by navstar) is in use:
==> p1:       PASS (overlay)
==> p1:       Creating directories under /etc/mesosphere
==> a1:       PASS
==> a1:       Checking if port 39851 (required by spartan) is in use:
==> p1:       Creating role file for slave_public
==> a1:       PASS
==> a1:       Checking if port 43995 (required by minuteman) is in use:
==> p1:       Configuring DC/OS
==> a1:       PASS
==> a1:       Checking if port 61001 (required by agent-adminrouter) is in use:
==> a1:       PASS
==> a1:       Checking if port 61420 (required by epmd) is in use:
==> a1:       PASS
==> a1:       Checking if port 61421 (required by minuteman) is in use:
==> a1:       PASS
==> a1:       Checking if port 62053 (required by spartan) is in use:
==> a1:       PASS
==> a1:       Checking if port 62080 (required by navstar) is in use:
==> a1:       PASS
==> a1:       Checking Docker is configured with a production storage driver:
==> p1:       Setting and starting DC/OS
==> a1:       WARNING: bridge-nf-call-iptables is disabled
==> a1:       WARNING: bridge-nf-call-ip6tables is disabled
==> a1:       PASS (overlay)
==> a1:       Creating directories under /etc/mesosphere
==> a1:       Creating role file for slave
==> a1:       Configuring DC/OS
==> a1:       Setting and starting DC/OS
==> a1:       Created symlink from /etc/systemd/system/multi-user.target.wants/dcos-setup.service to /etc/systemd/system/dcos-setup.service.
==> p1:       Created symlink from /etc/systemd/system/multi-user.target.wants/dcos-setup.service to /etc/systemd/system/dcos-setup.service.
==> m1: DC/OS Postflight
==> a1: DC/OS Postflight
==> p1: DC/OS Postflight
==> m1: sudo: dcos-postflight
==> a1: sudo: dcos-postflight
==> p1: sudo: dcos-postflight
==> a1:
==> p1:
==> m1:
==> a1: Setting Mesos Memory: 5632 (role=*)
==> a1: sudo: mesos-memory 5632
==> a1:
==> a1:       Updating /var/lib/dcos/mesos-resources
==> a1: Restarting Mesos Agent
==> a1: sudo: bash -ceu "systemctl stop dcos-mesos-slave.service && rm -f /var/lib/mesos/slave/meta/slaves/latest && systemctl start dcos-mesos-slave.service --no-block"
==> a1:
==> p1: Setting Mesos Memory: 1024 (role=slave_public)
==> p1: sudo: mesos-memory 1024 slave_public
==> p1:
==> p1:       Updating /var/lib/dcos/mesos-resources
==> p1: Restarting Mesos Agent
==> p1: sudo: bash -ceu "systemctl stop dcos-mesos-slave-public.service && rm -f /var/lib/mesos/slave/meta/slaves/latest && systemctl start dcos-mesos-slave-public.service --no-block"
==> p1:
==> boot: DC/OS Installation Complete
==> boot: Web Interface: http://m1.dcos/
==> boot: DC/OS Installation Complete
==> boot: Web Interface: http://m1.dcos/

The VirtualBox GUI shows the four machines we had seen in the VagrantConfig.yaml. They are up and running:

Step 6: Log into the DC/OS GUI

Now let us access the Web UI on m1.dcos:

The Vagrant Hostmanager plugin also works on Windows: we can check this by reading the hosts file at C:\Windows\System32\drivers\etc\hosts. It contains the DNS mappings for the four machines (a1.dcos, boot.dcos, m1.dcos and p1.dcos). The mapping for spring.acme.org with alias oinker.acme.org will be missing in your case; it is added at a later step, when we install the Marathon load balancer based on HAProxy.

The host manager has added the following entries to the hosts file:

## vagrant-hostmanager-start id: 9f1502eb-71bf-4e6a-b3bc-44a83db628b7
192.168.65.111 a1.dcos
192.168.65.50 boot.dcos
192.168.65.90 m1.dcos
192.168.65.60 p1.dcos
192.168.65.60 spring.acme.org oinker.acme.org
## vagrant-hostmanager-end
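
The entries written by the host manager can also be read programmatically, for example to verify that m1.dcos points to the expected address. The following is only a minimal sketch: the function name parse_hostmanager_block and the SAMPLE text are made up for illustration, and the sketch assumes the marker lines look exactly like the ones in the hosts file shown here.

```python
def parse_hostmanager_block(hosts_text):
    """Return {hostname: ip} for entries between the vagrant-hostmanager markers."""
    mapping = {}
    inside = False
    for raw in hosts_text.splitlines():
        line = raw.strip()
        if line.startswith("## vagrant-hostmanager-start"):
            inside = True
            continue
        if line.startswith("## vagrant-hostmanager-end"):
            inside = False
            continue
        # Skip everything outside the block, blank lines and comments.
        if not inside or not line or line.startswith("#"):
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


# Sample text modeled on the hosts file of this post (illustration only).
SAMPLE = """\
## vagrant-hostmanager-start id: 9f1502eb-71bf-4e6a-b3bc-44a83db628b7
192.168.65.111 a1.dcos
192.168.65.50 boot.dcos
192.168.65.90 m1.dcos
192.168.65.60 p1.dcos
## vagrant-hostmanager-end
"""

if __name__ == "__main__":
    for name, ip in sorted(parse_hostmanager_block(SAMPLE).items()):
        print(name, ip)
```

On a real Windows machine, you would pass the content of C:\Windows\System32\drivers\etc\hosts instead of SAMPLE.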

After logging in via Google,

and pressing the Allow button, we reach the DC/OS Dashboard:

(scrolling down)

Step 7: Install the DCOS CLI

Now we will continue to follow the DC/OS 101 Tutorial and install the DC/OS CLI. This can be done by clicking the profile on the lower left of the Web GUI:

-> 

-> 

-> 

Choose the operating system type you are working on. In my case, I have a Windows system and have performed the following steps:

Step 8: Configure DC/OS Master URL

First, we cd into the folder where dcos.exe is located (D:\veits\downloads\DCOS CLI in my case), before we configure the core DCOS URL:

Windows> cd /D "D:\veits\downloads\DCOS CLI"
Windows> dcos config set core.dcos_url http://m1.dcos
Windows> dcos
Command line utility for the Mesosphere Datacenter Operating
System (DC/OS). The Mesosphere DC/OS is a distributed operating
system built around Apache Mesos. This utility provides tools
for easy management of a DC/OS installation.

Available DC/OS commands:

        auth            Authenticate to DC/OS cluster
        config          Manage the DC/OS configuration file
        experimental    Experimental commands. These commands are under development and are subject to change
        help            Display help information about DC/OS
        job             Deploy and manage jobs in DC/OS
        marathon        Deploy and manage applications to DC/OS
        node            Administer and manage DC/OS cluster nodes
        package         Install and manage DC/OS software packages
        service         Manage DC/OS services
        task            Manage DC/OS tasks

Get detailed command description with 'dcos <command> --help'.

Step 9: Receive Token from the DC/OS Master

Windows> dcos auth login

Please go to the following link in your browser:

    http://m1.dcos/login?redirect_uri=urn:ietf:wg:oauth:2.0:oob
Enter OpenID Connect ID Token:eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIm...-YqOARGFN5Ewcf6YWlw <-------(shortened)
Login successful! 

Here, I have copied the link shown above and pasted it into the browser's URL field:

Then I logged in as a Google user:

-> 

-> I have signed in with Google

-> 

-> clicked Copy to Clipboard

-> pasted the clipboard content into the terminal as already shown above (repeated here) and pressed <enter>:

Enter OpenID Connect ID Token:eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIm...-YqOARGFN5Ewcf6YWlw <-------(shortened)
Login successful!

With that, you make sure only you have access to the (virtual) cluster.
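
The token in the transcript starts with eyJ0eXAi, which is what base64url-encoded JSON beginning with {"typ" looks like, so it appears to be a JSON Web Token (JWT). As a rough sketch, the header of such a token can be inspected as follows. The function names and the DEMO_TOKEN are made up for illustration; the demo token is unsigned and is not a valid DC/OS token, and the code does not verify any signature.

```python
import base64
import json


def decode_jwt_segment(segment):
    """Decode one base64url-encoded JWT segment (header or payload) to a dict."""
    # JWT segments drop the trailing "=" padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def jwt_header(token):
    """Return the decoded header of a JWT without verifying the signature."""
    header_segment = token.split(".")[0]
    return decode_jwt_segment(header_segment)


def _encode(obj):
    # Helper for the demo only: build a base64url segment without padding.
    raw = json.dumps(obj, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# A made-up, unsigned demo token; it is NOT a valid DC/OS token.
DEMO_TOKEN = ".".join([
    _encode({"typ": "JWT", "alg": "RS256"}),
    _encode({"email": "user@example.com"}),
    "signature-placeholder",
])

if __name__ == "__main__":
    print(jwt_header(DEMO_TOKEN))
```

Treat real tokens like passwords: anyone holding the token can act on your behalf, so avoid pasting it into online decoders.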

Step 10 (optional): Explore DC/OS and Marathon

With the dcos service command, we see that Marathon is already running:

Windows> dcos service
NAME           HOST      ACTIVE  TASKS  CPU  MEM  DISK  ID
marathon  192.168.65.90   True     0    0.0  0.0  0.0   1d3a11d0-1c3e-4ec2-8485-d1a3aa43c465-0001

With dcos node we see that two (virtual) nodes are connected (as we might have noticed on the dashboard as well):

Windows> dcos node
   HOSTNAME           IP                           ID
192.168.65.111  192.168.65.111  1d3a11d0-1c3e-4ec2-8485-d1a3aa43c465-S2
192.168.65.60   192.168.65.60   1d3a11d0-1c3e-4ec2-8485-d1a3aa43c465-S3

The first one is a1, the private agent, and the second one is p1, the public agent.

With dcos node log --leader we can check the Mesos master log:

Windows> dcos node log --leader
dcos-log is not supported
Falling back to files API...
I0309 13:11:45.152153  3217 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:45654 with User-Agent='python-requests/2.10.0'
I0309 13:11:47.176911  3214 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:45660 with User-Agent='python-requests/2.10.0'
I0309 13:11:48.039836  3214 http.cpp:390] HTTP GET for /master/state from 192.168.65.90:41141 with User-Agent='Mesos-State / Host: m1, Pid: 5258'
I0309 13:11:49.195853  3216 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:45666 with User-Agent='python-requests/2.10.0'
I0309 13:11:51.216013  3217 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:45672 with User-Agent='python-requests/2.10.0'
I0309 13:11:51.376802  3217 master.cpp:5478] Performing explicit task state reconciliation for 1 tasks of framework 1d3a11d0-1c3e-4ec2-8485-d1a3aa43c465-0001 (marathon) at scheduler-1a712a58-a49a-4c45-a89a-823b827a49bf@192.168.65.90:15101
I0309 13:11:53.236994  3217 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:45678 with User-Agent='python-requests/2.10.0'
I0309 13:11:55.257347  3216 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:45684 with User-Agent='python-requests/2.10.0'
I0309 13:11:57.274785  3217 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:45690 with User-Agent='python-requests/2.10.0'
I0309 13:11:57.462590  3213 http.cpp:390] HTTP GET for /master/state.json from 192.168.65.90:45704 with User-Agent='Mesos-DNS'

Finally, dcos help shows the following output:

Windows> dcos help
Description:
    The Mesosphere Datacenter Operating System (DC/OS) spans all of the machines in
your datacenter or cloud and treats them as a single, shared set of resources.

Usage:
    dcos [options] [<command>] [<args>...]

Options:
    --debug
        Enable debug mode.
    --help
        Print usage.
    --log-level=<log-level>
        Set the logging level. This setting does not affect the output sent to
        stdout. The severity levels are:
        The severity level:
        * debug    Prints all messages.
        * info     Prints informational, warning, error, and critical messages.
        * warning  Prints warning, error, and critical messages.
        * error    Prints error and critical messages.
        * critical Prints only critical messages to stderr.
    --version
        Print version information

Environment Variables:
    DCOS_CONFIG
        Set the path to the DC/OS configuration file. By default, this variable
        is set to ~/.dcos/dcos.toml.
    DCOS_DEBUG
        Indicates whether to print additional debug messages to stdout. By
        default this is set to false.
    DCOS_LOG_LEVEL
        Prints log messages to stderr at or above the level indicated. This is
        equivalent to the --log-level command-line option.

You can also check the CLI documentation.

Step 11: Deploy a Hello World Service per GUI

If you follow steps 11 and 12, you will see in step 13 that the default networking settings are sub-optimal. You can skip steps 11 to 13 and continue with step 14 if you wish to create a hello world service with improved networking, including load balancing, right away.

Now we will create a Hello World Service. For that, log into the DC/OS, if not done already and navigate to Services:

-> 

-> 

Here we have chosen only 0.1 CPU, since Mesos is quite strict about resource reservations: the sum of the CPUs reserved for the applications cannot exceed the number of CPUs available, even if the applications do not really need the resources. This is what we have seen in my previous Mesos blog post, where we deployed hello world applications that only printed “Hello World” once a second, each with a reservation of one CPU. With two CPUs available, I could not start more than two such hello world applications.
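
The reservation arithmetic can be written down as a small calculation. This is a simplified sketch that only looks at CPUs: it ignores memory, disk and the CPUs already reserved by other services, and the function name max_instances is chosen for illustration.

```python
from fractions import Fraction


def max_instances(total_cpus, cpus_per_instance):
    """How many instances fit if each one reserves cpus_per_instance CPUs."""
    # Fractions avoid floating point surprises with values such as 0.1.
    total = Fraction(str(total_cpus))
    per_instance = Fraction(str(cpus_per_instance))
    if per_instance <= 0:
        raise ValueError("cpus_per_instance must be positive")
    return int(total // per_instance)


if __name__ == "__main__":
    print(max_instances(2, 1))    # one full CPU reserved per instance
    print(max_instances(2, 0.1))  # 0.1 CPU reserved per instance
```

With two CPUs and a reservation of one CPU per instance, the function returns 2, matching the observation above; with a reservation of 0.1 CPU it returns 20.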

Let us deploy a container from the image nginxdemos/hello:

-> 

-> 

Now the Service is getting deployed:

Step 12: Connect to the NginX Service

When we click on the nginx-via-gui service name, we will see that the service is running on the private Mesos agent a1 on 192.168.65.111:

We can directly access the service by entering the private agent’s IP address 192.168.65.111 or name a1.dcos in the browser’s URL field:

Here we can see that we have a quite simple networking model: the Windows host uses IP address 192.168.65.1 to reach the server on 192.168.65.111, which is the private Mesos agent’s IP address. The NginX container is just sharing the private agent’s network interface.

Because of the simple networking model, that was easier than expected:

  1. In other situations, you often need to configure port forwarding on the VirtualBox VM, but not this time: the Mesos agent is configured with a secondary Ethernet interface attached to a VirtualBox host network, which allows us to connect from the VirtualBox host to any port of the private agent without VirtualBox port forwarding.
  2. In other situations, you often need to configure a port mapping between the Docker container and the Docker host (the Mesos agent in this case). Why not this time? Let us explore this in more detail in the next optional step.

Step 13 (optional): Explore the Default Mesos Networking

While deploying the service, we have not reviewed the network tab yet. However, we can do this now by clicking on the service, then “Edit” and then “Network”:

The default network setting is “Host” networking, which means that the container shares the host’s network interface directly. The image we have chosen exposes port 80. This is why we can reach the service by entering the host’s name or IP address with port 80 in the URL field of the browser.

Since the container is re-using the Docker host’s network interface, a port mapping is not needed, as we can confirm with a docker ps command:

(Vagranthost)$ vagrant ssh a1
...
(a1)$ docker ps
CONTAINER ID        IMAGE                         COMMAND             CREATED             STATUS              PORTS               NAMES
cd5a068aaa28        oveits/docker-nginx-busybox   "/usr/sbin/nginx"   39 minutes ago      Up 39 minutes                           mesos-1d3a11d0-1c3e-4ec2-8485-d1a3aa43c465-S2.39067bbf-c4b6-448b-9eb9-975c050bcf57

We cannot see any port mapping here (the PORTS field is empty).

Note that the default network configuration does not allow us to scale the service: port 80 of the agent is already occupied by the first container.

Let us confirm this assumption by trying to scale the NginX service to two containers:

On Services -> Drop-down list right of name -> Scale

->choose 2 instances:

-> 

Now the service continually tries to start the second container and the status is toggling between Waiting, Running and Delayed:

As expected, the second docker container cannot start, because port 80 is already occupied on the docker host. The error log shows:

I0324 11:23:01.820436 7765 exec.cpp:161] Version: 1.0.3
I0324 11:23:01.825763 7769 exec.cpp:236] Executor registered on agent 1d3a11d0-1c3e-4ec2-8485-d1a3aa43c465-S2
I0324 11:23:01.827263 7772 docker.cpp:815] Running docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2017-03-24T18:18:00.202Z -e HOST=192.168.65.111 -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=oveits/docker-nginx-busybox -e PORT_10000=10298 -e MESOS_TASK_ID=nginx.ea26c7af-10be-11e7-9134-70b3d5800001 -e PORT=10298 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=10298 -e MARATHON_APP_RESOURCE_DISK=2.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/nginx -e PORT0=10298 -e LIBPROCESS_IP=192.168.65.111 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-1d3a11d0-1c3e-4ec2-8485-d1a3aa43c465-S2.f752b208-f7d1-49d6-8cdd-cbb62eaf4768 -v /var/lib/mesos/slave/slaves/1d3a11d0-1c3e-4ec2-8485-d1a3aa43c465-S2/frameworks/1d3a11d0-1c3e-4ec2-8485-d1a3aa43c465-0001/executors/nginx.ea26c7af-10be-11e7-9134-70b3d5800001/runs/f752b208-f7d1-49d6-8cdd-cbb62eaf4768:/mnt/mesos/sandbox --net host --name mesos-1d3a11d0-1c3e-4ec2-8485-d1a3aa43c465-S2.f752b208-f7d1-49d6-8cdd-cbb62eaf4768 oveits/docker-nginx-busybox
nginx: [alert] could not open error log file: open() "/var/log/nginx/error.log" failed (2: No such file or directory)
2017/03/24 18:23:01 [emerg] 1#0: bind() to 0.0.0.0:80 failed (98: Address in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address in use)
2017/03/24 18:23:01 [notice] 1#0: try again to bind() after 500ms
2017/03/24 18:23:01 [emerg] 1#0: bind() to 0.0.0.0:80 failed (98: Address in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address in use)
2017/03/24 18:23:01 [notice] 1#0: try again to bind() after 500ms
2017/03/24 18:23:01 [emerg] 1#0: bind() to 0.0.0.0:80 failed (98: Address in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address in use)
2017/03/24 18:23:01 [notice] 1#0: try again to bind() after 500ms
2017/03/24 18:23:01 [emerg] 1#0: bind() to 0.0.0.0:80 failed (98: Address in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address in use)
2017/03/24 18:23:01 [notice] 1#0: try again to bind() after 500ms
2017/03/24 18:23:01 [emerg] 1#0: bind() to 0.0.0.0:80 failed (98: Address in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address in use)
2017/03/24 18:23:01 [notice] 1#0: try again to bind() after 500ms
2017/03/24 18:23:01 [emerg] 1#0: still could not bind()
nginx: [emerg] still could not bind()
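
The "Address in use" error is not specific to NginX or Docker: with host networking, the second container simply asks the operating system for a port that another process already holds. The following sketch reproduces the same effect with two plain sockets on the loopback interface. It uses a port picked by the operating system instead of port 80, and the function name second_bind_errno is made up for illustration.

```python
import errno
import socket


def second_bind_errno():
    """Bind two sockets to the same address and return the second bind's errno."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # Same address and port as the first socket: this is the conflict.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return None  # the second bind unexpectedly worked
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = second_bind_errno()
    print(code == errno.EADDRINUSE)
```

On Linux, errno.EADDRINUSE has the value 98, which is the number shown in the NginX log above.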

This is not a good configuration. Can we choose a different type of networking at the time we start the service? Let us follow Step 14 to create the same service, but now in a scalable and load-balanced fashion:

Step 14: Deploy a Hello World Service per JSON with improved Networking and Load-Balancing

Step 14.1: Install Marathon Load Balancer

Step 14.1.1: Check, if Marathon LB is already installed

At the moment, the Marathon Load Balancer is not installed. This can be checked with the following DCOS CLI command:

(DCOS CLI Client)$ dcos package list
There are currently no installed packages. Please use `dcos package install` to install a package.

Step 14.1.2 (optional): Check Options of Marathon Package

Let us install the Marathon Load Balancer by following the version 1.8 documentation. First, we will have a look at the configuration options of the package (optional):

(DCOS CLI Client)$ dcos package describe --config marathon-lb
{
  "$schema": "http://json-schema.org/schema#",
  "properties": {
    "marathon-lb": {
      "properties": {
        "auto-assign-service-ports": {
          "default": false,
          "description": "Auto assign service ports for tasks which use IP-per-task. See https://github.com/mesosphere/marathon-lb#mesos-with-ip-per-task-support for details.",
          "type": "boolean"
        },
        "bind-http-https": {
          "default": true,
          "description": "Reserve ports 80 and 443 for the LB. Use this if you intend to use virtual hosts.",
          "type": "boolean"
        },
        "cpus": {
          "default": 2,
          "description": "CPU shares to allocate to each marathon-lb instance.",
          "minimum": 1,
          "type": "number"
        },
        "haproxy-group": {
          "default": "external",
          "description": "HAProxy group parameter. Matches with HAPROXY_GROUP in the app labels.",
          "type": "string"
        },
        "haproxy-map": {
          "default": true,
          "description": "Enable HAProxy VHost maps for fast VHost routing.",
          "type": "boolean"
        },
        "haproxy_global_default_options": {
          "default": "redispatch,http-server-close,dontlognull",
          "description": "Default global options for HAProxy.",
          "type": "string"
        },
        "instances": {
          "default": 1,
          "description": "Number of instances to run.",
          "minimum": 1,
          "type": "integer"
        },
        "marathon-uri": {
          "default": "http://marathon.mesos:8080",
          "description": "URI of Marathon instance",
          "type": "string"
        },
        "maximumOverCapacity": {
          "default": 0.2,
          "description": "Maximum over capacity.",
          "minimum": 0,
          "type": "number"
        },
        "mem": {
          "default": 1024.0,
          "description": "Memory (MB) to allocate to each marathon-lb task.",
          "minimum": 256.0,
          "type": "number"
        },
        "minimumHealthCapacity": {
          "default": 0.5,
          "description": "Minimum health capacity.",
          "minimum": 0,
          "type": "number"
        },
        "name": {
          "default": "marathon-lb",
          "description": "Name for this LB instance",
          "type": "string"
        },
        "role": {
          "default": "slave_public",
          "description": "Deploy marathon-lb only on nodes with this role.",
          "type": "string"
        },
        "secret_name": {
          "default": "",
          "description": "Name of the Secret Store credentials to use for DC/OS service authentication. This should be left empty unless service authentication is needed.",
          "type": "string"
        },
        "ssl-cert": {
          "description": "TLS Cert and private key for HTTPS.",
          "type": "string"
        },
        "strict-mode": {
          "default": false,
          "description": "Enable strict mode. This requires that you explicitly enable each backend with `HAPROXY_{n}_ENABLED=true`.",
          "type": "boolean"
        },
        "sysctl-params": {
          "default": "net.ipv4.tcp_tw_reuse=1 net.ipv4.tcp_fin_timeout=30 net.ipv4.tcp_max_syn_backlog=10240 net.ipv4.tcp_max_tw_buckets=400000 net.ipv4.tcp_max_orphans=60000 net.core.somaxconn=10000",
          "description": "sysctl params to set at startup for HAProxy.",
          "type": "string"
        },
        "template-url": {
          "default": "",
          "description": "URL to tarball containing a directory templates/ to customize haproxy config.",
          "type": "string"
        }
      },
      "required": [
        "cpus",
        "mem",
        "haproxy-group",
        "instances",
        "name"
      ],
      "type": "object"
    }
  },
  "type": "object"
}
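
If you want to deviate from the defaults, for example because your public agent has less than 2 CPUs, the options can be collected in a JSON document. The following sketch checks a few overrides against the minimum values from the schema above before printing the document. The function name and the chosen values are illustrative only, and the sketch checks just the three numeric minimums, not the full schema.

```python
import json

# Minimum values copied from the schema output shown above.
MINIMUMS = {"cpus": 1, "mem": 256.0, "instances": 1}


def build_marathon_lb_options(**overrides):
    """Return an options dict for marathon-lb after checking schema minimums."""
    for key, value in overrides.items():
        minimum = MINIMUMS.get(key)
        if minimum is not None and value < minimum:
            raise ValueError(f"{key}={value} is below the minimum {minimum}")
    # The schema nests all settings under the "marathon-lb" property.
    return {"marathon-lb": dict(overrides)}


if __name__ == "__main__":
    options = build_marathon_lb_options(cpus=1, mem=512.0, instances=1)
    print(json.dumps(options, indent=2))
```

The printed JSON could be saved to a file, say options.json (a name chosen here for illustration), and handed over with dcos package install marathon-lb --options=options.json, assuming your CLI version supports the --options flag. In this post, we keep the defaults.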

Step 14.1.3: Install and Check Marathon Load Balancer

We install the Marathon LB package now. We will keep the default configuration:

$ dcos package install marathon-lb
We recommend at least 2 CPUs and 1GiB of RAM for each Marathon-LB instance.

*NOTE*: ```Enterprise Edition``` DC/OS requires setting up the Service Account in all security modes.
Follow these instructions to setup a Service Account: https://docs.mesosphere.com/administration/id-and-access-mgt/service-auth/mlb-auth/
Continue installing? [yes/no] yes
Installing Marathon app for package [marathon-lb] version [1.5.1]
Marathon-lb DC/OS Service has been successfully installed!
See https://github.com/mesosphere/marathon-lb for documentation.

Now let us check that the package is installed:

$ dcos package list
NAME         VERSION  APP           COMMAND  DESCRIPTION
marathon-lb  1.5.1    /marathon-lb  ---      HAProxy configured using Marathon state

We are able to see the load balancer service on the GUI as well:

After clicking on the marathon-lb service and then on the container, and scrolling down (see note), we see that the load balancer is serving ports 80, 443, 9090, 9091, and 10000 to 10100. We will use one of the high ports soon.

Note: scrolling is a little bit tricky at the moment. You might need to resize the browser view with Ctrl+minus or Ctrl+plus to see the scroll bar on the right. Another possibility is to click into the black part of the browser page and use the arrow keys thereafter.

Port 9090 is used by the load balancer admin interface. We can see the statistics there:

Step 14.2: Create an Application using Marathon Load Balancer

Now let us follow these instructions to add a service that makes use of the Marathon Load Balancer:

Step 14.2.1: Define the Application’s Configuration File

Save the following content as nginx-hostname-app.json:

{
   "id": "nginx-hostname",
   "container": {
     "type": "DOCKER",
     "docker": {
       "image": "nginxdemos/hello",
       "network": "BRIDGE",
       "portMappings": [
         { "hostPort": 0, "containerPort": 80, "servicePort": 10006 }
       ]
     }
   },
   "instances": 3,
   "cpus": 0.25,
   "mem": 100,
   "healthChecks": [{
       "protocol": "HTTP",
       "path": "/",
       "portIndex": 0,
       "timeoutSeconds": 2,
       "gracePeriodSeconds": 15,
       "intervalSeconds": 3,
       "maxConsecutiveFailures": 2
   }],
   "labels":{
     "HAPROXY_DEPLOYMENT_GROUP":"nginx-hostname",
     "HAPROXY_DEPLOYMENT_ALT_PORT":"10007",
     "HAPROXY_GROUP":"external",
     "HAPROXY_0_REDIRECT_TO_HTTPS":"true",
     "HAPROXY_0_VHOST": "192.168.65.111"
   }
}

If you are running in an environment other than the one we have created using Vagrant, you might need to adapt the IP address: replace 192.168.65.111 in HAPROXY_0_VHOST with your public agent’s IP address.
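
Since the VHOST label may need to be adapted per environment, the same definition can also be generated instead of edited by hand. The following is a sketch only: build_nginx_app is a made-up helper, the field values are copied from the JSON above, and the port range check is based on the ports 10000 to 10100 we have seen on the marathon-lb service page.

```python
import json

# Service ports served by the load balancer, as seen on the marathon-lb page.
LB_SERVICE_PORTS = range(10000, 10101)


def build_nginx_app(vhost, instances=3, service_port=10006):
    """Build the Marathon app definition shown above with a configurable VHOST."""
    if service_port not in LB_SERVICE_PORTS:
        raise ValueError(f"service port {service_port} is not served by marathon-lb")
    return {
        "id": "nginx-hostname",
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": "nginxdemos/hello",
                "network": "BRIDGE",
                # hostPort 0 lets Marathon pick a free port on the agent.
                "portMappings": [
                    {"hostPort": 0, "containerPort": 80, "servicePort": service_port}
                ],
            },
        },
        "instances": instances,
        "cpus": 0.25,
        "mem": 100,
        "healthChecks": [{
            "protocol": "HTTP",
            "path": "/",
            "portIndex": 0,
            "timeoutSeconds": 2,
            "gracePeriodSeconds": 15,
            "intervalSeconds": 3,
            "maxConsecutiveFailures": 2,
        }],
        "labels": {
            "HAPROXY_DEPLOYMENT_GROUP": "nginx-hostname",
            "HAPROXY_DEPLOYMENT_ALT_PORT": "10007",
            "HAPROXY_GROUP": "external",
            "HAPROXY_0_REDIRECT_TO_HTTPS": "true",
            "HAPROXY_0_VHOST": vhost,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_nginx_app("192.168.65.111"), indent=2))
```

Because hostPort is 0 and the network is BRIDGE, each container gets its own host port, which is what allows several instances to run on the same agent.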

Step 14.2.2: Create Service using DCOS CLI

Now create the Marathon app using the DCOS CLI. In my case, I have not added dcos.exe to the PATH variable yet, so I first had to cd into the folder containing dcos.exe (“D:\veits\downloads\DCOS CLI” in my case).

$ cd <folder_containing_dcos.exe> # needed, if dcos.exe is not in your PATH
$ dcos marathon app add full_path_to_nginx-hostname-app.json
Created deployment 63bac617-792c-488e-8489-80428b1c1e34
$ dcos marathon app list
ID               MEM   CPUS  TASKS  HEALTH  DEPLOYMENT  WAITING  CONTAINER  CMD                                         
/marathon-lb     1024   2     1/1    1/1       ---      False      DOCKER   ['sse', '-m', 'http://marathon.mesos:8080', '--health-check', '--haproxy-map', '--group', 'external']
/nginx-hostname  100   0.25   3/3    3/3       ---      False      DOCKER   None     
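
The HEALTH column can also be derived from Marathon's REST API: the app object returned by the /v2/apps/{app_id} endpoint carries counters such as instances and tasksHealthy. The sketch below evaluates such an object. SAMPLE_APP is a hypothetical, heavily shortened excerpt and not real output from this cluster, and summarize_health is a name chosen for illustration.

```python
def summarize_health(app):
    """Return (healthy, expected, all_healthy) for a Marathon app dict."""
    expected = app.get("instances", 0)
    healthy = app.get("tasksHealthy", 0)
    return healthy, expected, expected > 0 and healthy == expected


# Hypothetical, shortened "app" object (illustration only).
SAMPLE_APP = {
    "id": "/nginx-hostname",
    "instances": 3,
    "tasksRunning": 3,
    "tasksHealthy": 3,
}

if __name__ == "__main__":
    healthy, expected, ok = summarize_health(SAMPLE_APP)
    print(f"{healthy}/{expected} healthy:", ok)
```

How the endpoint is reached (base URL and authentication) depends on your cluster setup and is left out here on purpose.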

On the GUI, under Service we find:

After clicking on the service name nginx-hostname, we see more details on the three healthy containers that have been started:

Now, the service is reachable via curl from within the Mesos network (testing on the private agent a1):

(a1)$ curl http://marathon-lb.marathon.mesos:10006

But can we reach it from outside? Yes: marathon-lb.marathon.mesos is mapped to the public agent’s (p1) address 192.168.65.60 and we can reach http://192.168.65.60:10006 from the inside …

(a1)$ curl http://192.168.65.60:10006

…as well as from the outside:

The image we have chosen will return the server name (i.e. the container ID), the server address and port as seen by the server (172.17.0.x with port 80), the called URI (root), the date and the client IP address and port.

When reloading the page via the browser’s reload button, the answering container will change randomly:

This proves that the requests are load-balanced between the three NginX containers and that the service can be reached from the machine hosting the public agent VirtualBox VM. In the next step, we will make sure that the NginX service can be reached from any machine in your local area network.

Step 15: Reaching the Server from the outside World

In case of a physical machine as public agent, the service will be reachable from the local area network (LAN) already. However, in our case, the public agent p1 is a VirtualBox VM using host networks. Since VirtualBox host networks are only reachable from the VirtualBox host, an additional step has to be taken, if the service is to be reachable from outside.

Note that the outside interface of the HAProxy is attached to a VirtualBox host network 192.168.65.0/24. So, if you want to reach the address from the local area network, an additional mapping from an outside interface of the VirtualBox host to port 10006 of p1 is needed.

For that, choose

-> VirtualBox GUI

-> p1.dcos

-> Edit

-> Network

Then

-> Adapter1

-> Port Forwarding

-> Add (+)

-> choose a name and map a host port to the port 10006 we have used in the JSON file above:

-> OK

 

In this example you will be able to reach the service via any reachable IP address of the VirtualBox host on port 8081:

With that, the service is reachable from any machine in the local area network.
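
The same mapping can probably be created without the GUI via VBoxManage. The following is a minimal sketch, assuming that the VM is registered under the name p1.dcos as shown in the GUI steps above and that Adapter 1 is the NAT adapter; the helper natpf_rule and the rule name nginx-lb are made up for this example.

```shell
#!/bin/sh
# Sketch: build a VirtualBox NAT port-forwarding rule of the form
#   <name>,<protocol>,<host ip>,<host port>,<guest ip>,<guest port>
# natpf_rule is a hypothetical helper; empty IP fields mean "any address".
natpf_rule() {
  printf '%s,tcp,,%s,,%s\n' "$1" "$2" "$3"
}

# Usage, with the VM powered off (VM and rule names are assumptions):
# VBoxManage modifyvm "p1.dcos" --natpf1 "$(natpf_rule nginx-lb 8081 10006)"
```

Host port 8081 and guest port 10006 are the values used in this example; any free host port will do.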

Appendix A: Virtualbox Installation Problem Resolution

  • On Windows 7 or Windows 10, download the installer. Easy.
  • When I start the installer, everything seems to be on track until I see “rolling back action” and I finally get this:
    “Oracle VM Virtualbox x.x.x Setup Wizard ended prematurely”

Resolution of the “Setup Wizard ended prematurely” Problem

Let us try to resolve the problem: the installer of Virtualbox downloaded from Oracle shows the exact same error: “…ended prematurely”. This is not a docker bug. Playing with conversion tools from Virtualbox to VMware did not lead to the desired results.

The solution: Google is your friend, and the winner is https://forums.virtualbox.org/viewtopic.php?f=6&t=61785. After backing up the registry and changing the registry entry

HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Network -> MaxFilters from 8 to 20 (decimal)

and a reboot of the laptop, the installation of Virtualbox is successful.

Note: while this workaround has worked on my Windows 7 notebook, it has not worked on my new Windows 10 machine. However, I have managed to install VirtualBox on Windows 10 by de-selecting the USB support module during the VirtualBox installation process. I remember having seen a forum post pointing to that workaround, with the additional information that the USB drivers were installed automatically the first time a USB device was added to a host (not yet tested on my side).

Appendix B: dcos node log --leader results in “No files exist. Exiting.” Message

Days later, I have tried again:

dcos node log --leader
dcos-log is not supported
Falling back to files API...
No files exist. Exiting.

The reason is that the Token has expired:

Windows> dcos service
Your core.dcos_acs_token is invalid. Please run: `dcos auth login`

The solution is to log in again:

Windows> dcos auth login

Please go to the following link in your browser:

http://m1.dcos/login?redirect_uri=urn:ietf:wg:oauth:2.0:oob

Enter OpenID Connect ID Token:(paste in the key here)
Login successful!

Now we can try again:

Windows> dcos node log --leader
dcos-log is not supported
Falling back to files API...
I0324 09:36:18.030959 4042 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:49308 with User-Agent='python-requests/2.10.0'
I0324 09:36:18.285975 4047 master.cpp:5478] Performing explicit task state reconciliation for 1 tasks of framework 1d3a11d0-1c3e-4ec2-8485-d1a3aa43c465-0001 (marathon) at scheduler-908fbaff-5dd6-4089-a417-c10c068f5d85@192.168.65.90:15101
I0324 09:36:20.054447 4047 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:49314 with User-Agent='python-requests/2.10.0'
I0324 09:36:22.072386 4044 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:49320 with User-Agent='python-requests/2.10.0'
I0324 09:36:22.875411 4041 http.cpp:390] HTTP GET for /master/slaves from 192.168.65.90:49324 with User-Agent='Go-http-client/1.1'
I0324 09:36:24.083292 4041 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:49336 with User-Agent='python-requests/2.10.0'
I0324 09:36:26.091071 4047 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:49346 with User-Agent='python-requests/2.10.0'
I0324 09:36:28.099954 4047 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:49352 with User-Agent='python-requests/2.10.0'
I0324 09:36:29.773558 4047 http.cpp:390] HTTP GET for /master/state.json from 192.168.65.90:49354 with User-Agent='Mesos-DNS'
I0324 09:36:30.116576 4046 http.cpp:390] HTTP GET for /master/state-summary from 192.168.65.90:49360 with User-Agent='python-requests/2.10.0'

Appendix C: Finding the DC/OS Version

Get DC/OS Version (found via this Mesosphere help desk page):

$ curl http://m1/dcos-metadata/dcos-version.json
{
 "version": "1.8.8",
 "dcos-image-commit": "602edc1b4da9364297d166d4857fc8ed7b0b65ca",
 "bootstrap-id": "5df43052907c021eeb5de145419a3da1898c58a5"
}
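
If only the version string is needed, e.g. in a script, it can be cut out of the JSON. The following is a minimal sketch that gets along without extra JSON tools; the helper name dcos_version is made up for this example.

```shell
#!/bin/sh
# Sketch: print the value of the "version" field from dcos-version.json.
# dcos_version is a hypothetical helper; it reads the JSON from stdin.
dcos_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage against the master used above:
# curl -s http://m1/dcos-metadata/dcos-version.json | dcos_version
```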

Appendix D: Error Message, when changing Service Name

If you see the following message when editing a service:

requirement failed: IP address (Some(IpAddress(List(),Map(),DiscoveryInfo(List()),Some(dcos)))) and ports (List(PortDefinition(0,tcp,None,Map()))) are not allowed at the same time

 

Workaround: Destroy and Re-Create Service

Destroy the service and create a new service as follows:

Copy original service in json format (service -> edit -> choose JSON Mode on upper right corner -> ctrl-a ctrl-c -> Cancel)

Create new service

-> Services
-> Deploy Service
-> Edit
-> JSON Mode
-> click into text field
-> ctrl-a ctrl-v
-> edit ID and VIP_0 <-- names should be the same: here “nginx-dcos-network-load-balanced-wo-marathon-lb”

-> Deploy

 

Next Steps

  • Explore the multi-tenant capabilities of DC/OS and Mesos/Marathon: can I use the same infrastructure for more than one customer?
    • Separate Logins, customer A should not see resources of customer B
    • Shared resources and separate resource reservations (pool) for the customers
    • Strict resource reservation vs. scheduler based resource reservation
    • Comparison with OpenShift: does OpenShift offer a resource reservation?
  • Running Jenkins on Mesos Marathon or as a Mesos Job
    • Docker socket usage

Jenkins Part 4.2: Code Quality Tests via Checkstyle

Today, we will show how to use Checkstyle for improving the style of Java code. First, we will add Checkstyle to Gradle in order to create XML reports for a single build. Jenkins allows us to visualize the results of more than one test/build run in historic reports. After that, we will show how a developer can use the Eclipse Checkstyle plugin in order to create better code:

This blog post series is divided into the following parts:

    • Part 1: Installation and Configuration of Jenkins, loading Plugins
    • Part 2: Creating our first Jenkins job: GitHub download and Software build
    • Part 3: Periodic and automatically triggered Builds
    • Part 4.1: running automated tests: Functional Tests via Java JUnit
    • Part 4.2: running automated tests: Code Quality Test via Checkstyle (this post)
    • Part 4.3: running automated tests: Performance Tests with JMeter (work in progress)

What is Jenkins?

Jenkins is the leading open source automation server mostly used in continuous integration and continuous deployment pipelines. Jenkins provides hundreds of plugins to support building, deploying and automating any project.

 

Jenkins build, test and deployment pipeline

A typical workflow is visualized above: a developer checks in the code changes into the repository. Jenkins will detect the change, build (compile) the software, test it and prepare to deploy it on a system. Depending on the configuration, the deployment is triggered by a human person, or automatically performed by Jenkins.

For more information, see the introduction found in part 1 of this blog series.

Checking Code with Checkstyle

In this post, we will show how to configure Jenkins for automated code checking as part of the Post-Build Tests:

After this tutorial has been followed, we will have learned how to apply standard or custom checks on the code quality using Checkstyle in Eclipse and Jenkins.

Tools & Versions used

      • Vagrant 1.8.6
      • Virtualbox 5.0.20
      • Docker 1.12.1
      • Jenkins 2.32.1
        • Checkstyle Plug-in 3.47
      • Eclipse Kepler Service Release 2 (Build id: 20140224-0627)
        • Checkstyle Plug-in 7.2.0.201611082205

Prerequisites:

      • Free DRAM for the Docker Host VM >~ 4 GB
      • Docker Host is available, Jenkins is installed and a build process is configured. For that, perform all steps in part 1 to part 3 of this blog series (new: you now can skip part 1, if you wish)
      • Tested with 2 vCPU (1 vCPU might work as well)

Step 1: Start Jenkins in interactive Terminal Mode

Make sure that port 8080 is unused on the Docker host. If you were following all steps in part 1 of the series, you might need to stop cadvisor:

(dockerhost)$ sudo docker stop cadvisor
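
To find out beforehand whether something is listening on port 8080, the socket listing of the Docker host can be checked. The following is a minimal sketch, assuming that the ss tool is available on the Docker host and prints the local address in the fourth column; the helper name port_in_use is made up for this example.

```shell
#!/bin/sh
# Sketch: succeed if a TCP listener on the given port shows up in "ss -ltn" output.
# port_in_use is a hypothetical helper; it reads the listing from stdin.
port_in_use() {
  awk -v p=":$1" '
    NR > 1 {
      # Column 4 is "Local Address:Port"; compare its tail with ":<port>".
      n = length($4) - length(p)
      if (n >= 0 && substr($4, n + 1) == p) found = 1
    }
    END { exit !found }'
}

# Usage on the Docker host:
# ss -ltn | port_in_use 8080 && echo "port 8080 is already taken"
```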

I assume that jenkins_home is already created, all popular plugins are installed and an Admin user has been created as shown in part 1 of the blog series. We start the Jenkins container with the jenkins_home Docker host volume mapped to /var/jenkins_home:

(dockerhost)$ cd <path_to_jenkins_home> # in my case: cd /vagrant/jenkins_home/
(dockerhost:jenkins_home)$ sudo docker run -it --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins
Running from: /usr/share/jenkins/jenkins.war
...
--> setting agent port for jnlp
--> setting agent port for jnlp... done

Step 2: Open Jenkins in a Browser

Now we want to connect to the Jenkins portal. For that, open a browser and open the URL

<your_jenkins_host>:8080

In our case, Jenkins is running in a container and we have mapped the container-port 8080 to the local port 8080 of the Docker host. On the Docker host, we can open the URL.

localhost:8080

Note: In case of Vagrant with VirtualBox, per default, there is only a NAT-based interface and you need to create port-forwarding for any port you want to reach from outside (also the local machine you are working on is to be considered as outside). In this case, we need to add an entry in the port forwarding list of VirtualBox:

We have created this entry in part 1 already, but I have seen that the entries were gone again, which seems to be a VirtualBox bug. I have added it again now.

Log in with the admin account we have created in the last session:

Step 3: Code Analysis: Checkstyle

With Gradle, we can invoke the Checkstyle plugin as follows:

Step 3.1: Prepare Gradle for performing Checkstyle

Add to build.gradle:

apply plugin: 'checkstyle'

tasks.withType(Checkstyle) {
    ignoreFailures = true
    reports {
        html.enabled = true
    }
}

We have set ignoreFailures to true, since we do not want the Gradle build to fail for now; we are only interested in the Checkstyle reports.

We can download an example Checkstyle configuration file from the Apache Camel repository, for example:

git clone <yourprojectURL>
mkdir -p <yourprojectDir>/config/checkstyle/
curl https://raw.githubusercontent.com/apache/camel/master/buildingtools/src/main/resources/camel-checkstyle.xml > <yourprojectDir>/config/checkstyle/checkstyle.xml

Step 3.2 (optional): Test Checkstyle locally

If you have no Git and/or no Gradle installed, you may want to skip this step and directly proceed to the next step, so Jenkins is performing this task for you.

We can locally invoke CheckStyle as follows:

gradle check

Step 3.3: Configure Jenkins to invoke Checkstyle

Adding Gradle Checkstyle tests to be performed before each build is as simple as performing Step 3.1 and then adding “check” as a goal to the list of Jenkins Build Gradle Tasks:

On Dashboard -> Click on Project name -> Configure -> Build, add “check” before the jar task:

Click Save.

Now we verify the setting by either checking changed code into the SW repository (now is a good time to commit and push the changes performed in Step 3.1) or by clicking “Build now” -> Click on Build Link in Build History -> Console Output in the Project home:

We have received a very long list of CheckStyle Errors, but, as configured, the build does not fail.

At the same time, CheckStyle Reports should be available on Jenkins now:

The links specified in the output are only valid on Jenkins itself. Since Jenkins is running as a Docker container on a Vagrant VM, with jenkins_home residing in

D:\veits\Vagrant\ubuntu-trusty64-docker_openshift-installer\jenkins_home

I need to access the files on

file:///D:/veits/Vagrant/ubuntu-trusty64-docker_openshift-installer/jenkins_home/workspace/GitHub%20Triggered%20Build/build/reports/checkstyle/

And on main.html we find:

Wow, it seems like I really need to clean the code…
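
For a quick number without opening the HTML report, the findings in the XML reports can be counted on the command line. The following is a minimal sketch, assuming that the reports follow the usual Checkstyle XML layout with one error element per finding; the helper name count_checkstyle_errors is made up for this example.

```shell
#!/bin/sh
# Sketch: count the <error .../> elements in one or more Checkstyle XML reports.
# count_checkstyle_errors is a hypothetical helper name.
count_checkstyle_errors() {
  # cat accepts several report files; grep -o prints one line per match.
  cat "$@" | grep -o '<error ' | wc -l | tr -d ' '
}

# Usage from the project directory:
# count_checkstyle_errors build/reports/checkstyle/*.xml
```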

Step 4: Visualize the CheckStyle Warnings and Errors to the Developer

Usually Jenkins is not running as a Docker container on the developer’s PC or notebook, so the developer has no access to the above report files. We need to publish the statistics via the Jenkins portal. For that, we need to install the CheckStyle Jenkins plugin:

Step 4.1 (optional): Install the “Static Analysis Utilities”

Note: I have not tried it out, but I believe that this step is not necessary, since the next step will automatically install all plugins the Checkstyle plug-in depends on.

On Jenkins -> Manage Jenkins -> Manage Plugins -> Available

In the Filter field, type “Static Analysis U”

Check the checkbox of “Static Analysis Utilities” and Install without restart.

Step 4.2: Install Checkstyle Plugin

On Jenkins -> Manage Jenkins -> Manage Plugins -> Available

In the Filter field, type “Checkstyle ” (with white space at the end; this will limit the number of hits):

Check the checkbox of “Checkstyle Plug-in” and Install without restart.

Step 4.3: Configure Checkstyle Reporting in Jenkins

On Dashboard -> <your Project> -> Configure -> Post-build Actions -> Add post-build action, choose “Publish Checkstyle analysis results”:

Now add the path, where Gradle is placing its result xml files:

**/build/reports/checkstyle/*.xml

And click Save.

Step 4.4: Manually trigger a new Build

On the Project page, click “Build now”, then click on the build and then “Console output”:

We can now see [CHECKSTYLE] messages after the build, telling us that the reports were collected. Now, where can we see them?

Step 4.5: Review Checkstyle Statistics

On the Project page, choose Status:

and click on Checkstyle Warnings on the left, or the warnings link in the center of the page, and we get a graphical representation of the Checkstyle statistics:

When clicking on one of the File Links (MyRouteBuilder.java in this case), we can get an overview of the Warning types for this file:

We choose the category Indentation and get details on the warnings:

and after clicking on one of the links in the Warnings field, we see the java code causing the warning:

Okay, Camel’s Checkstyle configuration does not like my style of grouping each route’s first line with a smaller indent than the rest of the route:

And it does not seem to accept my style of putting the ; in a new line at the end of a route as seen by choosing the Whitespace category and then choosing an occurrence:

I either need to change this style, or I need to adapt the checkstyle.xml configuration file to ignore those warnings.

Step 5: Improve Code Style

For the developer, it is very inconvenient to use the Jenkins Checkstyle messages from the console and match them with the code. We need something better than that: the Eclipse Checkstyle plugin.

Step 5.1: Install Eclipse Checkstyle Plugin via local Installation

Since the recommended installation via Marketplace did not work in my case (see Appendix A), I have followed some hints about a local installation found on StackOverflow:

Download Checkstyle from Sourceforge.

In the next window, you are asked to specify some credentials we do not have. However, you can just ignore the window and click Cancel:

->Cancel

Then the installation proceeds:

Now I had to click OK on security warnings twice:

At the end, I had to restart Eclipse:

Now, the Checkstyle plugin is installed on Eclipse.

Step 5.2: Configure Project for Checkstyle Usage

The project in question must be enabled for Checkstyle usage by editing the Project Properties:

Choosing the Checkstyle style. For now, let us choose the Google Checks in the drop-down list:

Then confirm that the project is being re-built:

Now the code is more yellow than white, with many hints how to improve the code:

However, the hints do not go away, if you correct the code. Do we need to rebuild again? Let us test:

Google style does not like that there is no empty line before the package line (sorry, in German):

So, let us add an empty line and save the file. However, the style warning does not change:

Let us rebuild the project:

Yes, after the re-build: the warning has disappeared:

Step 5.3: Download and Create Custom Checkstyle Profile in Eclipse

In the Jenkins Checkstyle tests above, we have used the following custom Checkstyle configuration file:

$ curl https://raw.githubusercontent.com/apache/camel/master/buildingtools/src/main/resources/camel-checkstyle.xml > <yourprojectDir>/config/checkstyle/checkstyle.xml

I.e. the Checkstyle file is found on <yourprojectDir>/config/checkstyle/checkstyle.xml

Correct:

Step 5.4: Assign Custom Checkstyle Profile to the Project

To assign the new Checkstyle profile to the project, we change the project’s Checkstyle properties by

Project->Properties -> Checkstyle

-> Choose new Checkstyle profile -> OK

On the Rebuild suggested window -> Yes

This works fine:

In the code, we can see the Checkstyle warnings. To get more information on the specific Checkstyle warning, the warning text can be retrieved via the mouse over function on the left of the code line, or on the markers tab on the lower pane of Eclipse.

Step 5.5: Improve Code Style

Step 5.5.1: Change Code

In order to test, how the developer can improve the code style, let us replace some of the tabs by four spaces here:

Save the file now.

Step 5.5.2: Update Maven

Unfortunately, the Checkstyle warnings update process is a little cumbersome for custom Checkstyle profiles, it seems: we need to

  1. save the changed file,
  2. update Maven and
  3. rebuild the project.

Let us update Maven first:

right-click the project folder in the left pane -> Maven -> Update Project -> OK

Then all Checkstyle markers are gone (although I have not changed all occurrences of a tab):

Step 5.5.3 Rebuild the Project

To get the Checkstyle warnings back, we need to rebuild the project:

Project -> Build Project

Now we can see that some of the Checkstyle warnings are gone:

Next time you check in the code to the Git repository, you will see that the number of Checkstyle warnings we get from Jenkins via Gradle will decrease…

Step 6: Verify Jenkins Results

Since we have improved the source code, we expect the Jenkins Checkstyle warnings to decrease. We can verify this by doing the following:

-> save, commit and push the improved code -> log into Jenkins -> check out the build process that is triggered by the code push (or we can manually trigger the build process by clicking project -> Build now)

On the dashboard, we will see that the Checkstyle statistics have (very) slightly improved:

On the upper right edge of the figure, the number of warnings is slightly lower. The code quality is far from being perfect, but we now have all tools and plugins needed to improve the situation.

After replacing all tabs with 4 spaces each, the number of Checkstyle violations goes down by ~50%. That is a good start.
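
Replacing the tabs does not need to be done by hand. The following is a minimal sketch of how it could be done on the command line; the helper name tabs_to_spaces is made up for this example, and the file name in the usage line is only an illustration.

```shell
#!/bin/sh
# Sketch: replace every tab character read from stdin with four spaces.
# tabs_to_spaces is a hypothetical helper name.
tabs_to_spaces() {
  tab=$(printf '\t')            # a literal tab, portable across sed variants
  sed "s/${tab}/    /g"
}

# Usage (write to a new file first, then compare and rename):
# tabs_to_spaces < MyRouteBuilder.java > MyRouteBuilder.java.new
```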

Perfect, we have learned how to use the Checkstyle plugin for Eclipse in order to produce better code. And the Jenkins Checkstyle plugin allows us to admire the progress we make.

😉

Appendix A: Problems with installing Checkstyle Eclipse Plugin via Marketplace

Note: this way of installation is recommended officially, but has failed in my case. If you hit the same problem, try the local installation as described in step 5.1 above.

To improve the style, it would be much too cumbersome to click through all 360 style warnings, edit the Java code, build the Code and check again. It is much better to give the programmer immediate feedback of the warning within the IDE. I am using Eclipse, so I need to install the Checkstyle Eclipse plugin as follows:

Choose Eclipse -> Help -> Eclipse Marketplace

Search for “Checkstyle” and click install:

And then “confirm”:

What is that?

I install it anyway. At this point, it is hanging quite a while:

so, let me get a morning coffee…

After approximately two minutes, I can see it proceed to 4 / 15. Good sign.

After the coffee, I still see 4 / 15. Not a good sign:

Meanwhile I am researching the steps needed for performance testing…

After 2 hours or so: 6/15

This will take all day!

Some hours later, I checked again, and I have seen the following:

I have confirmed and accepted the license:

And have pressed Finish.

Then software gets installed:

I hope I will not break my good old Eclipse installation (it is installed locally, not in a virtual machine or container and it has always worked better than any new version I have tested…).

After two or three minutes:

I have confirmed with “OK”…

Then I was asked to restart Eclipse and I confirmed.

Problem: however, Checkstyle is still not installed:

Help -> Eclipse Marketplace

Let us try again by clicking “Install”:

This does not work

Workaround

Instead of installing Checkstyle via the Eclipse marketplace, better install the Eclipse Checkstyle Plugin via download (see Step 5.1)

Summary

In this blog post we have performed the following tasks:

  1. Started Jenkins in a Docker container
  2. Installed the Checkstyle Gradle Plugin to create Checkstyle reports as XML files
  3. Installed the Checkstyle Jenkins Plugin to summarize the XML files into graphical historic reports
  4. Installed the Checkstyle Eclipse Plugin to be able to improve the code
  5. Installed custom Checkstyle policies
  6. Visualized the Code Improvement
  7. were happy

All steps apart from the installation of the Eclipse Checkstyle plugin were quite straightforward. For the Eclipse Checkstyle installation, we had to fall back to the local download and installation method described in step 5.1, since the installation via Eclipse marketplace had failed. At the end, we could reduce the number of Checkstyle warnings by 50% without much effort.


Jenkins Part 4.1: Functional Java Tests via JUnit

Do you also think that functional tests are one of the most important ingredients for delivering high quality software? Do you share my opinion that we should help the developer automate this task in order to get comparable results and to receive meaningful trend reports?

I will cover functional tests here. Instructions on how to perform code quality tests and performance tests are in draft status and will be covered in the next two blog posts.

Any questions and/or comments are highly welcome.

Introduction

As a developer you try hard to deliver high quality software.

You hate searching for this nasty bug that had been introduced unnoticed days ago. Or was it weeks ago? By whom? In which code?

Manual functional and performance testing after each committed code change quickly becomes a NO-GO as the number of features is rising constantly. In this blog post, we will show how Jenkins can help you with both: delivering high quality software and minimizing the time needed to find the cause of a bug.

How about …

  1. creating automated tests for each functionality and performance at different levels (end to end, and unit tests)
  2. running the automated tests after each code change
  3. keeping track of the test results

… in order to avoid any bad surprises late in the game?

Okay: for 1., the developer needs to create automated functional and performance tests; I guess, there is no way around this. Better do this even before writing the actual code. For 2. and 3., however, automation tools like Jenkins step in and can be of great help. The developer checks in the code and Jenkins can do the rest for you.

In the current blog post, we will show how to integrate automated JUnit functional tests into a Jenkins build pipeline. We will see that JUnit tests can be invoked easily via Gradle (Okay, Maven is more popular than Gradle, I guess, but I like Gradle because of some advantages I have discussed here. However, just give me a hint in a comment to this blog and I will prioritize the creation of a Maven version of this blog post). The Jenkins JUnit plug-in will be used to

  1. display reports on single build runs as well as
  2. display trend analysis graphs like the following one I have borrowed from here:
Source: http://nelsonwells.net/2012/09/how-jenkins-ci-parses-and-displays-junit-output/

In this and the next two blog posts, we plan to cover the following quality gate measures:

  • Part 4.1: Functional Tests (this blog post): we will use Java JUnit tests performed before building the executable JAR. Jenkins will report the test trend
  • Part 4.2: Code Quality Tests (coming soon): we will use the Checkstyle Gradle plugin for reporting to which degree the code adheres to the Apache Foundations formal rules
  • Part 4.3: Performance Tests (planned): after the Java build, we will use an external performance tester like JMeter for testing and reporting the performance trend

Older blogs of this series:

This blog post series about Jenkins build pipelines is divided into the following parts:

    • Part 1: Installation and Configuration of Jenkins, loading Plugins
    • Part 2: Creating our first Jenkins job: GitHub download and Software build
    • Part 3: Periodic and automatically triggered Builds

What is Jenkins?

Jenkins is the leading open source automation server mostly used in continuous integration and continuous deployment pipelines. Jenkins provides hundreds of plugins to support building, deploying and automating any project.

A typical workflow is visualized above: a developer checks in the code changes into the repository. Jenkins will detect the change, build (compile) the software, test it and prepare to deploy it on a system. Depending on the configuration, the deployment is triggered by a human, or automatically performed by Jenkins. After each step, the developer is informed depending on the priorities defined.

For more information, see the introduction found in part 1 of this blog series.

Automated Functional Testing based on JUnit

In this blog post, we will show how we need to configure Gradle and Jenkins for automated JUnit testing and reporting. In order to build a quality gate, we will reverse the original order and perform the JUnit tests before we build the executable JAR file (we do not want to create JAR files that are not functional):

Tools used

      • Vagrant 1.8.6
      • Virtualbox 5.0.20
      • Docker 1.12.1
      • Jenkins 2.19.3
        • JUnit Plug-in 1.19

Prerequisites:

      • Free DRAM for the Docker Host VM >~ 4 GB
      • Docker Host is available, Jenkins is installed and a build process is configured. For that, perform all steps in part 1 to part 3 of this blog series
      • Tested with 2 vCPU (1 vCPU might work as well)

Step 1: Start Jenkins in interactive Terminal Mode

Make sure that port 8080 is unused on the Docker host. If you were following all steps in part 1 of the series, you might need to stop cadvisor:

(dockerhost)$ sudo docker stop cadvisor

I assume that jenkins_home is already created, all popular plugins are installed and an Admin user has been created as shown in part 1 of the blog series. We start the Jenkins container with the jenkins_home Docker host volume mapped to /var/jenkins_home:

(dockerhost)$ cd <path_to_jenkins_home> # in my case: cd /vagrant/jenkins_home/
(dockerhost:jenkins_home)$ sudo docker run -it --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins
Running from: /usr/share/jenkins/jenkins.war
...
--> setting agent port for jnlp
--> setting agent port for jnlp... done

Step 2: Open Jenkins in a Browser

Now we want to connect to the Jenkins portal. For that, open a browser and open the URL

<your_jenkins_host>:8080

In our case, Jenkins is running in a container and we have mapped the container-port 8080 to the local port 8080 of the Docker host. On the Docker host, we can open the URL.

localhost:8080

Note: In case of Vagrant with VirtualBox, per default, there is only a NAT-based interface and you need to create port-forwarding for any port you want to reach from outside (also the local machine you are working on is to be considered as outside). In this case, we need to add an entry in the port forwarding list of VirtualBox:

We have created this entry in part 1 already, but I have seen that the entries were gone again, which seems to be a VirtualBox bug. I have added it again now.

Log in with the admin account we have created in the last session:

Step 3: Pre-Build JUnit Tests invoked by Gradle

In this step, we will invoke Gradle Tests before building the JAR. For that, we should verify locally that the Gradle tests are successful and then define a test Gradle task in the build process.

Step 3.1 (optional): Verify that Gradle Tests are successful

You can skip this test and directly let Jenkins do this for you. This may come in handy if you have not installed Git and/or Gradle locally.

Prerequisites

  • Your Java project has successful JUnit tests defined
  • Git is installed
  • The Project is cloned to a local directory
  • Gradle is installed

In order to test, whether the JUnit tests are successful, we can test those on a system with the project cloned (git, java and gradle must be installed):

(basesystem)$ gradle test
Starting a Gradle Daemon (subsequent builds will be faster)
Parallel execution is an incubating feature.
:compileJava UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:compileTestJava
warning: [options] bootstrap class path not set in conjunction with -source 1.6
1 warning
:processTestResources
:testClasses
:test

BUILD SUCCESSFUL

Total time: 29.9 secs

With that, we have verified that the command “gradle test” succeeds.

Note that the JUnit tests must be designed in a way that they are independent of whether or not the JAR file is run in parallel. No simple way of running the executable JAR file in parallel to the execution of the JUnit tests seems to exist. In my case, I had to alter the JUnit tests to fulfill this prerequisite.

Step 3.2: Add Gradle test Task to Jenkins

As long as JUnit tests are defined in src/test of the project, adding Gradle tests to Jenkiny is as simple as adding “test” as a task to the list of Jenkins Build Gradle Tasks as follows:

On Dashboard -> Click on Project name -> Configure -> Build, add “task ” before the jar task:

Click Save.

If you have made local code changes on the project, now is the best time to commit and push them to the Git repository. If you have followed the steps in part 3, then this will automatically trigger a build process, so you do not need to click on “Build now” in that case. Otherwise, click on “Build now” on the Jenkins project page (e.g. Dashboard -> click on project name -> “Build now”).

Now we observe the result by clicking on the build process, then -> “Console Output”:

Don’t be confused by the blinking red ball on the upper left of the Console Output page: we see a BUILD SUCCESSFUL message and if we re-enter the same page, the ball is turned to static blue, indicating a successful build.

Step 4: Add JUnit Test Result Reporting to Jenkins

Now we will show how to add the JUnit test reports to the Jenkins build process.

Step 4.1: Install Jenkins JUnit Plugin

For Jenkins JUnit reporting, we need to install the JUnit Plug-in. For that, go to Jenkins Dashboard -> Manage Jenkins -> Manage Plugins -> Available -> enter “JUnit Plugin” in the Find field -> Install

Note: If you do not find the plugin on the Available tab, search for it in the “Installed” tab.

You can install the plugin without reloading Jenkins.

Step 4.2: Configure Jenkins to collect and display the JUnit Test Results

In this step, we will configure Jenkins, so it will display the test results for individual builds as well as trend reporting. For that, navigate to:

Jenkins -> (choose Project) -> Configure -> Post-build Actions -> Publish JUnit test results report

Add

**/build/test-results/test/TEST-*.xml

to the “Test report XMLs” field, since this is the path where Gradle places its JUnit test result reports (I have found the info here).
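To see locally what Jenkins will pick up with this pattern, the counters in the report files can be added up with a few lines of shell. This is only a sketch under the assumption that each report carries its counters as tests, failures and errors attributes on the opening testsuite tag, as Gradle's JUnit XML reports usually do. The function name summarize_junit_reports is made up for this example:

```shell
#!/bin/sh
# Sketch: add up the counters of all Gradle JUnit XML reports below a directory.
# Assumption: reports live in build/test-results/test/TEST-*.xml and the
# opening <testsuite ...> tag is written on a single line.
summarize_junit_reports() {
  dir=${1:-.}
  find "$dir" -type f -path '*/build/test-results/test/TEST-*.xml' -exec cat {} + |
    awk '
      # return the numeric value of attribute "name" on the current line (0 if absent)
      function attr(name) {
        if (match($0, " " name "=\"[0-9]+\""))
          return substr($0, RSTART + length(name) + 3, RLENGTH - length(name) - 4) + 0
        return 0
      }
      /<testsuite / {
        tests += attr("tests"); failures += attr("failures"); errors += attr("errors")
      }
      END { printf "tests=%d failures=%d errors=%d\n", tests, failures, errors }
    '
}

# Usage (inside the project directory, after "gradle test"):
#   summarize_junit_reports .
```

If the function reports tests=0, the path pattern most likely does not match the location of the reports in your Gradle version.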

Now click Save.

Step 4.3: Verify JUnit individual Test Reporting

To test the Jenkins JUnit reporting feature, we trigger a clean build by adding “clean” to the Gradle tasks on Project -> Configure -> Build:

and clicking Save.

Then trigger a new build by clicking on Project -> Build now.

Then click on the Build Process, and then on Console output:

…scrolling down…

Do not be confused that the build process never seems to finish. Just click the Back to Project link:

On the Status page, we see that there were no failed tests:

When we click on the Test Result link on the left (or in the lower middle part of the Status page), we will see more details:

We can see that we have had four tests (Create/Read/Update/Delete a file) and 100% of them were successful.

Step 4.4: Verify JUnit Test Trend Reporting

On the project’s Status page, a Test Result Trend graph is added automatically as soon as two or more builds with test results are available. For that, click on “Build Now” on the left a second time and click ENABLE AUTO REFRESH on the upper right. After the second build is complete, the (hopefully) blue Test Result Trend graph shows up on the project status page:

The new blue graph shows that we had 4 successful tests in each of the last two builds.

Note: disregard the red Checkstyle Trend graph for now. This is something we will cover in the next blog post.

Step 5: Verify failed Test Reporting

By default, a Gradle build fails if one of the JUnit tests fails, which makes the tests a strict quality gate. Will the test results be collected and reported nevertheless?

Let us test this now by breaking one of the JUnit tests on purpose. We have added an expectation that is bound to fail to one of the tests:

Now we commit and push the change to the SW repository:

$ git clone <Repository-URL>
$ cd <Repository Dir>
<perform the code changes here...>
$ git diff src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java
diff --git a/src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java b/src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java
index 684d30f..10200d5 100644
--- a/src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java
+++ b/src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java
@@ -115,6 +115,9 @@ public class SimpleRestfulFileStorageTests extends CamelSpringTestSupport {
 // mock expectations need to be specified before sending the message:
 mock.expectedBodiesReceived("File ttt created: href=http://localhost:2005/files/ttt");
 mock.expectedMessageCount(1);
+ ^M
+ // In order to break this test for Jenkins test reporting, we temporarily add a requirement that will fail:^M
+ mock.expectedMessageCount(2);^M

 template.sendBodyAndHeaders("direct:recipientList", body, headers);

$ git add src/test/java/de/oveits/simplerestfulfilestorage/SimpleRestfulFileStorageTests.java
$ git commit -m "Breaking a JUnit test by purpose for Jenkins reporting tests"
[jenkinstest 33655b9] Breaking a JUnit test by purpose for Jenkins reporting tests
 1 file changed, 4 insertions(+), 1 deletion(-)

olive@LAPTOP-P5GHOHB7 /d/veits/eclipseWorkspaceRecent/simple-restful-file-storage (jenkinstest)
$ git push
Counting objects: 9, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (9/9), 744 bytes | 0 bytes/s, done.
Total 9 (delta 4), reused 0 (delta 0)
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
To https://github.com/oveits/simple-restful-file-storage.git
 edb49f7..33655b9 jenkinstest -> jenkinstest

This will automatically trigger a new build (if you have followed part 3 of this series; otherwise just press “Build Now” on the Jenkins project page).

We can see on the dashboard that the build has failed:

This was expected. Now let us click on the project name to see what happened:

Perfect, that is exactly what I wanted to achieve: on the Test Result Trend, we can see that we have performed 4 tests, one of which has failed.

Let us fix the failed test by commenting out (or removing) the wrong code again:

After running

$ git add <file>
$ git commit -m "Fixed JUnit test again to test Jenkins JUnit trend report"
$ git push

the next build should be successful again, and we can see in the trend graph that the failed test is fixed again:

Summary

In this blog post, we have shown

  1. How to add Java functional tests to the Jenkins build pipeline based on Gradle JUnit Plugin
  2. How to install the JUnit plug-in to Jenkins for report collection
  3. How to display JUnit test results for individual builds on the Jenkins portal
  4. How to display JUnit trend analysis on the Jenkins portal

The only challenge I have encountered is that I had to rewrite my JUnit tests so that they succeed when run stand-alone. Before, they succeeded only if the executable JAR file was started manually before running the JUnit tests. This was resolved in a way specific to the framework used (Apache Camel in this case).

Coming Soon: Code Analysis Trend Analysis via Jenkins Checkstyle plugin


Jenkins Part 3.1: periodic vs triggered Builds

Today, we will make sure that Jenkins will detect a code change in the software repository without manual intervention. We will show two methods to do so:

  1. Periodic Builds via Schedulers: Jenkins periodically asks the software repository for any code changes
  2. Triggered Builds via Webhooks: Jenkins is triggered by the software repository to perform the build task

We will see that triggered builds are more challenging to set up, but have considerable advantages in terms of economics and handling once they are set up properly. See also the Summary at the end of this post.

This blog post series is divided into the following parts:

    • Part 1: Installation and Configuration of Jenkins, loading Plugins
    • Part 2: Creating our first Jenkins job: GitHub download and Software build
    • Part 3 (this blog): Periodic and automatically triggered Builds
    • Part 4 (planned): running automated tests

What is Jenkins?

Jenkins is the leading open source automation server mostly used in continuous integration and continuous deployment pipelines. Jenkins provides hundreds of plugins to support building, deploying and automating any project.

 

Jenkins build, test and deployment pipeline

A typical workflow is visualized above: a developer checks the code changes into the repository. Jenkins detects the change, builds (compiles) the software, tests it and prepares to deploy it on a system. Depending on the configuration, the deployment is triggered by a human or performed automatically by Jenkins.

For more information, see the introduction found in part 1 of this blog series.

Automatic Jenkins Workflow: Periodic Polling

In this chapter, we will show how to configure Jenkins so that it automatically polls the software repository and starts the build process if code changes are detected.

Tools used

      • Vagrant 1.8.6
      • Virtualbox 5.0.20
      • Docker 1.12.1
      • Jenkins 2.19.3

Prerequisites:

      • Free DRAM for a Docker Host VM: >~ 4 GB
      • Docker Host is available, Jenkins is installed and a build process is configured. For that, perform all steps in part 1 and part 2 of this blog series
      • Tested with 2 vCPU (1 vCPU might work as well)

Step 1: Start Jenkins in interactive Terminal Mode

Make sure that port 8080 is unused on the Docker host. If you were following all steps in part 1 of the series, you might need to stop cadvisor:

(dockerhost)$ sudo docker stop cadvisor
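If you are not sure whether port 8080 is still occupied, you can check the list of listening TCP sockets. The following is a small sketch under the assumption that the ss tool is available on the Docker host. The helper name port_is_listening is made up for this example; it only inspects the text it receives on standard input:

```shell
#!/bin/sh
# Sketch: succeed (exit code 0) if the given TCP port shows up as a local
# listening port in "ss -ltn"-style output read from standard input.
# Column 4 of that output is "Local Address:Port".
port_is_listening() {
  awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Usage on the Docker host:
#   ss -ltn | port_is_listening 8080 && echo "port 8080 is already in use"
```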

I assume that jenkins_home is already created, all popular plugins are installed and an Admin user has been created as shown in part 1 of the blog series. We start the Jenkins container with the jenkins_home Docker host volume mapped to /var/jenkins_home:

(dockerhost)$ cd <path_to_jenkins_home> # in my case: cd /vagrant/jenkins_home/
(dockerhost:jenkins_home)$ sudo docker run -it --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins
Running from: /usr/share/jenkins/jenkins.war
...
--> setting agent port for jnlp
--> setting agent port for jnlp... done

Step 2: Open Jenkins in a Browser

Now we want to connect to the Jenkins portal. For that, open a browser and open the URL

<your_jenkins_host>:8080

In our case, Jenkins is running in a container and we have mapped the container port 8080 to the local port 8080 of the Docker host. On the Docker host, we can open the URL:

localhost:8080

Note: In case of Vagrant with VirtualBox, there is only a NAT-based interface by default, and you need to create a port forwarding for any port you want to reach from outside (the local machine you are working on also counts as outside). In this case, we need to add an entry to the port forwarding list of VirtualBox:

We have created this entry in part 1 already, but I have seen that the entries were gone again, which seems to be a VirtualBox bug. I have added it again now.

Log in with the admin account we have created in the last session:

Step 3: Configure Project for periodic Polling of SW Repository

Step 3.1: Go to Build Trigger Configuration

On the Jenkins Dashboard, find the hidden triangle to the right of the project name:

In the drop-down list, choose “Configure”

(also possible: on the Dashboard, click on the project name and then “Configure”).

Step 3.2: Configure a Schedule

We scroll down to “Build Triggers”, check “Build periodically” and specify a schedule of every 10 minutes (H/10 * * * *). I do not recommend shorter intervals, since I have seen that even my monster notebook with an i7-6700HQ and 64 GB RAM is stressed quite a bit by that many build processes.

Note that this is a very short polling period for our test purposes only; we do not want to wait very long after a code change is detected.

Note also: you can click the help icon to the right of the Schedule text box to get help with the scheduler syntax.

Step 3.3: Save

Click Save

Step 4: Change the content of the Software Repository

Now we expect that a change of the SW repository is picked up at the latest one schedule period after new code is checked in. Let us do so now: in this case, I have changed the content of README.md and committed the change:

(local repository)$ git add README.md
(local repository)$ git commit -m "changed README"
(local repository)$ git push

Within 2 minutes, I see a new job #24 running on the lower left:

It seems that the page needs to be reloaded by refreshing the browser, so the dashboard displays the #24 build process as “Last Success”:

The build process was very quick, since we have not changed any relevant source code. The console log can be reached via the Jenkins -> Project Link -> Build History -> click on build number -> Console:

As you can see, after some hours, the git repository is downloaded even if there was no code change at all. However, Gradle will detect that the JAR file is up-to-date and it will not re-build the JAR file, unless there is a code change.

The disadvantage of a scheduled build process with high frequency is that the number of builds in the build history is increasing quickly:

Note: The build history is spammed by many successful builds with no code change, and it is not easy to find the interesting build among all those many unnecessary builds. Let us try to improve the situation by replacing periodic, scheduled builds by triggered builds:

Step 5: Triggered Builds

In Step 4, we have seen that periodic builds should not be performed in a very short timeframe, because:

  1. the Jenkins server is stressed quite a bit if the build frequency is configured too high
  2. the build history is polluted by information of many irrelevant build processes with no changed code.

Therefore, it is much better to create a triggered build. The target is to trigger a build process every time the developer is checking in new code to the software repository:

In this way, a periodic build is not necessary, or can be done much less frequently.

What do we need to do?

  1. Make sure that the Jenkins server is reachable from the SW repository
  2. Configure the SW repository with a web hook for informing Jenkins upon each code change
  3. Configure Jenkins for triggered build

Let us start:

Step 5.1: Configure Jenkins for triggered Build

On the Jenkins Dashboard, click on the project:

and then “Configure” on the left pane:

Scroll down to Build Triggers, check the “Trigger builds remotely (e.g., from scripts)” checkbox and choose an individual secret token (do not use the one you see here):

You will be provided with the build trigger URL, which is in my case:

JENKINS_URL/job/GitHub%20Triggered%20Build/build?token=TOKEN_NAME

Here, JENKINS_URL is the address under which the Git repository must be able to reach the Jenkins server. Save the URL above for later use.

Now click Save.

Step 5.2: Test Trigger URL locally

Now we can test the trigger URL locally on the Docker Host as follows (as found on this StackOverflow Q&A):

We need to retrieve a so-called Jenkins-Crumb:

(dockerhost)$ CRUMB=$(curl -s 'http://admin:your_admin_password@localhost:8080/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,":",//crumb)')
(dockerhost)$ echo $CRUMB
Jenkins-Crumb:CCCCCCCCCCCCCCCCCCCCCCCCCC

Please make a note of the returned Jenkins-Crumb, since we will need this value in the next step.

Then we can use the Jenkins-Crumb as header in the build trigger request:

(dockerhost)$ curl -H "$CRUMB" 'http://admin:your_admin_password@localhost:8080/job/GitHub%20Triggered%20Build/build?token=hdsghewriohwziowrhwsn'

This should trigger a new build on Jenkins:

By clicking on the build and then the “Console Output”, we see a successful build with no changed data:
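The two curl calls of this step can be combined into a small helper, sketched below. The function names trigger_url and trigger_build are my own and hypothetical; the crumb URL and the trigger URL are the ones shown above. Only spaces in the job name are URL-encoded here, so other special characters would need extra handling:

```shell
#!/bin/sh
# Sketch: build the remote trigger URL of a Jenkins job.
# Spaces in the job name are replaced by %20 (nothing else is encoded).
trigger_url() {
  base=$1 job=$2 token=$3
  encoded_job=$(printf '%s' "$job" | sed 's/ /%20/g')
  printf '%s/job/%s/build?token=%s\n' "$base" "$encoded_job" "$token"
}

# Sketch: fetch a Jenkins-Crumb and use it as header in the trigger request.
# This function performs two HTTP requests against the Jenkins server.
trigger_build() {
  user_pass=$1 base=$2 job=$3 token=$4
  crumb=$(curl -s -u "$user_pass" \
    "$base/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,\":\",//crumb)") || return 1
  curl -s -u "$user_pass" -H "$crumb" "$(trigger_url "$base" "$job" "$token")"
}

# Usage on the Docker host (replace password and token with your own values):
#   trigger_build admin:your_admin_password http://localhost:8080 "GitHub Triggered Build" TOKEN_NAME
```

Passing the credentials with curl's -u option instead of embedding them in the URL is a choice I made for this sketch; both variants authenticate in the same way.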

Step 5.3: Make sure that the Jenkins Server is reachable from the SW repository

We are running the Jenkins server as a Docker container within a Vagrant VM as host. In Step 2, we have made sure that the Docker container is reachable from the local network by exposing the Docker ports and by configuring port forwarding in VirtualBox. However, the Docker container is not yet reachable from the Git repository, since the router will block all requests as long as no port forwarding is configured on the router:

Let us fix that now:

In my case, the (sorry, German) input mask of the router looks as follows:

I am mapping outside port 8080 to the internal machine running the Docker Host VM.

Now, the routing should work. We will test this in the next step.

Step 5.4: Add Webhook to Git SW Repository

Now we need to add a Webhook to the Git repository. In my case, the repository is located at https://github.com/oveits/simple-restful-file-storage. On that page, go to the repository settings and add a new webhook.

Then copy and paste the URL of Step 5.1 into the Payload URL field with the following changes:

  • Replace JENKINS_URL with the IP address or DNS name under which your router is reachable from the Internet.
  • Use the port you have opened on the router for this service (e.g. 8080) in Step 5.3.
  • Add admin:your_admin_password@ before the JENKINS_URL; use your own username and password here.
  • Append &Jenkins-Crumb=CCCCCCCCCCCCCCCC to the URL, with the value of the Jenkins-Crumb we have retrieved in Step 5.2.

Example (replace the password, the host name, the token and the crumb with your own values):

http://admin:your_admin_password@your_public_ip_or_name:8080/job/GitHub%20Triggered%20Build/build?token=TTTTTTTTTTTTTTTTT&Jenkins-Crumb=CCCCCCCCCCCCCCCCCCCC

 

For the other fields, keep the defaults and click Add webhook.

If everything works fine, we should already see a successful delivery of the trigger at the lower end of the GitHub page:

If it was not successful, you can see more details by clicking on the request:

Step 6: Test triggered Build upon Code Push

This is the final step of this tutorial: we now will test that a build is triggered each time a user pushes new code to the repository.

Step 6.1: Install Git locally

If Git is not installed locally, install it now.

Step 6.2: Download the Project Repository

We now clone the project by issuing the command

$ git clone https://github.com/oveits/simple-restful-file-storage

Step 6.3: Change Code

You can make a minor change to the content of README.md in order to test the triggered build.

Step 6.4: Push Code to the Repository

With the commands

$ git commit -am "Minor change of README.md to trigger a Jenkins build"
$ git push

we push the changed code to the SW repository.

If everything works correctly, we will see immediately after reloading the Jenkins Dashboard that Git has triggered Jenkins to perform a build (32 sec ago, in this screenshot):

We can check the build by clicking on the Last Success build and then “Console Output”:

Gradle was clever enough to detect that no relevant code had been changed, so everything is still up to date.

With this procedure we have made sure that the Software repository will trigger a new build process on each and every code change. Moreover, the Jenkins server is not polluted with unnecessary builds anymore, since we have switched off periodic builds.

Summary

In this blog post, we have performed the following tasks:

  1. Started Jenkins in a Docker container
  2. Configured and tested periodic builds
  3. Configured and tested triggered builds
  4. Made sure that the Git Software repository is triggering such a build at every code change

As in the other parts of this series, we have run Jenkins interactively in a Docker container. See below a discussion of the advantages of periodic and triggered builds:

Periodic Builds vs Triggered Builds

When we compare periodic builds with triggered builds, we see the following advantages/disadvantages:

Complexity of Setup: periodic builds are much easier to set up. They only need to be configured on Jenkins. Triggered builds require setup steps on the Jenkins Server, the Software Repository and intermediate Firewalls, if the Jenkins Server is located in a private network.

Economics: Triggered builds are more economic in terms of Jenkins Server load. The build processes run only when needed.

Handling: Triggered builds have important handling advantages compared to periodic builds: firstly, each and every code change can be tested, which gives the programmer near-immediate feedback for every code change. Secondly, the build log is not polluted by hundreds of irrelevant builds.

In my opinion, a clear winner is: triggered builds. Those may be combined with periodic clean builds at certain milestones.

 


Getting Started with Mesos Resource Reservation & Marathon Watchdog – A “Hello World” Example

Today, we will introduce Apache Mesos, an open source distributed computing system whose goal is to let applications run on a computer cluster as if they were running on a single computer. On top of a Mesos cluster, we will run Mesosphere Marathon, an open source container orchestration platform. Similar to a watchdog, Marathon helps running and maintaining long-running applications. However, unlike a mere watchdog, Marathon runs the applications in containers, and it provides a modern Web Portal and a modern RESTful API.

With the help of Marathon, we will

  • run several instances of a simple “Hello World” script on the cluster (within and outside of Docker containers);
  • see, what happens, if an application dies unexpectedly;
  • see, what happens, if an application reservation request exceeds the available resources.

For simplicity and quick installation purposes, all components of the Mesos architecture will be run within Docker containers.

What is Mesos?

Mesos is an open source framework and provides a distributed computer system. Mesos provides applications (e.g. Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenters and cloud environments.

 

The Mesos Agents advertise the available resources (CPU, DRAM, …) to the master, which relays those offers to frameworks like Marathon, Hadoop, Jenkins and many more. The frameworks may reserve all or part of the offered resources and run their applications on the Mesos agents (slaves).

What is Marathon?

Mesosphere, the owner of Marathon, calls Marathon “a production-grade container orchestration platform for Mesosphere’s Datacenter Operating System (DC/OS) and Apache Mesos.”

Among others, it offers:

  • active-standby redundancy for increased availability
  • starting containers on the Mesos agents; both Mesos containers (using cgroups) and Docker containers are supported
  • a powerful GUI
  • a REST API for easier integration

A more complete feature list can be found here.

Compared to other schedulers like Apache Aurora used by Twitter, Marathon seems to be much easier to handle. On the other hand, Aurora offers elaborate prioritization and preemption features. Those may be important if the same resources are shared between production and development: if a production workload does not find any resources on a Mesos slave, Aurora will kill off less important applications in order to free up resources.

A good comparison between Mesosphere Marathon and Apache Aurora can be found on this Hootsuite Development’s web page.

Target Configuration for this Blog Post

In this “Hello World” example, we will create a simple Mesos configuration with a Marathon framework, a single Zookeeper, a single master and a single agent (slave):

We will also show what happens, if we kill a Marathon app.

Versions & Tools used

Prerequisites:

  • >~4 GB RAM: after starting a Mesos master, a Mesos agent (slave), a ZooKeeper and a Marathon Docker container, I have observed a DRAM usage of ~3.8 GB

Step 1: Install a Docker Host via Vagrant and Connect to the Host via SSH

If you are using an existing docker host, make sure that your host has enough free memory.
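A quick way to get a rough number on a Linux Docker host is to add up the MemFree, Buffers and Cached entries of /proc/meminfo. The sketch below does just that; the function name estimate_free_mb is made up for this example, and the result is only an approximation of the memory the containers can actually use:

```shell
#!/bin/sh
# Sketch: estimate the usable memory in MB from /proc/meminfo-style input
# on standard input by adding up MemFree, Buffers and Cached (values in kB).
estimate_free_mb() {
  awk '/^(MemFree|Buffers|Cached):/ { kb += $2 } END { printf "%d\n", kb / 1024 }'
}

# Usage on the Docker host:
#   estimate_free_mb < /proc/meminfo
#   [ "$(estimate_free_mb < /proc/meminfo)" -ge 4096 ] || echo "less than ~4 GB available"
```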

We will run the applications in Docker containers in order to allow for maximum interoperability. This way, we can always use the latest versions without the need to control the Java version used.

If you are new to Docker, you might want to read this blog post.

Installing Docker on Windows and Mac can be a real challenge, but no worries: we will show an easy way here, that is much quicker than the one described in Docker’s official documentation:

Prerequisites of this step:

  • I recommend having direct access to the Internet: via Firewall, but without HTTP proxy. However, if you cannot get rid of your HTTP proxy, read this blog post.
  • Administration rights on your computer.

Steps to install a Docker Host VirtualBox VM:

1. Download and install VirtualBox (if the installation fails with error message “<to be completed>”, see Appendix A of this blog post: VirtualBox Installation Workaround)

2. Download and install Vagrant (requires a reboot)

3. Download the Vagrant Box containing an Ubuntu-based Docker Host and create a VirtualBox VM as follows:

basesystem# mkdir ubuntu-trusty64-docker ; cd ubuntu-trusty64-docker
basesystem# vagrant init williamyeh/ubuntu-trusty64-docker
basesystem# vagrant up
basesystem# vagrant ssh

Now you are logged into the Docker host and we are ready for the next step: to create the Docker images.

Note: I have experienced problems with the vi editor when running vagrant ssh in a Windows terminal. In case of Windows, consider following Appendix C of this blog post and using putty instead.

Step 2 (recommended): Download Docker Images

This step is optional, since the download will be done automatically with each docker run command if the image is not available on the Docker host. However, I recommend downloading the images in advance, so that you can observe the logs and other feedback (syntax errors) immediately when you run the applications.

Step 2.1 Download Zookeeper

According to the Mesosphere GitHub documentation, it seems that a ZooKeeper (Exhibitor) is always needed. The only tag available as of today is the tag 1.5.2. Let us download it first:

(dockerhost)$ sudo docker pull netflixoss/exhibitor:1.5.2
1.5.2: Pulling from netflixoss/exhibitor

a3ed95caeb02: Pull complete
831a6feb5ab2: Pull complete
b32559aac4de: Pull complete
5e99535a7b44: Pull complete
aa076096cff1: Pull complete
423664404a49: Pull complete
929c1efe4d14: Pull complete
387bf8857f2e: Pull complete
5efe9ea3de0d: Pull complete
a53f74fd9d17: Pull complete
78b42a885be7: Pull complete
684d8691844e: Pull complete
Digest: sha256:9b384a431d2e231f0bd3fcda5eff20d5eabd5ba1e3361764a4834d3401fbc4d4
Status: Downloaded newer image for netflixoss/exhibitor:1.5.2

Step 2.2 Download Mesos Master

It is quite confusing how many distributions of Mesos exist. The Docker Hub distribution from Mesoscloud seems to have the most stars:

$ sudo docker search mesos
NAME                                DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
mesoscloud/mesos-master             Mesos Master                                    50                   [OK]
mesoscloud/mesos-slave              Mesos Slave                                     31                   [OK]
...

However, if I search for mesos-master, I find another image with even more stars and downloads:

$ sudo docker search mesos-master
NAME                                         DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
mesosphere/mesos-master                      Mesos-Master in Docker                          71
mesoscloud/mesos-master                      Mesos Master                                    50                   [OK]
...

Mesosphere also offers a two-in-one solution with master and slave combined in a single container: mesosphere/mesos. However, the recommended way of running Mesos is to run a master on one Docker host and slaves on other Docker hosts.

Let us download the images mesosphere/mesos-master and mesosphere/mesos-slave. Note that the “latest” tag does not exist, so we need to specify the tag explicitly. Let us try with the latest version I have found to exist for both master and slave, i.e. 1.1.01.1.0-2.0.107.ubuntu1404.

(dockerhost)$ docker pull mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404
1.1.01.1.0-2.0.107.ubuntu1404: Pulling from mesosphere/mesos-master

bf5d46315322: Already exists
9f13e0ac480c: Already exists
e8988b5b3097: Already exists
40af181810e7: Already exists
e6f7c7e5c03e: Already exists
a3ed95caeb02: Already exists
01a862c74d96: Already exists
651b06ceb77e: Already exists
Digest: sha256:a011e002d641c6ba8361c542bd9429af721b7c7434598a9615cbd5b05511af7f
Status: Downloaded newer image for mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404

The version of the downloaded Mesos Master image can be checked with following command:

(dockerhost)$ sudo docker run -it --rm --name mesos-master mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404 --version
mesos 1.1.0

We are using version 1.1.0, currently.

Step 2.3 Download Mesos Slave

(dockerhost)$ docker pull mesosphere/mesos-slave:1.1.01.1.0-2.0.107.ubuntu1404
1.1.01.1.0-2.0.107.ubuntu1404: Pulling from mesosphere/mesos-slave

bf5d46315322: Already exists
9f13e0ac480c: Already exists
e8988b5b3097: Already exists
40af181810e7: Already exists
e6f7c7e5c03e: Already exists
a3ed95caeb02: Already exists
01a862c74d96: Already exists
651b06ceb77e: Already exists
Digest: sha256:bb75cc78c6880a2faa5307e3d8caa806105c673e9002429e60e3ae858d162999
Status: Downloaded newer image for mesosphere/mesos-slave:1.1.01.1.0-2.0.107.ubuntu1404

Step 2.4 Download Marathon

While Mesos will offer computing resources, Marathon is a framework that will ask for those resources.

(dockerhost)$ sudo docker pull mesosphere/marathon
Using default tag: latest
latest: Pulling from mesosphere/marathon

43c265008fae: Pull complete
af36d2c7a148: Pull complete
143e9d501644: Pull complete
bfc4cdbc8d81: Pull complete
38c6fc3e9968: Pull complete
0bfa8d5153bb: Pull complete
05bc8d0fffca: Pull complete
f1266a2a7ecb: Pull complete
f505e7ed4b7e: Pull complete
219f8c7fc022: Pull complete
Digest: sha256:9c881ff6f46a0da69f622a19a1677f1424a12ef37d076ec439854f15b97179fa
Status: Downloaded newer image for mesosphere/marathon:latest

Marathon does not offer a --version option or the like. The Marathon version can only be seen in the log when running Marathon:

(dockerhost)$ sudo docker run -it --net=host -v `pwd`:/work_dir --entrypoint=bash mesosphere/marathon
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[2016-12-12 21:01:46,268] INFO Starting Marathon 1.3.6/unknown with --master local --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main)

Step 3: Run Mesos

In this step, we will run Mesos Master interactively (with the -it switch instead of the -d switch) to better see what is happening. In a production environment, you will use the detached mode -d instead of the interactive terminal mode -it.

We have found out by analyzing the Dockerfiles of mesos-master and mesos-slave images that both are based on mesosphere/mesos with different entrypoints and commands:

  • mesosphere/mesos-master: entrypoint: mesos-master with default option --registry=in_memory
  • mesosphere/mesos-slave: entrypoint: mesos-slave with no default options

Interestingly, the Dockerfiles have no exposed ports specified. The reason is that the Docker images are supposed to run in host network mode, in which they share the network interfaces including IP address(es) and ports with the Docker host.

The usage of the Docker images is documented on GitHub. It seems that a ZooKeeper is needed:

Step 3.1: Run Zookeeper (Exhibitor) interactively in a Container

In order to better see what is happening, we will run the ZooKeeper in interactive terminal (-it) mode instead of the detached mode (-d) described in the documentation. With the --net=host option, we share the network with the Docker host, so we do not need to explicitly expose the used TCP ports:

(dockerhost)$ sudo docker run -it --net=host --name=zookeeper netflixoss/exhibitor:1.5.2
v1.0
INFO com.netflix.exhibitor.core.activity.ActivityLog Exhibitor started [main]
Dec 12, 2016 5:05:28 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
INFO org.mortbay.log Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog [main]
INFO org.mortbay.log jetty-1.0 [main]
Dec 12, 2016 5:05:29 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9.1 09/14/2011 02:36 PM'
INFO org.mortbay.log Started SocketConnector@0.0.0.0:8080 [main]
INFO com.netflix.exhibitor.core.activity.ActivityLog State: down [ActivityQueue-0]
Dec 12, 2016 5:05:30 PM java.util.prefs.FileSystemPreferences$6 run
WARNING: Prefs file removed in background /root/.java/.userPrefs/prefs.xml
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to stop instance [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to start/restart ZooKeeper [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog jps didn't find instance - assuming ZK is not running [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Starting in standalone mode [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: JMX enabled by default [pool-2-thread-1]
INFO com.netflix.exhibitor.core.activity.ActivityLog Process started via: /zookeeper/bin/zkServer.sh [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Using config: /zookeeper/bin/../conf/zoo.cfg [pool-2-thread-1]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]
INFO com.netflix.exhibitor.core.activity.ActivityLog State: serving [ActivityQueue-0]

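In the log, we see that the ZooKeeper container is using port 8080: this is the web server of Exhibitor, which manages ZooKeeper in this image, while ZooKeeper itself listens on port 2181. Therefore, we will see in Step 5 that the TCP port for Marathon needs to be changed in order to avoid a port collision.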
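
Since all containers share the host network, it can help to check which ports are already taken before starting the next service. Here is a minimal sketch that scans "ss -ltn"-style output for a given port; the helper name port_is_listening is hypothetical, and we assume the ss tool from iproute2 is installed on the Docker host:

```shell
# Succeeds (exit code 0) if the given TCP port appears as a local
# listening port in "ss -ltn"-style output read from stdin.
# Column 1 is the socket state, column 4 is "Local Address:Port".
port_is_listening() {
  awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Typical use on the Docker host (assumes ss is available):
#   ss -ltn | port_is_listening 8080 && echo "8080 is taken, pick another port"
```

The function only reads text from stdin, so it can also be fed with saved output from another machine.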

Step 3.2: Run Master interactively in a Container

Here again, we are following the documentation of the images, but we are starting the Mesos master in interactive mode in a different terminal in order to see all logs, and we use the latest version 1.1.0 instead of 0.28.0:

(dockerhost)$ sudo docker run -it --net=host \
  --name mesos-master \
  -e MESOS_PORT=5050 \
  -e MESOS_ZK=zk://127.0.0.1:2181/mesos \
  -e MESOS_QUORUM=1 \
  -e MESOS_REGISTRY=in_memory \
  -e MESOS_LOG_DIR=/var/log/mesos \
  -e MESOS_WORK_DIR=/var/tmp/mesos \
  -v "$(pwd)/log/mesos:/var/log/mesos" \
  -v "$(pwd)/tmp/mesos:/var/tmp/mesos" \
  mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1212 17:10:44.898630 1 main.cpp:263] Build: 2016-11-16 01:30:23 by ubuntu
I1212 17:10:44.898900 1 main.cpp:264] Version: 1.1.0
I1212 17:10:44.898916 1 main.cpp:267] Git tag: 1.1.0
I1212 17:10:44.898927 1 main.cpp:271] Git SHA: a44b077ea0df54b77f05550979e1e97f39b15873
I1212 17:10:44.903816 1 logging.cpp:194] INFO level logging started!
I1212 17:10:44.904436 1 main.cpp:370] Using 'HierarchicalDRF' allocator
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-12 17:10:44,905:1(0x7ff4a8164700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-12 17:10:44,907:1(0x7ff4a6961700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-12 17:10:44,910:1(0x7ff4a6961700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7ff4b16c6200 sessionId=0 sessionPasswd=<null> context=0x7ff490000930 flags=0
I1212 17:10:44.909833 11 master.cpp:380] Master 917a95ab-7b77-4316-8e52-1431a8043af3 (openshift-installer-native-docker-compose) started on 127.0.0.1:5050
I1212 17:10:44.912077 11 master.cpp:382] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --port="5050" --quiet="false" --quorum="1" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/tmp/mesos" --zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
2016-12-12 17:10:44,909:1(0x7ff4a8164700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-12 17:10:44,913:1(0x7ff4a8164700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-12 17:10:44,913:1(0x7ff4a8164700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-12 17:10:44,913:1(0x7ff4a8164700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7ff4b16c6200 sessionId=0 sessionPasswd=<null> context=0x7ff4980038a0 flags=0
W1212 17:10:44.913079 11 master.cpp:385]
**************************************************
Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
**************************************************
I1212 17:10:44.914429 11 master.cpp:434] Master allowing unauthenticated frameworks to register
I1212 17:10:44.914448 11 master.cpp:448] Master allowing unauthenticated agents to register
I1212 17:10:44.914455 11 master.cpp:462] Master allowing HTTP frameworks to register without authentication
I1212 17:10:44.914474 11 master.cpp:504] Using default 'crammd5' authenticator
W1212 17:10:44.914487 11 authenticator.cpp:512] No credentials provided, authentication requests will be refused
I1212 17:10:44.914687 11 authenticator.cpp:519] Initializing server SASL
2016-12-12 17:10:44,922:1(0x7ff497fff700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
2016-12-12 17:10:44,923:1(0x7ff497fff700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f400557d0003, negotiated timeout=10000
2016-12-12 17:10:44,923:1(0x7ff4977fe700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
2016-12-12 17:10:44,924:1(0x7ff4977fe700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f400557d0002, negotiated timeout=10000
I1212 17:10:44.924991 8 group.cpp:340] Group process (zookeeper-group(2)@127.0.0.1:5050) connected to ZooKeeper
I1212 17:10:44.925303 8 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1212 17:10:44.925424 8 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I1212 17:10:44.925204 5 group.cpp:340] Group process (zookeeper-group(1)@127.0.0.1:5050) connected to ZooKeeper
I1212 17:10:44.925606 5 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1212 17:10:44.925617 5 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I1212 17:10:44.931725 8 contender.cpp:152] Joining the ZK group
I1212 17:10:44.932301 11 master.cpp:1951] Successfully attached file '/var/log/mesos/mesos-master.INFO'
I1212 17:10:44.936324 9 contender.cpp:268] New candidate (id='1') has entered the contest for leadership
I1212 17:10:44.937194 5 detector.cpp:152] Detected a new leader: (id='1')
I1212 17:10:44.937408 5 group.cpp:697] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I1212 17:10:44.939241 5 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1212 17:10:44.939414 5 master.cpp:2017] Elected as the leading master!
I1212 17:10:44.939437 5 master.cpp:1560] Recovering from registrar
I1212 17:10:44.941402 5 registrar.cpp:362] Successfully fetched the registry (0B) in 1.7152ms
I1212 17:10:44.941573 5 registrar.cpp:461] Applied 1 operations in 5462ns; attempting to update the registry
I1212 17:10:44.946907 5 registrar.cpp:506] Successfully updated the registry in 5.135104ms
I1212 17:10:44.947170 5 registrar.cpp:392] Successfully recovered registrar
I1212 17:10:44.947314 5 master.cpp:1676] Recovered 0 agents from the registry (184B); allowing 10mins for agents to re-register
2016-12-12 17:11:11,640:1(0x7ff497fff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 14ms
2016-12-12 17:11:11,641:1(0x7ff4977fe700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 15ms
2016-12-12 17:11:35,045:1(0x7ff497fff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 50ms
2016-12-12 17:11:35,046:1(0x7ff4977fe700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 51ms
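
Note how the -e MESOS_* variables of the docker run command reappear in the "Flags at startup" log line: Mesos accepts each option either as a command line flag or as an environment variable with the MESOS_ prefix. The following sketch illustrates that naming convention; the helper mesos_env_to_flag is our own illustration and not a Mesos tool:

```shell
# Convert a MESOS_* environment assignment into the equivalent
# command line flag, e.g. MESOS_QUORUM=1 -> --quorum=1.
mesos_env_to_flag() {
  local name="${1%%=*}"    # part before the first "="
  local value="${1#*=}"    # part after the first "="
  name="${name#MESOS_}"    # drop the MESOS_ prefix
  printf -- '--%s=%s\n' "$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')" "$value"
}

mesos_env_to_flag "MESOS_ZK=zk://127.0.0.1:2181/mesos"   # prints --zk=zk://127.0.0.1:2181/mesos
mesos_env_to_flag "MESOS_WORK_DIR=/var/tmp/mesos"        # prints --work_dir=/var/tmp/mesos
```

This is why we can configure the master purely through environment variables without overriding the entrypoint of the image.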

Step 3.3: Run Slave interactively in a Container

Here again, we are following the documentation of the images, but we are starting the Mesos slave in interactive mode (-it) in a third terminal in order to see all logs, and we use the latest version 1.1.0 instead of 0.28.0:

(dockerhost)$ sudo docker run -it --net=host --privileged \
  -e MESOS_PORT=5051 \
  -e MESOS_MASTER=zk://127.0.0.1:2181/mesos \
  -e MESOS_SWITCH_USER=0 \
  -e MESOS_CONTAINERIZERS=docker,mesos \
  -e MESOS_LOG_DIR=/var/log/mesos \
  -e MESOS_WORK_DIR=/var/tmp/mesos \
  -v "$(pwd)/log/mesos:/var/log/mesos" \
  -v "$(pwd)/tmp/mesos:/var/tmp/mesos" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /cgroup:/cgroup \
  -v /sys:/sys \
  -v /usr/local/bin/docker:/usr/local/bin/docker \
  mesosphere/mesos-slave:1.1.01.1.0-2.0.107.ubuntu1404
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1212 17:18:39.260704 1 main.cpp:243] Build: 2016-11-16 01:30:23 by ubuntu
I1212 17:18:39.261031 1 main.cpp:244] Version: 1.1.0
I1212 17:18:39.261075 1 main.cpp:247] Git tag: 1.1.0
I1212 17:18:39.261108 1 main.cpp:251] Git SHA: a44b077ea0df54b77f05550979e1e97f39b15873
I1212 17:18:39.265000 1 logging.cpp:194] INFO level logging started!
I1212 17:18:39.400902 1 containerizer.cpp:200] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I1212 17:18:39.429229 1 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
2016-12-12 17:18:39,438:1(0x7ff8f7601700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
I1212 17:18:39.438886 1 slave.cpp:208] Mesos agent started on (1)@127.0.0.1:5051
2016-12-12 17:18:39,439:1(0x7ff8f7601700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
I1212 17:18:39.439324 1 slave.cpp:209] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="linux" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://127.0.0.1:2181/mesos" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="false" 
--systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/tmp/mesos"
2016-12-12 17:18:39,441:1(0x7ff8f7601700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-12 17:18:39,442:1(0x7ff8f7601700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-12 17:18:39,442:1(0x7ff8f7601700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7ff9023a0200 sessionId=0 sessionPasswd=<null> context=0x7ff8d4001e00 flags=0
2016-12-12 17:18:39,445:1(0x7ff8f4dfc700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
2016-12-12 17:18:39,446:1(0x7ff8f4dfc700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f400557d0004, negotiated timeout=10000
W1212 17:18:39.440688 1 slave.cpp:212]
**************************************************
Agent bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address.
**************************************************
I1212 17:18:39.448009 8 group.cpp:340] Group process (zookeeper-group(1)@127.0.0.1:5051) connected to ZooKeeper
I1212 17:18:39.448274 8 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1212 17:18:39.448418 8 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I1212 17:18:39.450460 1 slave.cpp:533] Agent resources: cpus(*):2; mem(*):2928; disk(*):1902607; ports(*):[31000-32000]
I1212 17:18:39.452038 8 detector.cpp:152] Detected a new leader: (id='1')
I1212 17:18:39.452385 8 group.cpp:697] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I1212 17:18:39.452198 1 slave.cpp:541] Agent attributes: [ ]
I1212 17:18:39.454398 1 slave.cpp:546] Agent hostname: openshift-installer-native-docker-compose
I1212 17:18:39.459702 8 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1212 17:18:39.462762 9 state.cpp:57] Recovering state from '/var/tmp/mesos/meta'
I1212 17:18:39.464331 9 status_update_manager.cpp:203] Recovering status update manager
I1212 17:18:39.465109 9 docker.cpp:764] Recovering Docker containers
I1212 17:18:39.465148 11 containerizer.cpp:555] Recovering containerizer
I1212 17:18:39.472216 6 provisioner.cpp:253] Provisioner recovery complete
I1212 17:18:39.917292 8 slave.cpp:5281] Finished recovery
I1212 17:18:39.938516 8 slave.cpp:915] New master detected at master@127.0.0.1:5050
I1212 17:18:39.939077 8 slave.cpp:936] No credentials provided. Attempting to register without authentication
I1212 17:18:39.939728 8 slave.cpp:947] Detecting new master
I1212 17:18:39.938575 9 status_update_manager.cpp:177] Pausing sending status updates
E1212 17:18:40.622269 8 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I1212 17:18:40.631978 8 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:40.634527 8 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
E1212 17:18:40.632072 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E1212 17:18:42.052165 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I1212 17:18:42.052312 7 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:42.057737 7 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
I1212 17:18:45.093793 6 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:45.094339 6 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
E1212 17:18:45.093952 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I1212 17:18:48.748410 9 slave.cpp:4179] Got exited event for master@127.0.0.1:5050
W1212 17:18:48.748883 9 slave.cpp:4184] Master disconnected! Waiting for a new master to be elected
E1212 17:18:48.748569 13 process.cpp:2154] Failed to shutdown socket with fd 12: Transport endpoint is not connected
2016-12-12 17:18:49,470:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 12ms
I1212 17:18:50.030248 8 detector.cpp:152] Detected a new leader: None
I1212 17:18:50.030894 8 slave.cpp:908] Lost leading master
I1212 17:18:50.031404 8 slave.cpp:947] Detecting new master
I1212 17:18:50.031327 7 status_update_manager.cpp:177] Pausing sending status updates
2016-12-12 17:18:53,405:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 43ms
I1212 17:19:02.637687 7 slave.cpp:1324] Skipping registration because no master present
I1212 17:19:22.959288 8 detector.cpp:152] Detected a new leader: (id='2')
I1212 17:19:22.961132 8 group.cpp:697] Trying to get '/mesos/json.info_0000000002' in ZooKeeper
I1212 17:19:22.965332 8 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1212 17:19:22.971281 8 slave.cpp:915] New master detected at master@127.0.0.1:5050
I1212 17:19:22.972019 8 slave.cpp:936] No credentials provided. Attempting to register without authentication
I1212 17:19:22.975116 8 slave.cpp:947] Detecting new master
I1212 17:19:22.971410 9 status_update_manager.cpp:177] Pausing sending status updates
I1212 17:19:23.927111 11 slave.cpp:1115] Registered with master master@127.0.0.1:5050; given agent ID a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0
I1212 17:19:23.931748 12 status_update_manager.cpp:184] Resuming sending status updates
I1212 17:19:24.781162 11 slave.cpp:1175] Forwarding total oversubscribed resources {}
I1212 17:19:39.456789 10 slave.cpp:5044] Current disk usage 62.36%. Max allowed age: 1.934608332272824days
2016-12-12 17:19:46,345:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 20ms
2016-12-12 17:20:19,750:1(0x7ff8f4dfc700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 41ms
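
In the log above, the slave first loses the master several times ("Master disconnected!") and only registers after a new leader (id='2') has been elected. To check quickly whether the registration has succeeded, we can look for the "Registered with master" line. This is a minimal sketch, assuming the Mesos 1.1.0 log format shown above; the function name agent_id_from_log is hypothetical:

```shell
# Print the agent ID from a log line of the form
# "... Registered with master <master>; given agent ID <id>".
# Reads the log on stdin and prints nothing if the agent has not registered yet.
agent_id_from_log() {
  sed -n 's/.*Registered with master .*; given agent ID \([^ ]*\).*/\1/p' | head -n 1
}

# Example with the log line from above:
echo "I1212 17:19:23.927111 11 slave.cpp:1115] Registered with master master@127.0.0.1:5050; given agent ID a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0" \
  | agent_id_from_log
```

For a running container, the log could be piped in with something like "sudo docker logs <container> 2>&1 | agent_id_from_log", where <container> is the name or ID of the slave container.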

Step 4: Connect to Mesos Web Portal

Mesos’ web portal runs on port 5050 by default. In my case, I am running Mesos in Docker containers, which are hosted on a VirtualBox Docker host VM. Since the VM only has a NATed virtual Ethernet interface, I need to specify a port forwarding rule in VirtualBox, so that I can reach port 5050 of the Docker host:
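
As an alternative to the VirtualBox GUI, such a rule can also be created on the command line with VBoxManage. The sketch below builds the rule string in the format VBoxManage expects; the helper natpf_rule and the VM name "dockerhost" are assumptions for illustration and need to be adapted to your setup:

```shell
# Build a VirtualBox NAT port forwarding rule string of the form
# "<name>,<protocol>,<host ip>,<host port>,<guest ip>,<guest port>".
# Empty IP fields mean "any address".
natpf_rule() {
  printf '%s,tcp,,%s,,%s\n' "$1" "$2" "$3"
}

# Hypothetical VM name "dockerhost"; the VM must be powered off for modifyvm:
#   VBoxManage modifyvm "dockerhost" --natpf1 "$(natpf_rule mesos 5050 5050)"
# For a running VM, controlvm can be used instead:
#   VBoxManage controlvm "dockerhost" natpf1 "$(natpf_rule mesos 5050 5050)"
natpf_rule mesos 5050 5050
```

The same pattern applies to any other port we want to reach from the base system, for example the Marathon port 7070 used later in this post.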

After that, we can access the Mesos master portal from a browser on the base system:

Note: if you see the error “Failed to connect to 127.0.0.1:5050!”, try reloading (instead of retrying) the page via the browser reload function. See Appendix A.1 for details.

We can see the single Mesos slave (i.e. agent) connected to the master on the left:

The details of the agents (slaves) can be seen by clicking on the Agents link in the menu:

So, what is next? It looks like we need a framework like Marathon in order to start applications on Mesos. Let us start Marathon now.

Step 5: Start Marathon

Let us start Marathon in a container as described here. However, we will use port 7070 instead of 8080, because port 8080 is already used by the ZooKeeper (Exhibitor) container. In addition, we will change the master URI and let it point to the external Mesos master we started in Step 3. Moreover, we need to set the environment variable MESOS_WORK_DIR because of a Mesos bug. See Appendix B for details.

(dockerhost)$ sudo docker run -it --name marathon --rm --net=host -e MESOS_WORK_DIR=/var/lib/mesos --entrypoint=bash mesosphere/marathon
(container)# ./bin/start --master zk://127.0.0.1:2181/mesos --zk zk://localhost:2181/marathon --http_port=7070
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[2016-12-13 14:46:09,798] INFO Starting Marathon 1.3.6/unknown with --master zk://127.0.0.1:2181/mesos --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main)
[2016-12-13 14:46:10,148] WARN Method [public javax.ws.rs.core.Response mesosphere.marathon.api.MarathonExceptionMapper.toResponse(java.lang.Throwable)] is synthetic and is being intercepted by [mesosphere.marathon.DebugModule$MetricsBehavior@985696]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all. (com.google.inject.internal.ProxyFactory:main)
[2016-12-13 14:46:10,336] INFO Logging initialized @1841ms (org.eclipse.jetty.util.log:main)
[2016-12-13 14:46:10,849] INFO Slf4jLogger started (akka.event.slf4j.Slf4jLogger:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 14:46:11,060] INFO Started TaskTrackerUpdateStepsProcessorImpl with steps:
* continueOnError(notifyHealthCheckManager)
* continueOnError(notifyRateLimiter)
* continueOnError(notifyLaunchQueue)
* continueOnError(emitUpdate)
* continueOnError(postTaskStatusEvent)
* continueOnError(scaleApp) (mesosphere.marathon.core.task.tracker.impl.TaskTrackerUpdateStepProcessorImpl:main)
[2016-12-13 14:46:11,128] INFO Calling reviveOffers is enabled. Use --disable_revive_offers_for_new_apps to disable. (mesosphere.marathon.core.flow.FlowModule:main)
[2016-12-13 14:46:11,195] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authenticator' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,200] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,202] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authorizer' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,203] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-13 14:46:11,205] INFO Started status update processor (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$ca7c928f:main)
[2016-12-13 14:46:11,321] INFO All actors suspended:
* Actor[akka://marathon/user/groupManager#1830533307]
* Actor[akka://marathon/user/launchQueue#-485770292]
* Actor[akka://marathon/user/killOverdueStagedTasks#540935184]
* Actor[akka://marathon/user/offersWantedForReconciliation#-876030651]
* Actor[akka://marathon/user/taskKillServiceActor#1912899513]
* Actor[akka://marathon/user/rateLimiter#1490654495]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#-1930600924]
* Actor[akka://marathon/user/expungeOverdueLostTasks#-405721147]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1467955826]
* Actor[akka://marathon/user/taskTracker#623538008]
* Actor[akka://marathon/user/offerMatcherManager#1760835888] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-13 14:46:11,417] INFO Adding HTTP support. (mesosphere.chaos.http.HttpModule:main)
[2016-12-13 14:46:11,418] INFO No HTTPS support configured. (mesosphere.chaos.http.HttpModule:main)
[2016-12-13 14:46:11,497] INFO Starting up (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 14:46:11,497] INFO Beginning run (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 14:46:11,498] INFO Will offer leadership after 500 milliseconds backoff (mesosphere.marathon.core.election.impl.CuratorElectionService:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-13 14:46:11,504] INFO jetty-9.3.z-SNAPSHOT (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,535] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:11,704] INFO Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,705] INFO Registering mesosphere.marathon.api.MarathonExceptionMapper as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.AppsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.TasksResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.EventSubscriptionsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.QueueResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.GroupsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.InfoResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.LeaderResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.DeploymentsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.ArtifactsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.SchemaResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,706] INFO Registering mesosphere.marathon.api.v2.PluginsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,710] INFO Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' (com.sun.jersey.server.impl.application.WebApplicationImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,765] INFO Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:11,782] INFO Binding mesosphere.marathon.api.MarathonExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,027] INFO Using HA and therefore offering leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,028] INFO Will do leader election through localhost:2181 (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,073] WARN session timeout [10000] is less than connection timeout [15000] (org.apache.curator.CuratorZookeeperClient:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,096] INFO Starting (org.apache.curator.framework.imps.CuratorFrameworkImpl:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:zookeeper.version=3.5.0-alpha-1615249, built on 08/01/2014 22:13 GMT (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:host.name=openshift-installer-native-docker-compose (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.class.path=./bin/../target/marathon-assembly-1.3.6.jar (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.version=4.2.0-42-generic (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:user.dir=/marathon (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.memory.free=110MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.memory.max=880MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Client environment:os.memory.total=158MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,124] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@3411e275 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-13 14:46:12,194] INFO Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 14:46:12,201] INFO Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 14:46:12,212] INFO Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x158f4c9e1f90014, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-13 14:46:12,217] INFO State change: CONNECTED (org.apache.curator.framework.state.ConnectionStateManager:ForkJoinPool-2-worker-13-EventThread)
[2016-12-13 14:46:12,279] INFO Elected (LeaderLatchListener Interface) (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-13 14:46:12,284] INFO As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 14:46:12,302] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@5301f666 (org.apache.zookeeper.ZooKeeper:pool-1-thread-1)
[2016-12-13 14:46:12,310] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 14:46:12,310] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 14:46:12,323] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f90015, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-13 14:46:12,367] INFO Binding mesosphere.marathon.api.v2.AppsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,391] INFO Binding mesosphere.marathon.api.v2.TasksResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,398] INFO Binding mesosphere.marathon.api.v2.EventSubscriptionsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,398] INFO Event notification disabled. (mesosphere.marathon.core.event.EventModule:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,404] INFO Binding mesosphere.marathon.api.v2.QueueResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,425] INFO Binding mesosphere.marathon.api.v2.GroupsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,429] INFO Binding mesosphere.marathon.api.v2.InfoResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,431] INFO Binding mesosphere.marathon.api.v2.LeaderResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,433] INFO Binding mesosphere.marathon.api.v2.DeploymentsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,441] INFO Binding mesosphere.marathon.api.v2.ArtifactsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,443] INFO Binding mesosphere.marathon.api.v2.SchemaResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,445] INFO Binding mesosphere.marathon.api.v2.PluginsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,445] INFO Loading plugins implementing 'mesosphere.marathon.plugin.http.HttpRequestHandler' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,446] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,449] INFO Migration successfully applied for version Version(1, 3, 6) (mesosphere.marathon.state.Migration:ForkJoinPool-2-worker-5)
[2016-12-13 14:46:12,451] INFO Call preDriverStarts callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 14:46:12,468] INFO Finished preDriverStarts callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-13 14:46:12,469] INFO Started o.e.j.s.ServletContextHandler@65817241{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,484] INFO ExpungeOverdueLostTasksActor has started (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-4)
[2016-12-13 14:46:12,487] INFO TaskTrackerActor is starting. Task loading initiated. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,489] INFO started RateLimiterActor (mesosphere.marathon.core.launchqueue.impl.RateLimiterActor:marathon-akka.actor.default-dispatcher-4)
[2016-12-13 14:46:12,492] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 14:46:12,494] INFO Started. Will remain interested in offer reconciliation for 17500 milliseconds when needed. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 14:46:12,514] INFO All actors active:
* Actor[akka://marathon/user/groupManager#1830533307]
* Actor[akka://marathon/user/launchQueue#-485770292]
* Actor[akka://marathon/user/killOverdueStagedTasks#540935184]
* Actor[akka://marathon/user/offersWantedForReconciliation#-876030651]
* Actor[akka://marathon/user/taskKillServiceActor#1912899513]
* Actor[akka://marathon/user/rateLimiter#1490654495]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#-1930600924]
* Actor[akka://marathon/user/expungeOverdueLostTasks#-405721147]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1467955826]
* Actor[akka://marathon/user/taskTracker#623538008]
* Actor[akka://marathon/user/offerMatcherManager#1760835888] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-13 14:46:12,520] INFO Started ServerConnector@7fc3545e{HTTP/1.1,[http/1.1]}{0.0.0.0:7070} (org.eclipse.jetty.server.ServerConnector:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,520] INFO Started @4026ms (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-13 14:46:12,521] INFO All services up and running. (mesosphere.marathon.Main$:main)
[2016-12-13 14:46:12,524] INFO About to load 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5)
[2016-12-13 14:46:12,536] INFO Loaded 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5)
[2016-12-13 14:46:12,545] INFO Task loading complete. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:12,555] INFO interested in offers for reservation reconciliation because of becoming leader (until 2016-12-13T14:46:29.995Z) (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-13 14:46:12,588] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,589] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,589] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,589] INFO => Schedule next revive at 2016-12-13T14:46:17.586Z in 4998 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,638] INFO Create new Scheduler Driver with frameworkId: Some(value: "44a35e16-dc32-4f91-afac-33dfff498944-0000"
) and scheduler mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$EnhancerByGuice$$c730b9b8@15d42ccb (mesosphere.marathon.MarathonSchedulerDriver$:pool-1-thread-1)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1213 14:46:12.777942 192 sched.cpp:1697]
**************************************************
Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
**************************************************
I1213 14:46:12.791020 196 sched.cpp:226] Version: 1.0.1
2016-12-13 14:46:12,791:134(0x7f351b0c7700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-13 14:46:12,791:134(0x7f351b0c7700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@log_env@767: Client environment:user.dir=/marathon
2016-12-13 14:46:12,792:134(0x7f351b0c7700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7f3522fb1d80 sessionId=0 sessionPasswd=<null> context=0x7f352403b5c0 flags=0
[2016-12-13 14:46:12,799] INFO Reset offerLeadership backoff (mesosphere.marathon.core.election.impl.ExponentialBackoff:pool-1-thread-1)
[2016-12-13 14:46:12,799] INFO Became active. Accepting event streaming requests. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-6)
[2016-12-13 14:46:12,800] INFO Starting scheduler actor (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-8)
2016-12-13 14:46:12,803:134(0x7f35198c4700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
[2016-12-13 14:46:12,811] INFO Scheduler actor ready (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-6)
2016-12-13 14:46:12,826:134(0x7f35198c4700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x158f4c9e1f90016, negotiated timeout=10000
I1213 14:46:12.828322 205 group.cpp:349] Group process (group(1)@127.0.0.1:35047) connected to ZooKeeper
I1213 14:46:12.829164 205 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1213 14:46:12.829583 205 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I1213 14:46:12.833498 205 detector.cpp:152] Detected a new leader: (id='1')
I1213 14:46:12.834187 205 group.cpp:706] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I1213 14:46:12.836730 205 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
I1213 14:46:12.837736 205 sched.cpp:330] New master detected at master@127.0.0.1:5050
I1213 14:46:12.838312 205 sched.cpp:341] No credentials provided. Attempting to register without authentication
I1213 14:46:12.843353 205 sched.cpp:743] Framework registered with 44a35e16-dc32-4f91-afac-33dfff498944-0000
[2016-12-13 14:46:12,846] INFO Creating tombstone for old twitter commons leader election (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-13 14:46:12,872] INFO Registered as 44a35e16-dc32-4f91-afac-33dfff498944-0000 to master '03f81417-51c9-4055-adbe-c1fb74fc8ab4' (mesosphere.marathon.MarathonScheduler$$EnhancerByGuice$$1ef061b0:Thread-14)
[2016-12-13 14:46:12,872] INFO Store framework id: value: "44a35e16-dc32-4f91-afac-33dfff498944-0000"
 (mesosphere.util.state.FrameworkIdUtil:Thread-14)
[2016-12-13 14:46:12,895] INFO Received reviveOffers notification: SchedulerRegisteredEvent (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-13 14:46:16,658] INFO 10.0.2.2 - - [13/Dec/2016:14:46:16 +0000] "GET //localhost:7070/v2/deployments HTTP/1.1" 200 22 "http://localhost:7070/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36" 14 (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$c1e74978:qtp1755811644-35)
[2016-12-13 14:46:16,660] INFO 10.0.2.2 - - [13/Dec/2016:14:46:16 +0000] "GET //localhost:7070/v2/groups?embed=group.groups&embed=group.apps&embed=group.apps.deployments&embed=group.apps.counts&embed=group.apps.readiness HTTP/1.1" 200 95 "http://localhost:7070/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36" 15 (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$c1e74978:qtp1755811644-36)
[2016-12-13 14:46:16,666] INFO 10.0.2.2 - - [13/Dec/2016:14:46:16 +0000] "GET //localhost:7070/v2/queue HTTP/1.1" 200 32 "http://localhost:7070/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36" 15 (mesosphere.chaos.http.ChaosRequestLog$$EnhancerByGuice$$c1e74978:qtp1755811644-34)
[2016-12-13 14:46:17,603] INFO Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:17,604] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:17,609] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-13 14:46:17,612] INFO => Schedule next revive at 2016-12-13T14:46:22.602Z in 4998 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)

Let us connect to the Marathon portal:

We have chosen Marathon to run on port 7070. In my case, the Docker host VirtualBox VM has a NATed interface only, so I need to open the Marathon port on VirtualBox:
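
If you prefer the command line over the VirtualBox GUI, the same port forwarding can probably be set up with VBoxManage. This is only a sketch: the VM name `dockerhost` and the rule name `marathon` are placeholders I made up for this example, so adapt them to your setup.

```shell
#!/bin/sh
# Hypothetical value: replace with the VM name shown by "VBoxManage list vms".
VM_NAME="dockerhost"

# Build a NAT port forwarding rule of the form
# <rule name>,<protocol>,<host ip>,<host port>,<guest ip>,<guest port>.
# The IP fields are left empty so that VirtualBox uses its defaults.
natpf_rule() {
  printf '%s,tcp,,%s,,%s\n' "$1" "$2" "$2"
}

# Add the rule to a running VM.
# For a powered-off VM, "VBoxManage modifyvm <vm> --natpf1 <rule>" is the counterpart.
forward_marathon_port() {
  VBoxManage controlvm "$VM_NAME" natpf1 "$(natpf_rule marathon 7070)"
}

# Usage on the VirtualBox host:
#   forward_marathon_port
```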

Now we can access the Marathon dashboard:

Let us see whether Marathon is visible on Mesos:

Yes, the Marathon framework can be seen on the Mesos master; perfect.
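
The same check can be sketched with cURL instead of the Mesos dashboard. I assume here that the Mesos master is reachable on localhost:5050 (the address seen in the log output above) and that Marathon has registered under its default framework name `marathon`. The helper names are made up for this example.

```shell
#!/bin/sh
# Assumption: Mesos master address as seen in the Marathon log above.
MESOS_URL="http://localhost:5050"

# Read the master state JSON on stdin and succeed if an entry
# with the name "marathon" shows up in it.
has_marathon() {
  grep -q '"name"[[:space:]]*:[[:space:]]*"marathon"'
}

# Fetch the master state and report whether Marathon has registered.
check_marathon_registered() {
  if curl -s "$MESOS_URL/master/state" | has_marathon; then
    echo "Marathon is registered"
  else
    echo "Marathon not found"
  fi
}

# Usage on the Docker host:
#   check_marathon_registered
```

The grep is a deliberately crude match on the raw JSON. With a JSON tool like jq installed, a proper query on the frameworks list would be more robust.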

Our topology now looks as follows:

However, we have not started any Marathon application yet. Let us do so now.

Step 6: Start a “Hello World” Application via Marathon Web Portal in a Mesos Container

Let us create an application running in a Mesos container on the Marathon web portal by clicking the “Create Application” button.

We choose

  • ID: while-loop-hello-world
    • ID needs to consist of lowercase letters, digits, hyphens, “.”, “..”
  • CPUs: 0.1
    • If you keep the default of 1, you might hit resource problems if you do not have 4 CPUs available for the 4 instances. In this case, one or more instances might remain in the “Waiting” state.
  • Memory: 32 MiB
    • 32 MiB is the minimum value supported
  • Disk Space: 0
  • Instances: 4
  • Command: while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
    • We have redirected the output of the script because the STDERR/STDOUT retrieval currently does not work on the Marathon portal (see Appendix A.5).
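
The form values above map to a Marathon app definition that could also be sent to the REST API. The following is a minimal sketch, assuming Marathon is reachable on localhost:7070 as in our setup. The function names are made up for this example.

```shell
#!/bin/sh
# Assumption: Marathon address as chosen in this post.
MARATHON_URL="http://localhost:7070"

# Print the app definition that mirrors the values entered in the web portal.
app_definition() {
  cat <<'EOF'
{
  "id": "/while-loop-hello-world",
  "cmd": "while true; do echo \"I am a Hello World script\"; sleep 1; done 2>&1 1> this_is_my_output.log",
  "cpus": 0.1,
  "mem": 32,
  "disk": 0,
  "instances": 4
}
EOF
}

# POST the definition to the /v2/apps endpoint of Marathon.
create_app() {
  app_definition | curl -s -X POST "$MARATHON_URL/v2/apps" \
    -H "Content-Type: application/json" \
    -d @-
}

# Usage (requires a reachable Marathon instance):
#   create_app
```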

After less than 5 seconds, we see that all 4 instances are up and running:

On the Docker host, we can see the four instances running:

$ ps -ef | grep Hello | grep -v grep
root 14137 14095 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14138 14096 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14141 14116 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14142 14127 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log

We can also find the log files on the Docker host:

$ sudo find / -name this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.927a5843-c517-11e6-ad85-02422b10e522/runs/0f079dcc-25e7-4ad6-b263-4761370a6173/this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.9285a2e4-c517-11e6-ad85-02422b10e522/runs/44a5c7dd-59f3-4b59-8d94-33754a3cbdc3/this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.92861815-c517-11e6-ad85-02422b10e522/runs/5400a1d9-582f-4d59-9b81-a719d21c60a1/this_is_my_output.log
/vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.9286db66-c517-11e6-ad85-02422b10e522/runs/f3145a9f-2c88-4f99-a8de-618a4157b9e8/this_is_my_output.log

And we can see the output:

$ sudo tail -F /vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.927a5843-c517-11e6-ad85-02422b10e522/runs/0f079dcc-25e7-4ad6-b263-4761370a6173/this_is_my_output.log
I am a Hello World script
I am a Hello World script
I am a Hello World script
I am a Hello World script
I am a Hello World script
...

Step 7: Test of Application Resiliency

Step 7.1: Killing a Process and Testing Whether It Is Restarted Automatically

Now let us see what happens if one of the processes dies. For that, we simply kill one of the “Hello World” processes found on the Docker host:

(dockerhost)$ ps -ef | grep Hello | grep -v grep
root 14137 14095 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14138 14096 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14141 14116 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14142 14127 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
(dockerhost)$ sudo kill -9 14137
(dockerhost)$ ps -ef | grep Hello | grep -v grep
root 14138 14096 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14141 14116 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 14142 14127 0 11:46 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log
root 19818 19807 1 12:04 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log

We can clearly see that the process was killed successfully, and a new process was started automatically.
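
Instead of comparing process lists, we could also ask Marathon how many tasks it considers running. The sketch below assumes Marathon on localhost:7070 and relies on the `tasksRunning` counter in the app resource. The sed expression is a crude stand-in for a real JSON parser, and the function names are made up for this example.

```shell
#!/bin/sh
# Assumptions: Marathon address and app ID as used in this post.
MARATHON_URL="http://localhost:7070"
APP_ID="while-loop-hello-world"

# Read app JSON on stdin and print the first "tasksRunning" number found.
tasks_running() {
  sed -n 's/.*"tasksRunning"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Fetch the app resource and print the number of running tasks.
show_running_count() {
  curl -s "$MARATHON_URL/v2/apps/$APP_ID" | tasks_running
}

# Usage: run before and shortly after the kill and compare the numbers.
#   show_running_count
```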

Step 7.2: Find Traces of the killed Process in the Logs

The output log of the old process can still be retrieved:

$ cat /vagrant/jenkins_home/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.92861815-c517-11e6-ad85-02422b10e522/runs/5400a1d9-582f-4d59-9b81-a719d21c60a1/this_is_my_output.log
...
I am a Hello World script
I am a Hello World script
I am a Hello World script
I am a Hello World script
(EOF)

However, we cannot see any trace that something went wrong. So how can we troubleshoot if a process dies unexpectedly?

Here we are: the Marathon portal provides information on what happened:
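
The failure information should also be retrievable via the REST API. As far as I know, the app resource can carry a `lastTaskFailure` object, which can be requested explicitly with the `embed=app.lastTaskFailure` query parameter. Treat the following as a sketch under that assumption. The function names are made up for this example.

```shell
#!/bin/sh
# Assumption: Marathon address as chosen in this post.
MARATHON_URL="http://localhost:7070"

# Build the URL of the app resource, asking Marathon to embed the last task failure.
failure_url() {
  printf '%s/v2/apps/%s?embed=app.lastTaskFailure\n' "$MARATHON_URL" "$1"
}

# Print the raw app JSON. If a task has failed, the "lastTaskFailure"
# object in the response describes the most recent failure.
last_task_failure() {
  curl -s "$(failure_url "$1")"
}

# Usage:
#   last_task_failure while-loop-hello-world
```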

Step 8: Test: Resource Over-subscription not supported by Default?

In this test, we observe what happens if an application requests more resources than the slave offers. For that, we will try to start four instances with 1 CPU core each, while the slave provides only 2 CPUs instead of the required 4:

We press “Create Application” again and see that 2 of the 4 instances are started soon:

Two of the task instances were started successfully, and two instances are waiting for resources and are “Unscheduled”. This is the expected behavior:

On the Mesos Dashboard, we can see that the offered 2 CPUs are used by two instances already, leaving no CPU resources for the other two instances:
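
The observed behavior matches a simple back-of-the-envelope calculation. The sketch below is a simplification that only looks at CPUs and ignores memory, disk, and other frameworks. The function name `placement` is made up for this example.

```shell
#!/bin/sh
# How many instances fit into the offered CPUs, and how many stay waiting?
# Arguments: offered CPUs, CPUs per instance, requested instances.
placement() {
  awk -v total="$1" -v per="$2" -v want="$3" 'BEGIN {
    fit = int(total / per)        # instances that fit into the offer
    if (fit > want) fit = want    # never start more than requested
    printf "running=%d waiting=%d\n", fit, want - fit
  }'
}

placement 2 1 4     # this step: prints "running=2 waiting=2"
placement 2 0.1 4   # Step 6:    prints "running=4 waiting=0"
```

The waiting instances should also show up in the launch queue of Marathon. The web portal polls it via GET /v2/queue, as seen in the request log above, so `curl -s http://localhost:7070/v2/queue` should return the same data.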

Aurora vs. Marathon: Consider a situation where you want to run high-priority production applications together with low-priority development applications on the same hardware. Mesosphere Marathon does not provide a good answer to such situations. If you hit such a situation, consider using Apache Aurora instead of Mesosphere Marathon. Apache Aurora allows high-priority applications to preempt low-priority applications.

Mesos: about hard limits on CPU resources: The CPU reservation of Mesos feels similar to a hard assignment of resources (since over-subscription is not supported by default, as we have seen above). Under the hood, however, Mesos does not apply hard limits on CPU usage unless the Mesos slave has --cgroups_enable_cfs (CFS = Completely Fair Scheduler) enabled. See also the second answer on this StackOverflow question. For more information on over-subscription by Mesos, see this Apache Mesos Documentation page.

Conclusion: By default, resource over-subscription is not allowed by Mesos. See this Apache Mesos Documentation page for more information about over-subscription.

Step 9: Start a “Hello World” Application via Marathon Web Portal in a Docker Container

In this step we learn how to run an application in a Docker container instead of a Mesos container. We will perform similar steps as in Step 6.

We choose

  • ID: while-loop-hello-world-container-small
    • ID needs to consist of lowercase letters, digits, hyphens, “.”, “..”
  • CPUs: 0.1
    • If you keep the default of 1, you might hit resource problems if you do not have 4 CPUs available for the 4 instances. In this case, one or more instances might remain in the “Waiting” state.
  • Memory: 32 MiB
    • 32 MiB is the minimum value supported
  • Disk Space: 0
  • Instances: 4
  • Command: while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
    • We have redirected the output of the script because the STDERR/STDOUT retrieval currently does not work on the Marathon portal (see Appendix A.5).
  • Docker: ubuntu:latest

We click “Create Application”

Choose a low number of CPUs (0.1 CPUs in my case), the lowest amount of memory, and four instances, together with a while loop as the command:

In the Docker Container tab, we choose an Ubuntu image.
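
Expressed as a Marathon app definition, the Docker variant differs from the Mesos container variant mainly by an additional `container` section. This is a sketch only: I assume Marathon on localhost:7070, and the `"network": "HOST"` setting is my assumption about what the portal configures by default. The function names are made up for this example.

```shell
#!/bin/sh
# Assumption: Marathon address as chosen in this post.
MARATHON_URL="http://localhost:7070"

# Print an app definition that runs the while loop in an ubuntu:latest container.
docker_app_definition() {
  cat <<'EOF'
{
  "id": "/while-loop-hello-world-container-small",
  "cmd": "while true; do echo \"I am a Hello World script in a Docker container\"; sleep 1; done 2>&1 1> this_is_my_container_output.log",
  "cpus": 0.1,
  "mem": 32,
  "disk": 0,
  "instances": 4,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "ubuntu:latest",
      "network": "HOST"
    }
  }
}
EOF
}

# POST the definition to the /v2/apps endpoint of Marathon.
create_docker_app() {
  docker_app_definition | curl -s -X POST "$MARATHON_URL/v2/apps" \
    -H "Content-Type: application/json" \
    -d @-
}

# Usage (requires a reachable Marathon instance):
#   create_docker_app
```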

First, we see “waiting”,

then we see that all four instances are “running”:

On the Docker host, we can see the four Docker containers that have been started:

$ docker ps
CONTAINER ID        IMAGE                                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
78e346ddb3eb        ubuntu:latest                                           "/bin/sh -c 'while tr"   About an hour ago   Up About an hour                        mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.13aeacaa-20d5-4628-8e65-f82a0fc71724
809ad6513236        ubuntu:latest                                           "/bin/sh -c 'while tr"   About an hour ago   Up About an hour                        mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.c42d9e0c-f66e-42be-ac0e-46bc8f44e47b
de7523c00d41        ubuntu:latest                                           "/bin/sh -c 'while tr"   About an hour ago   Up About an hour                        mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e7fcebfb-f301-4369-83eb-2782832d83a9
d3729dad1bed        ubuntu:latest                                           "/bin/sh -c 'while tr"   About an hour ago   Up About an hour                        mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e924e32b-521b-4708-b538-bc2973094718

The processes are:

$ ps -ef | grep Hello | grep -v grep
root     19306 19258  0 14:40 ?        00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31730 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f27ecd8a-c52f-11e6-ad85-02422b10e522 -e PORT=31730 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31730 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31730 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e924e32b-521b-4708-b538-bc2973094718 -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f27ecd8a-c52f-11e6-ad85-02422b10e522/runs/e924e32b-521b-4708-b538-bc2973094718:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e924e32b-521b-4708-b538-bc2973094718 ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19319 19280  0 14:40 ?        00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31873 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f27e5859-c52f-11e6-ad85-02422b10e522 -e PORT=31873 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31873 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31873 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.c42d9e0c-f66e-42be-ac0e-46bc8f44e47b -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f27e5859-c52f-11e6-ad85-02422b10e522/runs/c42d9e0c-f66e-42be-ac0e-46bc8f44e47b:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.c42d9e0c-f66e-42be-ac0e-46bc8f44e47b ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19320 19281  0 14:40 ?        00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31245 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f2792838-c52f-11e6-ad85-02422b10e522 -e PORT=31245 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31245 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31245 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e7fcebfb-f301-4369-83eb-2782832d83a9 -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f2792838-c52f-11e6-ad85-02422b10e522/runs/e7fcebfb-f301-4369-83eb-2782832d83a9:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.e7fcebfb-f301-4369-83eb-2782832d83a9 ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19368 19352  0 14:40 ?        00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19379 19334  0 14:40 ?        00:00:00 docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-12-18T14:40:40.292Z -e HOST=openshift-installer-native-docker-compose -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=ubuntu:latest -e PORT_10000=31928 -e MESOS_TASK_ID=while-loop-hello-world-container-small.f27f90db-c52f-11e6-ad85-02422b10e522 -e PORT=31928 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31928 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/while-loop-hello-world-container-small -e PORT0=31928 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.13aeacaa-20d5-4628-8e65-f82a0fc71724 -v /var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world-container-small.f27f90db-c52f-11e6-ad85-02422b10e522/runs/13aeacaa-20d5-4628-8e65-f82a0fc71724:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0.13aeacaa-20d5-4628-8e65-f82a0fc71724 ubuntu:latest -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19422 19406  0 14:40 ?        00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19443 19427  0 14:40 ?        00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log
root     19473 19457  0 14:40 ?        00:00:01 /bin/sh -c while true; do echo "I am a Hello World script in a Docker container"; sleep 1; done 2>&1 1> this_is_my_container_output.log

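We can see that there are four docker run commands and four while loops running. We assigned 0.1 CPUs to each application instance, which Mesos translated into a CPU share of 0.1 × 1024 ≈ 102 (the --cpu-shares 102 argument). Likewise, the 32 MB of memory assigned to each instance show up as --memory 33554432, i.e. 32 × 1024 × 1024 bytes.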

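Note that CPU shares are only a relative weight: by default, the CPU usage of the Docker container is not hard-limited. A hard limit requires CFS quotas, which are enabled on the Mesos side with the agent flag --cgroups_enable_cfs; this is not an option of the docker run command. As long as the flag is not set, only CPU shares are configured for the Docker container.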
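The arithmetic behind the --cpu-shares 102 and --memory 33554432 arguments in the ps output above can be reproduced with a small shell sketch. The function names are my own and purely illustrative. The factor of 1024 shares per CPU is taken from the calculation above; the CFS part assumes the common Linux default period of 100000 microseconds, which may differ in a given setup.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of Mesos or Docker).

# CPU shares for a fractional CPU count: 1024 shares per CPU, truncated.
cpu_shares() {
  awk -v cpus="$1" 'BEGIN { printf "%d\n", cpus * 1024 }'
}

# Memory limit in bytes for a memory value given in units of 1024*1024 bytes,
# as in MARATHON_APP_RESOURCE_MEM=32.0 above.
mem_bytes() {
  awk -v mem="$1" 'BEGIN { printf "%d\n", mem * 1024 * 1024 }'
}

# CFS quota in microseconds that would hard-limit a container to the given
# CPU count, ASSUMING a CFS period of 100000 microseconds.
cfs_quota_us() {
  awk -v cpus="$1" 'BEGIN { printf "%d\n", cpus * 100000 }'
}

cpu_shares 0.1     # prints 102
mem_bytes 32.0     # prints 33554432
cfs_quota_us 0.1   # prints 10000
```

With CFS enabled, 0.1 CPUs would therefore correspond to roughly 10 ms of CPU time per 100 ms period, whereas the 102 shares only matter when several containers compete for the CPU.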

Appendix A: Errors & Caveats

Appendix A.1: Mesos Portal Error: Failed to connect to 127.0.0.1:5050!

Problem:

After opening the Mesos Portal in a browser, we see the following symptom:

After clicking “Try now”, the problem immediately shows up again. After clicking the browser’s page reload button, it looks better for a moment, but the problem soon shows up again.

Status: Open

I have not found any solution yet. I have not yet tested any older version.

Appendix A.2: Critical Error: Mesos Master: Lost leadership… committing suicide!

  • Mesos master 1.1.0 running in a Docker container (image mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404)
  • ZooKeeper running in a Docker container via Exhibitor 1.5.2 (image netflixoss/exhibitor:1.5.2)

Symptoms:

From time to time, we get the following critical error in the log of the Mesos master:

...
2016-12-18 08:15:59,593:22(0x7f18037fe700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 31198753ms
2016-12-18 08:15:59,595:22(0x7f1820ef8700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 31198755ms
2016-12-18 08:15:59,597:22(0x7f1820ef8700):ZOO_ERROR@handle_socket_error_msg@1735: Socket [127.0.0.1:2181] zk retcode=-4, errno=32(Broken pipe): failed while flushing send queue
2016-12-18 08:15:59,597:22(0x7f18037fe700):ZOO_ERROR@handle_socket_error_msg@1746: Socket [127.0.0.1:2181] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
2016-12-18 08:15:59,599:22(0x7f1820ef8700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
I1218 08:15:59.601364 25 group.cpp:451] Lost connection to ZooKeeper, attempting to reconnect ...
I1218 08:15:59.602128 26 group.cpp:451] Lost connection to ZooKeeper, attempting to reconnect ...
2016-12-18 08:15:59,604:22(0x7f1820ef8700):ZOO_ERROR@handle_socket_error_msg@1764: Socket [127.0.0.1:2181] zk retcode=-112, errno=116(Stale file handle): sessionId=0x158f4c9e1f90035 has expired.
I1218 08:15:59.605609 28 group.cpp:510] ZooKeeper session expired
I1218 08:15:59.606210 29 contender.cpp:217] Membership cancelled: 4
2016-12-18 08:15:59,606:22(0x7f1823737700):ZOO_INFO@zookeeper_close@2543: Freeing zookeeper resources for sessionId=0x158f4c9e1f90035

Lost leadership... committing suicide!
2016-12-18 08:15:59,607:22(0x7f182573b700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@730: Client environment:host.name=openshift-installer
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@738: Client environment:os.arch=4.2.0-42-generic
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@739: Client environment:os.version=#49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-12-18 08:15:59,608:22(0x7f182573b700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-18 08:15:59,609:22(0x7f182573b700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-12-18 08:15:59,609:22(0x7f182573b700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 watcher=0x7f182e49c200 sessionId=0 sessionPasswd=<null> context=0x7f1810008470 flags=0
2016-12-18 08:15:59,609:22(0x7f1803fff700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
(container)#

Resolution: None

I have found a similar issue for Marathon here. Both problems seem to be caused by ZooKeeper problems, but it is not clear how to resolve the issue.
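One detail in the log above is worth quantifying: the client exceeded its deadline by 31198753 ms, which is about 8.7 hours, while the session timeout shown in the log is sessionTimeout=10000 (10 seconds). An overrun of that size would be consistent with the process or its host having been paused for hours (for example a suspended VM), although the log alone does not prove it. The following sketch filters a log for overruns larger than the session timeout. The function name is hypothetical, and the log is simply read from standard input.

```shell
#!/bin/sh
# Hypothetical helper: read a log on stdin and print every
# "Exceeded deadline by <N>ms" overrun that is larger than the given
# session timeout in milliseconds (default 10000, as in the log above).
overruns_exceeding_timeout() {
  timeout_ms="${1:-10000}"
  sed -n 's/.*Exceeded deadline by \([0-9][0-9]*\)ms.*/\1/p' |
    awk -v t="$timeout_ms" '$1 > t { printf "%d ms (%.1f h)\n", $1, $1 / 3600000 }'
}

# Example with one line from the log above:
echo 'ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 31198753ms' |
  overruns_exceeding_timeout 10000   # prints: 31198753 ms (8.7 h)
```

Small overruns of a few milliseconds are filtered out, so only overruns that are long enough to expire the ZooKeeper session remain.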

Appendix A.3: ZooKeeper: continuous warnings ‘Exceeded deadline’ and ‘Current disk usage’

After starting ZooKeeper, I continuously see log messages like the following:

2016-12-18 10:24:14,664:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 77ms
I1218 10:24:58.446185 15 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
I1218 10:25:58.450482 14 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
2016-12-18 10:26:21,523:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 26ms
I1218 10:26:58.454617 9 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
2016-12-18 10:27:58,326:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 15ms
I1218 10:27:58.458858 12 slave.cpp:5044] Current disk usage 62.33%. Max allowed age: 1.936647380411088days
2016-12-18 10:28:01,681:1(0x7fb7abfff700):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 20ms

Resolution: None

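Since the warnings do not seem to be critical, I have not yet dug into the problem. Note that the "Current disk usage" lines carry the INFO prefix (I1218 ...) and originate from slave.cpp, i.e. from the Mesos agent rather than from ZooKeeper, so strictly speaking only the "Exceeded deadline" lines are warnings.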

Appendix A.4: Marathon Portal Error: no Reaction upon Restart Request

Symptoms:

If we try to restart an application, there is no reaction whatsoever:

Running ps -ef on the Docker host yields the same process numbers before and after pressing the Restart button:

(dockerhost)$ ps -ef | grep while | grep -v grep
root 13142 29300 0 19:54 ? 00:00:00 mesos-containerizer launch --command={"arguments":["mesos-executor","--launcher_dir=\/usr\/libexec\/mesos"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-executor"} --environment={"HOST":"openshift-installer-native-docker-compose","LIBPROCESS_PORT":"0","MARATHON_APP_ID":"\/while-loop-hello-world","MARATHON_APP_LABELS":"","MARATHON_APP_RESOURCE_CPUS":"1.0","MARATHON_APP_RESOURCE_DISK":"0.0","MARATHON_APP_RESOURCE_GPUS":"0","MARATHON_APP_RESOURCE_MEM":"32.0","MARATHON_APP_VERSION":"2016-12-16T19:54:30.939Z","MESOS_AGENT_ENDPOINT":"127.0.0.1:5051","MESOS_CHECKPOINT":"1","MESOS_DIRECTORY":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522\/runs\/a67b861b-c20c-4d94-8db5-e956b3dea8ab","MESOS_EXECUTOR_ID":"while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522","MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD":"5secs","MESOS_FRAMEWORK_ID":"44a35e16-dc32-4f91-afac-33dfff498944-0000","MESOS_HTTP_COMMAND_EXECUTOR":"0","MESOS_NATIVE_JAVA_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_NATIVE_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_RECOVERY_TIMEOUT":"15mins","MESOS_SANDBOX":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522\/runs\/a67b861b-c20c-4d94-8db5-e956b3dea8ab","MESOS_SLAVE_ID":"a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0","MESOS_SLAVE_PID":"slave(1)@127.0.0.1:5051","MESOS_SUBSCRIPTION_BACKOFF_MAX":"2secs","MESOS_TASK_ID":"while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522","PATH":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin","PORT":"31690","PORT0":"31690","PORTS":"31690","PORT_10000":"31690"} --help=false --pipe_read=15 --pipe_write=16 --pre_exec_commands=[] 
--runtime_directory=/var/run/mesos/containers/a67b861b-c20c-4d94-8db5-e956b3dea8ab --unshare_namespace_mnt=false --working_directory=/var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.75d64c28-c3c9-11e6-9f91-02422b10e522/runs/a67b861b-c20c-4d94-8db5-e956b3dea8ab
root 13144 29300 0 19:54 ? 00:00:00 mesos-containerizer launch --command={"arguments":["mesos-executor","--launcher_dir=\/usr\/libexec\/mesos"],"shell":false,"value":"\/usr\/libexec\/mesos\/mesos-executor"} --environment={"HOST":"openshift-installer-native-docker-compose","LIBPROCESS_PORT":"0","MARATHON_APP_ID":"\/while-loop-hello-world","MARATHON_APP_LABELS":"","MARATHON_APP_RESOURCE_CPUS":"1.0","MARATHON_APP_RESOURCE_DISK":"0.0","MARATHON_APP_RESOURCE_GPUS":"0","MARATHON_APP_RESOURCE_MEM":"32.0","MARATHON_APP_VERSION":"2016-12-16T19:54:30.939Z","MESOS_AGENT_ENDPOINT":"127.0.0.1:5051","MESOS_CHECKPOINT":"1","MESOS_DIRECTORY":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522\/runs\/cb6ec83f-a228-47f5-ad06-ee6675134bb0","MESOS_EXECUTOR_ID":"while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522","MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD":"5secs","MESOS_FRAMEWORK_ID":"44a35e16-dc32-4f91-afac-33dfff498944-0000","MESOS_HTTP_COMMAND_EXECUTOR":"0","MESOS_NATIVE_JAVA_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_NATIVE_LIBRARY":"\/usr\/lib\/libmesos-1.1.0.so","MESOS_RECOVERY_TIMEOUT":"15mins","MESOS_SANDBOX":"\/var\/tmp\/mesos\/slaves\/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0\/frameworks\/44a35e16-dc32-4f91-afac-33dfff498944-0000\/executors\/while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522\/runs\/cb6ec83f-a228-47f5-ad06-ee6675134bb0","MESOS_SLAVE_ID":"a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0","MESOS_SLAVE_PID":"slave(1)@127.0.0.1:5051","MESOS_SUBSCRIPTION_BACKOFF_MAX":"2secs","MESOS_TASK_ID":"while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522","PATH":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin","PORT":"31996","PORT0":"31996","PORTS":"31996","PORT_10000":"31996"} --help=false --pipe_read=15 --pipe_write=16 --pre_exec_commands=[] 
--runtime_directory=/var/run/mesos/containers/cb6ec83f-a228-47f5-ad06-ee6675134bb0 --unshare_namespace_mnt=false --working_directory=/var/tmp/mesos/slaves/a06268a0-4cfd-4d76-94c7-fbdf053be0ba-S0/frameworks/44a35e16-dc32-4f91-afac-33dfff498944-0000/executors/while-loop-hello-world.75cab367-c3c9-11e6-9f91-02422b10e522/runs/cb6ec83f-a228-47f5-ad06-ee6675134bb0
root 13164 13143 0 19:54 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done
root 13165 13154 0 19:54 ? 00:00:00 sh -c while true; do echo "I am a Hello World script"; sleep 1; done
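The before/after comparison of process numbers can be scripted. The following is a minimal sketch with a hypothetical helper that extracts the PIDs (second column of ps -ef) of all processes matching a pattern, so that two snapshots can be compared.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the sorted
# PIDs of all lines containing the given pattern. Lines of the awk helper
# itself are skipped, similar to `grep -v grep`.
pids_matching() {
  awk -v pat="$1" 'index($0, pat) && !index($0, "awk -v pat") { print $2 }' | sort -n
}

# Example (not executed here):
#   ps -ef | pids_matching "Hello World script" > before.txt
#   ... press the Restart button ...
#   ps -ef | pids_matching "Hello World script" > after.txt
#   cmp -s before.txt after.txt && echo "same PIDs: no restart happened"
```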

Status: Workaround given

The workaround is to destroy the application and run a new application with the exact same parameters.
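The destroy-and-recreate workaround can also be performed via the Marathon REST API instead of the portal. The sketch below wraps the cURL calls in small functions. The function names, the Marathon URL, and the JSON file name are assumptions for illustration, and no authentication is assumed. The third function calls Marathon's restart endpoint; I have not verified whether it behaves differently from the Restart button in the situation described above.

```shell
#!/bin/sh
# Sketch only: MARATHON_URL is an assumption, adjust it to your setup.
MARATHON_URL="${MARATHON_URL:-http://localhost:8080}"
# CURL can be set to "echo" for a dry run that only prints the arguments.
CURL="${CURL:-curl}"

# Destroy an application: DELETE /v2/apps/<app id>
destroy_app() {
  $CURL -s -X DELETE "${MARATHON_URL}/v2/apps/${1#/}"
}

# Create an application from a JSON definition file: POST /v2/apps
create_app() {
  $CURL -s -X POST -H "Content-Type: application/json" \
    -d "@$1" "${MARATHON_URL}/v2/apps"
}

# Ask Marathon to restart all tasks of an application:
# POST /v2/apps/<app id>/restart
restart_app() {
  $CURL -s -X POST "${MARATHON_URL}/v2/apps/${1#/}/restart"
}

# Example (not executed here; the file name is hypothetical):
#   destroy_app /while-loop-hello-world
#   create_app while-loop-hello-world.json
```

Keeping the application definition in a JSON file makes it easy to recreate the application with the exact same parameters after destroying it.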

Appendix A.5: Marathon Portal Error: Retrieval of STDERR and STDOUT fails

Seen on Marathon 1.3.6, both with a shell script in a Mesos container and with a shell script in a Docker container.

Symptoms:

If we try to retrieve the error log or the output (STDERR or STDOUT) of a hello world application in the Marathon portal, we get the error message “Sorry, there was a problem retrieving file. Click to retry.” Retrying does not help.

Status: Open (idea for a workaround given below)

A possible workaround is to run the script with redirection into a file that can be retrieved later. For example, we could define a script like

while true; do echo "I am a Hello World script"; sleep 1; done 2>&1 1> this_is_my_output.log

After having found the working directory of Mesos-started scripts, you can retrieve the log file from the slave’s file system.
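Locating the log file on the agent can be sketched as follows. The helper name is hypothetical. It assumes the agent work directory /var/tmp/mesos and the directory layout slaves/.../executors/<task id>/runs/<run id> seen in the ps output above. It also assumes that the script's working directory is the sandbox, as shown by the --working_directory argument for the Mesos containerizer. For a Docker container, the file only shows up there if the container's working directory is the mounted sandbox.

```shell
#!/bin/sh
# Hypothetical helper: list log files with the given name below the Mesos
# agent work directory, restricted to executors of one Marathon app.
# Arguments: <work dir> <app name without leading slash> <log file name>
find_sandbox_logs() {
  find "$1/slaves" -type f -name "$3" \
    -path "*/executors/$2.*/runs/*" 2>/dev/null
}

# Example (not executed here):
#   find_sandbox_logs /var/tmp/mesos while-loop-hello-world this_is_my_output.log
```

The dot after the app name in the path pattern keeps apps with a longer name, such as while-loop-hello-world-container-small, out of the result.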

Appendix A.6: Critical Marathon Error: Flag 'work_dir' is required, but it was not provided

Marathon 1.3.6 running as a Docker container (mesosphere/marathon:latest; image ID 9d03a8fd0fdd)

Symptoms:

When we try to start Marathon as seen below, the attempt fails with the exception:

Failed to start a local cluster while loading agent flags from the environment: Flag 'work_dir' is required, but it was not provided

Full log:

(dockerhost)$ sudo docker run -it --net=host -v `pwd`:/work_dir --entrypoint=bash mesosphere/marathon
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[2016-12-12 21:01:46,268] INFO Starting Marathon 1.3.6/unknown with --master local --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main)
[2016-12-12 21:01:46,588] WARN Method [public javax.ws.rs.core.Response mesosphere.marathon.api.MarathonExceptionMapper.toResponse(java.lang.Throwable)] is synthetic and is being intercepted by [mesosphere.marathon.DebugModule$MetricsBehavior@985696]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all. (com.google.inject.internal.ProxyFactory:main)
[2016-12-12 21:01:46,899] INFO Logging initialized @1979ms (org.eclipse.jetty.util.log:main)
[2016-12-12 21:01:47,322] INFO Slf4jLogger started (akka.event.slf4j.Slf4jLogger:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:01:47,517] INFO Started TaskTrackerUpdateStepsProcessorImpl with steps:
* continueOnError(notifyHealthCheckManager)
* continueOnError(notifyRateLimiter)
* continueOnError(notifyLaunchQueue)
* continueOnError(emitUpdate)
* continueOnError(postTaskStatusEvent)
* continueOnError(scaleApp) (mesosphere.marathon.core.task.tracker.impl.TaskTrackerUpdateStepProcessorImpl:main)
[2016-12-12 21:01:47,579] INFO Calling reviveOffers is enabled. Use --disable_revive_offers_for_new_apps to disable. (mesosphere.marathon.core.flow.FlowModule:main)
[2016-12-12 21:01:47,657] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authenticator' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:01:47,662] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:01:47,665] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authorizer' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:01:47,666] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:01:47,668] INFO Started status update processor (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$ca7c928f:main)
[2016-12-12 21:01:47,779] INFO All actors suspended:
* Actor[akka://marathon/user/expungeOverdueLostTasks#1718670650]
* Actor[akka://marathon/user/rateLimiter#1278378489]
* Actor[akka://marathon/user/groupManager#890512610]
* Actor[akka://marathon/user/taskTracker#-11699813]
* Actor[akka://marathon/user/launchQueue#1496971565]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#-557182315]
* Actor[akka://marathon/user/killOverdueStagedTasks#-1648400379]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1003103868]
* Actor[akka://marathon/user/offerMatcherManager#219115497]
* Actor[akka://marathon/user/offersWantedForReconciliation#-1104494480]
* Actor[akka://marathon/user/taskKillServiceActor#-521724399] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-12 21:01:47,891] INFO Adding HTTP support. (mesosphere.chaos.http.HttpModule:main)
[2016-12-12 21:01:47,892] INFO No HTTPS support configured. (mesosphere.chaos.http.HttpModule:main)
[2016-12-12 21:01:47,971] INFO Starting up (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:01:47,972] INFO Beginning run (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:01:47,974] INFO Will offer leadership after 500 milliseconds backoff (mesosphere.marathon.core.election.impl.CuratorElectionService:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:01:47,983] INFO jetty-9.3.z-SNAPSHOT (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,018] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-12 21:01:48,229] INFO Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.MarathonExceptionMapper as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.AppsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.TasksResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.EventSubscriptionsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,233] INFO Registering mesosphere.marathon.api.v2.QueueResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.GroupsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.InfoResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.LeaderResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.DeploymentsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.ArtifactsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.SchemaResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,234] INFO Registering mesosphere.marathon.api.v2.PluginsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,238] INFO Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' (com.sun.jersey.server.impl.application.WebApplicationImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,311] INFO Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,328] INFO Binding mesosphere.marathon.api.MarathonExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:01:48,527] INFO Using HA and therefore offering leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,528] INFO Will do leader election through localhost:2181 (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,553] WARN session timeout [10000] is less than connection timeout [15000] (org.apache.curator.CuratorZookeeperClient:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,565] INFO Starting (org.apache.curator.framework.imps.CuratorFrameworkImpl:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:zookeeper.version=3.5.0-alpha-1615249, built on 08/01/2014 22:13 GMT (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:host.name=openshift-installer-native-docker-compose (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.class.path=./bin/../target/marathon-assembly-1.3.6.jar (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,576] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,577] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,577] INFO Client environment:os.version=4.2.0-42-generic (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,577] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,577] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,577] INFO Client environment:user.dir=/marathon (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,578] INFO Client environment:os.memory.free=121MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,578] INFO Client environment:os.memory.max=880MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,578] INFO Client environment:os.memory.total=157MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,578] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@7531678e (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:01:48,623] INFO Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:01:48,633] INFO Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:01:48,653] INFO Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x158f4c9e1f90004, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:01:48,661] INFO State change: CONNECTED (org.apache.curator.framework.state.ConnectionStateManager:ForkJoinPool-2-worker-13-EventThread)
[2016-12-12 21:01:48,722] INFO Elected (LeaderLatchListener Interface) (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-12 21:01:48,723] INFO As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:01:48,734] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@5ca52bc0 (org.apache.zookeeper.ZooKeeper:pool-1-thread-1)
[2016-12-12 21:01:48,748] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:01:48,749] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:01:48,755] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f90005, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:01:48,854] INFO Migration successfully applied for version Version(1, 3, 6) (mesosphere.marathon.state.Migration:ForkJoinPool-2-worker-7)
[2016-12-12 21:01:48,854] INFO Call preDriverStarts callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:01:48,868] INFO Finished preDriverStarts callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:01:48,885] INFO TaskTrackerActor is starting. Task loading initiated. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-12 21:01:48,893] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:01:48,897] INFO ExpungeOverdueLostTasksActor has started (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-12 21:01:48,900] INFO About to load 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-3)
[2016-12-12 21:01:48,901] INFO Loaded 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-3)
[2016-12-12 21:01:48,902] INFO Started. Will remain interested in offer reconciliation for 17500 milliseconds when needed. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:01:48,911] INFO Task loading complete. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-7)
[2016-12-12 21:01:48,914] INFO Create new Scheduler Driver with frameworkId: None and scheduler mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$EnhancerByGuice$$c730b9b8@15d42ccb (mesosphere.marathon.MarathonSchedulerDriver$:pool-1-thread-1)
[2016-12-12 21:01:48,924] INFO started RateLimiterActor (mesosphere.marathon.core.launchqueue.impl.RateLimiterActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-12 21:01:48,925] INFO All actors active:
* Actor[akka://marathon/user/expungeOverdueLostTasks#1718670650]
* Actor[akka://marathon/user/rateLimiter#1278378489]
* Actor[akka://marathon/user/groupManager#890512610]
* Actor[akka://marathon/user/taskTracker#-11699813]
* Actor[akka://marathon/user/launchQueue#1496971565]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#-557182315]
* Actor[akka://marathon/user/killOverdueStagedTasks#-1648400379]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1003103868]
* Actor[akka://marathon/user/offerMatcherManager#219115497]
* Actor[akka://marathon/user/offersWantedForReconciliation#-1104494480]
* Actor[akka://marathon/user/taskKillServiceActor#-521724399] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-8)
[2016-12-12 21:01:48,994] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-7)
[2016-12-12 21:01:48,994] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-7)
[2016-12-12 21:01:48,995] INFO interested in offers for reservation reconciliation because of becoming leader (until 2016-12-12T21:02:06.407Z) (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:01:49,008] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-12 21:01:49,008] INFO => Schedule next revive at 2016-12-12T21:01:53.993Z in 4986 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1212 21:01:49.134382 63 sched.cpp:1697]
**************************************************
Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
**************************************************
I1212 21:01:49.179211 63 leveldb.cpp:174] Opened db in 40.639535ms
I1212 21:01:49.188115 63 leveldb.cpp:181] Compacted db in 8.801601ms
I1212 21:01:49.188176 63 leveldb.cpp:196] Created db iterator in 11072ns
I1212 21:01:49.188189 63 leveldb.cpp:202] Seeked to beginning of db in 758ns
I1212 21:01:49.188194 63 leveldb.cpp:271] Iterated through 0 keys in the db in 202ns
I1212 21:01:49.188232 63 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1212 21:01:49.189082 70 recover.cpp:451] Starting replica recovery
I1212 21:01:49.189720 70 recover.cpp:477] Replica is in EMPTY status
I1212 21:01:49.189486 73 master.cpp:375] Master 8be08601-c962-42fa-9f78-7beda337b644 (openshift-installer-native-docker-compose) started on 127.0.0.1:35322
I1212 21:01:49.190399 73 master.cpp:377] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/tmp/mesos/local/9Y1S6q" --zk_session_timeout="10secs"
W1212 21:01:49.190558 73 master.cpp:380]
**************************************************
Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
**************************************************
I1212 21:01:49.190712 73 master.cpp:429] Master allowing unauthenticated frameworks to register
I1212 21:01:49.191285 73 master.cpp:443] Master allowing unauthenticated agents to register
I1212 21:01:49.191337 73 master.cpp:457] Master allowing HTTP frameworks to register without authentication
I1212 21:01:49.191366 73 master.cpp:499] Using default 'crammd5' authenticator
Failed to start a local cluster while loading agent flags from the environment: Flag 'work_dir' is required, but it was not provided
W1212 21:01:49.192327 73 authenticator.cpp:512] No credentials provided, authentication requests will be refused
I1212 21:01:49.192360 73 authenticator.cpp:519] Initializing server SASL
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 --work_dir
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[scallop] Error: Unknown option 'work_dir'
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 -work_dir=/work_dir
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[scallop] Error: Unknown option 'w'
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070 --work_dir=/work_dir
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[scallop] Error: Unknown option 'work_dir=/work_dir'
root@openshift-installer:/marathon# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
No start hook file found ($HOOK_MARATHON_START). Proceeding with the start script.
[2016-12-12 21:05:35,546] INFO Starting Marathon 1.3.6/unknown with --master local --zk zk://localhost:2181/marathon --http_port=7070 (mesosphere.marathon.Main$:main)
[2016-12-12 21:05:35,823] WARN Method [public javax.ws.rs.core.Response mesosphere.marathon.api.MarathonExceptionMapper.toResponse(java.lang.Throwable)] is synthetic and is being intercepted by [mesosphere.marathon.DebugModule$MetricsBehavior@985696]. This could indicate a bug. The method may be intercepted twice, or may not be intercepted at all. (com.google.inject.internal.ProxyFactory:main)
[2016-12-12 21:05:36,115] INFO Logging initialized @1856ms (org.eclipse.jetty.util.log:main)
[2016-12-12 21:05:36,621] INFO Slf4jLogger started (akka.event.slf4j.Slf4jLogger:marathon-akka.actor.default-dispatcher-4)
[2016-12-12 21:05:36,800] INFO Started TaskTrackerUpdateStepsProcessorImpl with steps:
* continueOnError(notifyHealthCheckManager)
* continueOnError(notifyRateLimiter)
* continueOnError(notifyLaunchQueue)
* continueOnError(emitUpdate)
* continueOnError(postTaskStatusEvent)
* continueOnError(scaleApp) (mesosphere.marathon.core.task.tracker.impl.TaskTrackerUpdateStepProcessorImpl:main)
[2016-12-12 21:05:36,864] INFO Calling reviveOffers is enabled. Use --disable_revive_offers_for_new_apps to disable. (mesosphere.marathon.core.flow.FlowModule:main)
[2016-12-12 21:05:36,930] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authenticator' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:05:36,936] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:05:36,940] INFO Loading plugins implementing 'mesosphere.marathon.plugin.auth.Authorizer' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:05:36,940] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:main)
[2016-12-12 21:05:36,942] INFO Started status update processor (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl$$EnhancerByGuice$$ca7c928f:main)
[2016-12-12 21:05:37,047] INFO All actors suspended:
* Actor[akka://marathon/user/taskKillServiceActor#-984511042]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#671912224]
* Actor[akka://marathon/user/killOverdueStagedTasks#-698941106]
* Actor[akka://marathon/user/groupManager#-958454684]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1917290502]
* Actor[akka://marathon/user/offerMatcherManager#2056148696]
* Actor[akka://marathon/user/expungeOverdueLostTasks#-308686880]
* Actor[akka://marathon/user/rateLimiter#-1808696027]
* Actor[akka://marathon/user/offersWantedForReconciliation#362759878]
* Actor[akka://marathon/user/launchQueue#-1787605975]
* Actor[akka://marathon/user/taskTracker#1265821735] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-5)
[2016-12-12 21:05:37,115] INFO Adding HTTP support. (mesosphere.chaos.http.HttpModule:main)
[2016-12-12 21:05:37,116] INFO No HTTPS support configured. (mesosphere.chaos.http.HttpModule:main)
[2016-12-12 21:05:37,174] INFO Starting up (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:05:37,181] INFO jetty-9.3.z-SNAPSHOT (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,185] INFO Beginning run (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:05:37,188] INFO Will offer leadership after 500 milliseconds backoff (mesosphere.marathon.core.election.impl.CuratorElectionService:MarathonSchedulerService$$EnhancerByGuice$$4bb98838)
[2016-12-12 21:05:37,219] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:05:37,416] INFO Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,425] INFO Registering mesosphere.marathon.api.MarathonExceptionMapper as a provider class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,425] INFO Registering mesosphere.marathon.api.v2.AppsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,426] INFO Registering mesosphere.marathon.api.v2.TasksResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,428] INFO Registering mesosphere.marathon.api.v2.EventSubscriptionsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,429] INFO Registering mesosphere.marathon.api.v2.QueueResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,430] INFO Registering mesosphere.marathon.api.v2.GroupsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,431] INFO Registering mesosphere.marathon.api.v2.InfoResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,432] INFO Registering mesosphere.marathon.api.v2.LeaderResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,432] INFO Registering mesosphere.marathon.api.v2.DeploymentsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,433] INFO Registering mesosphere.marathon.api.v2.ArtifactsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,433] INFO Registering mesosphere.marathon.api.v2.SchemaResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,434] INFO Registering mesosphere.marathon.api.v2.PluginsResource as a root resource class (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,437] INFO Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' (com.sun.jersey.server.impl.application.WebApplicationImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,491] INFO Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,508] INFO Binding mesosphere.marathon.api.MarathonExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:37,717] INFO Using HA and therefore offering leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,719] INFO Will do leader election through localhost:2181 (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,775] WARN session timeout [10000] is less than connection timeout [15000] (org.apache.curator.CuratorZookeeperClient:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,840] INFO Starting (org.apache.curator.framework.imps.CuratorFrameworkImpl:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,857] INFO Client environment:zookeeper.version=3.5.0-alpha-1615249, built on 08/01/2014 22:13 GMT (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,863] INFO Client environment:host.name=openshift-installer-native-docker-compose (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,864] INFO Client environment:java.version=1.8.0_102 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,864] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,865] INFO Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,866] INFO Client environment:java.class.path=./bin/../target/marathon-assembly-1.3.6.jar (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,866] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,867] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,867] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,868] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,868] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,869] INFO Client environment:os.version=4.2.0-42-generic (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,869] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,870] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,870] INFO Client environment:user.dir=/marathon (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,870] INFO Client environment:os.memory.free=128MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,871] INFO Client environment:os.memory.max=880MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,871] INFO Client environment:os.memory.total=165MB (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,872] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@2fba82d1 (org.apache.zookeeper.ZooKeeper:ForkJoinPool-2-worker-13)
[2016-12-12 21:05:37,922] INFO Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:05:37,939] INFO Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:05:37,971] INFO Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x158f4c9e1f90006, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:ForkJoinPool-2-worker-13-SendThread(localhost:2181))
[2016-12-12 21:05:37,978] INFO State change: CONNECTED (org.apache.curator.framework.state.ConnectionStateManager:ForkJoinPool-2-worker-13-EventThread)
[2016-12-12 21:05:38,049] INFO Elected (LeaderLatchListener Interface) (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2016-12-12 21:05:38,052] INFO As new leader running the driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:05:38,075] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=com.twitter.zk.EventBroker@2c6c37f7 (org.apache.zookeeper.ZooKeeper:pool-1-thread-1)
[2016-12-12 21:05:38,082] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:05:38,084] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:05:38,088] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x158f4c9e1f90007, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(localhost:2181))
[2016-12-12 21:05:38,174] INFO Migration successfully applied for version Version(1, 3, 6) (mesosphere.marathon.state.Migration:ForkJoinPool-2-worker-7)
[2016-12-12 21:05:38,176] INFO Call preDriverStarts callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:05:38,188] INFO Finished preDriverStarts callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$4bb98838:pool-1-thread-1)
[2016-12-12 21:05:38,195] INFO no interest in offers for reservation reconciliation anymore. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-12 21:05:38,200] INFO Started. Will remain interested in offer reconciliation for 17500 milliseconds when needed. (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-12 21:05:38,208] INFO started RateLimiterActor (mesosphere.marathon.core.launchqueue.impl.RateLimiterActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:05:38,209] INFO ExpungeOverdueLostTasksActor has started (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-2)
[2016-12-12 21:05:38,216] INFO TaskTrackerActor is starting. Task loading initiated. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,222] INFO Create new Scheduler Driver with frameworkId: None and scheduler mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$EnhancerByGuice$$c730b9b8@46383a78 (mesosphere.marathon.MarathonSchedulerDriver$:pool-1-thread-1)
[2016-12-12 21:05:38,224] INFO All actors active:
* Actor[akka://marathon/user/taskKillServiceActor#-984511042]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#671912224]
* Actor[akka://marathon/user/killOverdueStagedTasks#-698941106]
* Actor[akka://marathon/user/groupManager#-958454684]
* Actor[akka://marathon/user/reviveOffersWhenWanted#1917290502]
* Actor[akka://marathon/user/offerMatcherManager#2056148696]
* Actor[akka://marathon/user/expungeOverdueLostTasks#-308686880]
* Actor[akka://marathon/user/rateLimiter#-1808696027]
* Actor[akka://marathon/user/offersWantedForReconciliation#362759878]
* Actor[akka://marathon/user/launchQueue#-1787605975]
* Actor[akka://marathon/user/taskTracker#1265821735] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-4)
[2016-12-12 21:05:38,230] INFO About to load 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5)
[2016-12-12 21:05:38,239] INFO Loaded 0 tasks (mesosphere.marathon.core.task.tracker.impl.TaskLoaderImpl:ForkJoinPool-2-worker-5)
[2016-12-12 21:05:38,257] INFO Task loading complete. (mesosphere.marathon.core.task.tracker.impl.TaskTrackerActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,284] INFO interested in offers for reservation reconciliation because of becoming leader (until 2016-12-12T21:05:55.712Z) (mesosphere.marathon.core.matcher.reconcile.impl.OffersWantedForReconciliationActor:marathon-akka.actor.default-dispatcher-9)
[2016-12-12 21:05:38,322] INFO Binding mesosphere.marathon.api.v2.AppsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,333] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,335] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,338] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,339] INFO => Schedule next revive at 2016-12-12T21:05:43.333Z in 4995 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,341] INFO Received offers WANTED notification (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-3)
[2016-12-12 21:05:38,356] INFO Binding mesosphere.marathon.api.v2.TasksResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,359] INFO Binding mesosphere.marathon.api.v2.EventSubscriptionsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,361] INFO Event notification disabled. (mesosphere.marathon.core.event.EventModule:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,375] INFO Binding mesosphere.marathon.api.v2.QueueResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,385] INFO Binding mesosphere.marathon.api.v2.GroupsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,390] INFO Binding mesosphere.marathon.api.v2.InfoResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,396] INFO Binding mesosphere.marathon.api.v2.LeaderResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,400] INFO Binding mesosphere.marathon.api.v2.DeploymentsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,418] INFO Binding mesosphere.marathon.api.v2.ArtifactsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,423] INFO Binding mesosphere.marathon.api.v2.SchemaResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,428] INFO Binding mesosphere.marathon.api.v2.PluginsResource to GuiceManagedComponentProvider with the scope "Singleton" (com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,429] INFO Loading plugins implementing 'mesosphere.marathon.plugin.http.HttpRequestHandler' from these urls: [] (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,431] INFO Found 0 plugins. (mesosphere.marathon.core.plugin.impl.PluginManagerImpl:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,452] INFO Started o.e.j.s.ServletContextHandler@33ffb91e{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1212 21:05:38.488497 223 sched.cpp:1697]
**************************************************
Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
**************************************************
[2016-12-12 21:05:38,494] INFO Started ServerConnector@23580056{HTTP/1.1,[http/1.1]}{0.0.0.0:7070} (org.eclipse.jetty.server.ServerConnector:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,496] INFO Started @4237ms (org.eclipse.jetty.server.Server:HttpService$$EnhancerByGuice$$1fd57a30 STARTING)
[2016-12-12 21:05:38,497] INFO All services up and running. (mesosphere.marathon.Main$:main)
I1212 21:05:38.524034 223 leveldb.cpp:174] Opened db in 33.764878ms
I1212 21:05:38.525073 223 leveldb.cpp:181] Compacted db in 985356ns
I1212 21:05:38.525146 223 leveldb.cpp:196] Created db iterator in 9599ns
I1212 21:05:38.525156 223 leveldb.cpp:202] Seeked to beginning of db in 501ns
I1212 21:05:38.525161 223 leveldb.cpp:271] Iterated through 0 keys in the db in 202ns
I1212 21:05:38.525203 223 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1212 21:05:38.526582 229 recover.cpp:451] Starting replica recovery
I1212 21:05:38.527250 229 recover.cpp:477] Replica is in EMPTY status
I1212 21:05:38.528532 229 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (4)@127.0.0.1:33899
I1212 21:05:38.528669 229 recover.cpp:197] Received a recover response from a replica in EMPTY status
I1212 21:05:38.528774 229 recover.cpp:568] Updating replica status to STARTING
I1212 21:05:38.529911 232 master.cpp:375] Master e999af40-d59e-4d95-a77a-05403042ca4f (openshift-installer-native-docker-compose) started on 127.0.0.1:33899
I1212 21:05:38.530508 232 master.cpp:377] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/tmp/mesos/local/X38lNg" --zk_session_timeout="10secs"
W1212 21:05:38.531312 232 master.cpp:380]
**************************************************
Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
**************************************************
I1212 21:05:38.532498 232 master.cpp:429] Master allowing unauthenticated frameworks to register
I1212 21:05:38.532708 232 master.cpp:443] Master allowing unauthenticated agents to register
I1212 21:05:38.532913 232 master.cpp:457] Master allowing HTTP frameworks to register without authentication
I1212 21:05:38.533169 232 master.cpp:499] Using default 'crammd5' authenticator
W1212 21:05:38.533359 232 authenticator.cpp:512] No credentials provided, authentication requests will be refused
I1212 21:05:38.533457 232 authenticator.cpp:519] Initializing server SASL
I1212 21:05:38.532569 229 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 3.159782ms
I1212 21:05:38.534039 229 replica.cpp:320] Persisted replica status to STARTING
I1212 21:05:38.534257 229 recover.cpp:477] Replica is in STARTING status
I1212 21:05:38.534668 229 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (5)@127.0.0.1:33899
I1212 21:05:38.535151 229 recover.cpp:197] Received a recover response from a replica in STARTING status
I1212 21:05:38.540293 229 recover.cpp:568] Updating replica status to VOTING
Failed to start a local cluster while loading agent flags from the environment: Flag 'work_dir' is required, but it was not provided
(container):/marathon#

 

The documentation of the Docker image used here describes this error message as follows:

“Note: Currently the Docker container fails due to strange behavior from the latest Mesos version. There will be an error about work_dir that is still unresolved”

Unfortunately, the documentation does not offer a workaround or a solution for the problem.

Resolution (Workaround):

I have found a workaround here: “It seems like explicitly setting the Mesos work directory by adding ENV MESOS_WORK_DIR /var/lib/mesos to the Dockerfile resolves the issue.”

That is, we need to set the MESOS_WORK_DIR environment variable when starting the container:

(dockerhost)$ sudo docker run -it --name marathon --rm --net=host -e MESOS_WORK_DIR=/var/lib/mesos --entrypoint=bash mesosphere/marathon
(container)# ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070
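
If you start the container without the -e option, the same variable can also be set from inside the container before calling the start script. The following is only a minimal sketch: the function name ensure_mesos_work_dir is made up for this post and is not part of Marathon or Mesos, the default path /var/lib/mesos is taken from the workaround quoted above, and creating the directory up front is an extra precaution of mine rather than a documented requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Marathon or Mesos): makes sure that
# MESOS_WORK_DIR is set and exported before the local Mesos cluster starts.
ensure_mesos_work_dir() {
  # Keep an existing value; otherwise fall back to the path from the quoted workaround.
  : "${MESOS_WORK_DIR:=/var/lib/mesos}"
  export MESOS_WORK_DIR
  # Assumption: pre-creating the directory is harmless; it may not be strictly needed.
  mkdir -p "$MESOS_WORK_DIR" || return 1
  echo "MESOS_WORK_DIR=$MESOS_WORK_DIR"
}
```

Inside the container, this could be combined with the start command from above, e.g. ensure_mesos_work_dir && ./bin/start --master local --zk zk://localhost:2181/marathon --http_port=7070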