This is a cheat sheet on how to quickly perform a backup and restore of the etcd server in Kubernetes.
tl;dr
Find the reference: https://kubernetes.io -> Documentation -> search for "etcd backup restore" -> you will find: Operating etcd clusters for Kubernetes | Kubernetes
# get params
cat /var/lib/kubelet/config.yaml | grep static
cat /etc/kubernetes/manifests/etcd.yaml | grep -A 30 command | egrep '^ *-'

# create test pod (optional)
k run test --image nginx
k get pod

# backup (get params from output above):
ETCDCTL_API=3 etcdctl --endpoints=https://172.30.1.2:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save snapshotdb

# delete test pod (optional)
k delete pod test
k get pod

# stop kube-apiserver and etcd server:
mv /etc/kubernetes/manifests/kube-apiserver.yaml ./
mv /etc/kubernetes/manifests/etcd.yaml ./

# restore:
rm -rf /var/lib/etcd
ETCDCTL_API=3 etcdctl snapshot restore --data-dir=/var/lib/etcd snapshotdb

# start etcd server and kube-apiserver:
mv etcd.yaml /etc/kubernetes/manifests/
mv kube-apiserver.yaml /etc/kubernetes/manifests/

# check (optional; wait for 2 minutes or so to allow etcd and kube-apiserver to come up)
k get pod
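Throughout this cheat sheet, k is used as a shorthand for kubectl. If your environment does not provide it already, you can set it up yourself (an optional convenience, not required for the backup and restore themselves):

alias k=kubectl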
Step 1 (optional): Create a test POD
k run test --image nginx
k get pod
# output (if you wait a minute or so):
NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          3m40s
Step 2: Get parameters from the etcd.yaml file
cat /var/lib/kubelet/config.yaml | grep static
# output:
staticPodPath: /etc/kubernetes/manifests
Use the staticPodPath to find the POD manifests for etcd (and for the kube-apiserver, which we will need further below):
cat /etc/kubernetes/manifests/etcd.yaml | grep -A 30 command | egrep '^ *-'
# output:
  - command:
    - etcd
    - --advertise-client-urls=https://172.30.1.2:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt                      <----- cert
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.30.1.2:2380
    - --initial-cluster=controlplane=https://172.30.1.2:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key                       <----- key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.30.1.2:2379  <----- endpoints
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.30.1.2:2380
    - --name=controlplane
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt                    <----- cacert
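If you only want the four values needed for the backup command, a narrower filter such as the following can help (just a convenience sketch; the flag names are the ones shown in the etcd.yaml output above):

egrep '^ *- --(cert-file|key-file|trusted-ca-file|listen-client-urls)=' /etc/kubernetes/manifests/etcd.yaml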
Step 3: Create an etcd Backup
You need to replace the endpoints, cacert, cert, and key values with the parameters found in step 2:
ETCDCTL_API=3 etcdctl --endpoints=https://172.30.1.2:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save snapshotdb
# output:
{"level":"info","ts":1669193343.0635,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"snapshotdb.part"}
{"level":"info","ts":1669193343.0680363,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1669193343.068161,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":"https://172.30.1.2:2379"}
{"level":"info","ts":1669193343.1282105,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":1669193343.1477027,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"https://172.30.1.2:2379","size":"5.7 MB","took":"now"}
{"level":"info","ts":1669193343.148174,"caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"snapshotdb"}
Snapshot saved at snapshotdb
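To verify that the snapshot file is usable before you continue, you can inspect it with snapshot status (newer etcdctl versions report this subcommand as deprecated in favor of etcdutl, but it still works as a quick sanity check):

ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshotdb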
Step 4 (optional): Delete the test pod
k delete pod test
# output:
pod "test" deleted
Step 5: Stop kube-apiserver and etcd
The kube-apiserver and the etcd server run as static PODs. They are stopped automatically as soon as their manifest files are moved out of the manifests directory:
mv /etc/kubernetes/manifests/kube-apiserver.yaml ./
mv /etc/kubernetes/manifests/etcd.yaml ./
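Before touching the data directory, you can check that both containers are really gone, e.g. with crictl (assuming crictl is available on the control plane node, as is typical for kubeadm clusters):

crictl ps | egrep 'etcd|kube-apiserver'
# no output expected once the kubelet has stopped both static PODs (this may take a few seconds)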
Step 6: Restore etcd
Before we can restore the etcd database, we need to remove the existing database from the data-dir path:
rm -rf /var/lib/etcd
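If you prefer to keep the old data instead of deleting it (a safer variant of the same step), you can move the directory aside rather than removing it; the restore below only requires that /var/lib/etcd no longer exists:

mv /var/lib/etcd /var/lib/etcd.bak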
We now restore the etcd database from the previously created snapshotdb file:
ETCDCTL_API=3 etcdctl snapshot restore --data-dir=/var/lib/etcd snapshotdb
# output:
Deprecated: Use `etcdutl snapshot restore` instead.

2022-11-23T07:34:00Z info snapshot/v3_snapshot.go:251 restoring snapshot {"path": "savedb", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:257\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:128\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/home/remote/sbatsche/.gvm/gos/go1.16.3/src/runtime/proc.go:225"}
2022-11-23T07:34:00Z info membership/store.go:119 Trimming membership information from the backend...
2022-11-23T07:34:00Z info membership/cluster.go:393 added member {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2022-11-23T07:34:00Z info snapshot/v3_snapshot.go:272 restored snapshot {"path": "savedb", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap"}
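As an optional sanity check, the restored data directory should now contain the member/snap and member/wal subdirectories:

ls -l /var/lib/etcd/member
# output should list the snap and wal directories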
Step 7: Start etcd and kube-apiserver
etcd and the kube-apiserver are started automatically as soon as the corresponding YAML files are moved back to the manifests directory:
mv etcd.yaml /etc/kubernetes/manifests/
mv kube-apiserver.yaml /etc/kubernetes/manifests/
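It can take a minute or two until the kubelet has restarted both static PODs. You can watch the containers come back with crictl (same assumption as above that crictl is available on the node):

crictl ps | egrep 'etcd|kube-apiserver'
# both containers should show up again once the kubelet has processed the manifests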
Step 8 (optional): Check the status of the test POD
If the restore was successful, you should see the test POD up and running again:
k get pod
# output, if you do not wait long enough:
The connection to the server 172.30.1.2:6443 was refused - did you specify the right host or port?
# output after 2 minutes or so:
NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          4m49s
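As a final check, the control plane PODs themselves should be back in the kube-system namespace:

k -n kube-system get pod
# etcd-controlplane and kube-apiserver-controlplane should both be Running again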