This is a cheat sheet on how to quickly perform a backup and restore of the etcd server in Kubernetes.
tl;dr
Find the reference: https://kubernetes.io -> Documentation -> search for "etcd backup restore" -> you will find: Operating etcd clusters for Kubernetes | Kubernetes
# get params
cat /var/lib/kubelet/config.yaml | grep static
cat /etc/kubernetes/manifests/etcd.yaml | grep -A 30 command | egrep '^ *-'

# create test pod (optional)
k run test --image nginx
k get pod

# backup (get params from output above):
ETCDCTL_API=3 etcdctl --endpoints=https://172.30.1.2:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save snapshotdb

# delete test pod (optional)
k delete pod test
k get pod

# stop kube-apiserver and etcd server:
mv /etc/kubernetes/manifests/kube-apiserver.yaml ./
mv /etc/kubernetes/manifests/etcd.yaml ./

# restore:
rm -rf /var/lib/etcd
ETCDCTL_API=3 etcdctl snapshot restore --data-dir=/var/lib/etcd snapshotdb

# start etcd server and kube-apiserver:
mv etcd.yaml /etc/kubernetes/manifests/
mv kube-apiserver.yaml /etc/kubernetes/manifests/

# check (optional; wait for 2 minutes or so to allow etcd and kube-apiserver to come up)
k get pod
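Throughout this cheat sheet, k is used as a shorthand for kubectl. If your environment does not provide it already, you can set it up yourself (an optional convenience, not required for the backup and restore themselves):

alias k=kubectl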
Step 1 (optional): Create a test POD
k run test --image nginx
k get pod
# output (if you wait a minute or so):
NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          3m40s
Step 2: Get parameters from the etcd.yaml file
cat /var/lib/kubelet/config.yaml | grep static
# output:
staticPodPath: /etc/kubernetes/manifests
Use the staticPodPath to find the POD manifests for etcd (and for the kube-apiserver, which we will need further below):
cat /etc/kubernetes/manifests/etcd.yaml | grep -A 30 command | egrep '^ *-'
# output:
  - command:
    - etcd
    - --advertise-client-urls=https://172.30.1.2:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt                      <----- cert
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.30.1.2:2380
    - --initial-cluster=controlplane=https://172.30.1.2:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key                       <----- key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.30.1.2:2379  <----- endpoints
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.30.1.2:2380
    - --name=controlplane
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt                    <----- cacert
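If you only want the four values needed for the backup command, a narrower filter such as the following can help (just a convenience sketch; the flag names are the ones shown in the etcd.yaml output above):

egrep '^ *- --(cert-file|key-file|trusted-ca-file|listen-client-urls)=' /etc/kubernetes/manifests/etcd.yaml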
Step 3: Create an etcd Backup
You need to replace the endpoints, cacert, cert, and key values with the parameters found in step 2:
ETCDCTL_API=3 etcdctl --endpoints=https://172.30.1.2:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save snapshotdb
# output:
{"level":"info","ts":1669193343.0635,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"snapshotdb.part"}
{"level":"info","ts":1669193343.0680363,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1669193343.068161,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":"https://172.30.1.2:2379"}
{"level":"info","ts":1669193343.1282105,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":1669193343.1477027,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"https://172.30.1.2:2379","size":"5.7 MB","took":"now"}
{"level":"info","ts":1669193343.148174,"caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"snapshotdb"}
Snapshot saved at snapshotdb
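To verify that the snapshot file is usable before you continue, you can inspect it with snapshot status (newer etcdctl versions report this subcommand as deprecated in favor of etcdutl, but it still works as a quick sanity check):

ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshotdb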
Step 4 (optional): Delete the test pod
k delete pod test
# output:
pod "test" deleted
Step 5: Stop kube-apiserver and etcd
The kube-apiserver and the etcd server run as static PODs. They are stopped automatically as soon as their manifest files are moved out of the manifests directory:
mv /etc/kubernetes/manifests/kube-apiserver.yaml ./
mv /etc/kubernetes/manifests/etcd.yaml ./
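Before touching the data directory, you can check that both containers are really gone, e.g. with crictl (assuming crictl is available on the control plane node, as is typical for kubeadm clusters):

crictl ps | egrep 'etcd|kube-apiserver'
# no output expected once the kubelet has stopped both static PODs (this may take a few seconds)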
Step 6: Restore etcd
Before we can restore the etcd database, we need to remove the existing database from the data-dir path:
rm -rf /var/lib/etcd
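If you prefer to keep the old data instead of deleting it (a safer variant of the same step), you can move the directory aside rather than removing it; the restore below only requires that /var/lib/etcd no longer exists:

mv /var/lib/etcd /var/lib/etcd.bak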
We now restore the etcd database from the previously created snapshotdb file:
ETCDCTL_API=3 etcdctl snapshot restore --data-dir=/var/lib/etcd snapshotdb
# output:
Deprecated: Use `etcdutl snapshot restore` instead.

2022-11-23T07:34:00Z info snapshot/v3_snapshot.go:251 restoring snapshot {"path": "savedb", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:257\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:128\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/home/remote/sbatsche/.gvm/gos/go1.16.3/src/runtime/proc.go:225"}
2022-11-23T07:34:00Z info membership/store.go:119 Trimming membership information from the backend...
2022-11-23T07:34:00Z info membership/cluster.go:393 added member {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2022-11-23T07:34:00Z info snapshot/v3_snapshot.go:272 restored snapshot {"path": "savedb", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap"}
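As an optional sanity check, the restored data directory should now contain the member/snap and member/wal subdirectories:

ls -l /var/lib/etcd/member
# output should list the snap and wal directories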
Step 7: Start etcd and kube-apiserver
etcd and the kube-apiserver are started automatically as soon as the corresponding YAML files are moved back to the manifests directory:
mv etcd.yaml /etc/kubernetes/manifests/
mv kube-apiserver.yaml /etc/kubernetes/manifests/
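It can take a minute or two until the kubelet has restarted both static PODs. You can watch the containers come back with crictl (same assumption as above that crictl is available on the node):

crictl ps | egrep 'etcd|kube-apiserver'
# both containers should show up again once the kubelet has processed the manifests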
Step 8 (optional): Check the status of the test POD
If the restore was successful, you should see the test POD up and running again:
k get pod
# output, if you do not wait long enough:
The connection to the server 172.30.1.2:6443 was refused - did you specify the right host or port?
# output after 2 minutes or so:
NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          4m49s
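As a final check, the control plane PODs themselves should be back in the kube-system namespace:

k -n kube-system get pod
# etcd-controlplane and kube-apiserver-controlplane should both be Running again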