This is a quick cheat sheet on how to back up and restore the etcd server in Kubernetes.

Find reference: -> Documentation -> Search "etcd backup restore"

-> you will find: Operating etcd clusters for Kubernetes | Kubernetes

# get params
cat /var/lib/kubelet/config.yaml | grep static
cat /etc/kubernetes/manifests/etcd.yaml | grep -A 30 command | egrep '^ *-'

# create test pod (optional)
k run test --image nginx
k get pod

# backup (get params from output above):
ETCDCTL_API=3 etcdctl --endpoints= --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save snapshotdb

# delete test pod (optional)
k delete pod test
k get pod

# stop kube-apiserver and etcd server:
mv /etc/kubernetes/manifests/kube-apiserver.yaml ./;
mv /etc/kubernetes/manifests/etcd.yaml ./;

# restore:
rm -rf /var/lib/etcd;
ETCDCTL_API=3 etcdctl snapshot restore --data-dir=/var/lib/etcd snapshotdb

# start etcd server and kube-apiserver:
mv etcd.yaml /etc/kubernetes/manifests/;
mv kube-apiserver.yaml /etc/kubernetes/manifests/

# check (optional; wait two minutes or so to allow etcd and kube-apiserver to come up)
k get pod

Step 1 (optional): Create a test POD

k run test --image nginx
k get pod

# output (if you wait a minute or so):
test 1/1 Running 0 3m40s

Step 2: Get parameters from the etcd.yaml file

cat /var/lib/kubelet/config.yaml | grep static 

# output:
staticPodPath: /etc/kubernetes/manifests
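The staticPodPath value can also be captured into a shell variable for reuse in the later mv commands. A minimal sketch; the kubelet config is mocked with a sample file here, while on a real control-plane node you would read /var/lib/kubelet/config.yaml directly:

```shell
# Sample stand-in for /var/lib/kubelet/config.yaml (assumption for this sketch)
cat > /tmp/kubelet-config.yaml <<'EOF'
kind: KubeletConfiguration
staticPodPath: /etc/kubernetes/manifests
EOF

# Extract the value after "staticPodPath:" into a variable
MANIFEST_DIR=$(awk '/^staticPodPath:/ {print $2}' /tmp/kubelet-config.yaml)
echo "$MANIFEST_DIR"   # -> /etc/kubernetes/manifests
```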

Use the staticPodPath to find the POD manifests for etcd (and kube-apiserver further down below):

cat /etc/kubernetes/manifests/etcd.yaml | grep -A 30 command | egrep '^ *-'

# output:
 - command:
- etcd
- --advertise-client-urls=
- --cert-file=/etc/kubernetes/pki/etcd/server.crt <----------------- cert-file
- --client-cert-auth=true
- --data-dir=/var/lib/etcd
- --experimental-initial-corrupt-check=true
- --experimental-watch-progress-notify-interval=5s
- --initial-advertise-peer-urls=
- --initial-cluster=controlplane=
- --key-file=/etc/kubernetes/pki/etcd/server.key <-------------------- key-file
- --listen-client-urls=,  <--- endpoints
- --listen-metrics-urls=
- --listen-peer-urls=
- --name=controlplane
- --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
- --peer-client-cert-auth=true
- --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt <-------- trusted-ca-file
- --snapshot-count=10000
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
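With a bit of grep, the four values needed in the next step can be pulled out directly. A sketch using an embedded sample command list; the https://127.0.0.1:2379 URL is a made-up example value, and on a real node you would run the grep against /etc/kubernetes/manifests/etcd.yaml instead:

```shell
# Sample fragment standing in for /etc/kubernetes/manifests/etcd.yaml
# (the URL value is a hypothetical example)
cat > /tmp/etcd-sample.yaml <<'EOF'
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
EOF

# Keep only the flags that map to etcdctl's --endpoints/--cacert/--cert/--key
grep -oE -- '--(listen-client-urls|cert-file|key-file|trusted-ca-file)=[^ ]*' /tmp/etcd-sample.yaml
```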

Step 3: Create an etcd Backup

You need to replace the endpoints, cacert, cert and key with the parameters found in step 2:

ETCDCTL_API=3 etcdctl --endpoints=   --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key   snapshot save snapshotdb

# output:
{"level":"info","ts":1669193343.0635,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"snapshotdb.part"}
{"level":"info","ts":1669193343.0680363,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1669193343.068161,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":""}
{"level":"info","ts":1669193343.1282105,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":1669193343.1477027,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"","size":"5.7 MB","took":"now"}
Snapshot saved at snapshotdb
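Before copying the snapshotdb file off the node, it is worth recording a checksum so the copy can be verified on the backup host. A small sketch; a dummy file stands in for the real snapshot here:

```shell
# Dummy file standing in for the real snapshotdb (assumption for this sketch)
echo "demo snapshot contents" > /tmp/snapshotdb

# Record the checksum next to the snapshot ...
cd /tmp && sha256sum snapshotdb > snapshotdb.sha256

# ... and verify it later, e.g. after copying both files to the backup host
sha256sum -c snapshotdb.sha256   # prints "snapshotdb: OK"
```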

Step 4 (optional): Delete the test pod

k delete pod test

# output: pod "test" deleted

Step 5: Stop kube-apiserver and etcd

The kube-apiserver and the etcd server run as static PODs. The kubelet stops them automatically when their manifests are moved out of the manifests directory:

mv /etc/kubernetes/manifests/kube-apiserver.yaml ./
mv /etc/kubernetes/manifests/etcd.yaml ./

Step 6: Restore etcd

Before we can restore the etcd database, we need to remove the existing database from the data-dir path:

rm -rf /var/lib/etcd
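If disk space allows, moving the old data directory aside is a safer variant than rm -rf, since it can be restored if something goes wrong. A sketch with a temporary stand-in directory; on the real node the directory would be /var/lib/etcd:

```shell
# Temp dir standing in for /var/lib/etcd (assumption for this sketch)
DATA_DIR=$(mktemp -d)
touch "$DATA_DIR/member-placeholder"

# Move it aside with a date suffix instead of deleting it
mv "$DATA_DIR" "${DATA_DIR}.bak-$(date +%F)"
ls -d "${DATA_DIR}.bak-"*
```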

We now restore the etcd database from the previously created snapshotdb file:

ETCDCTL_API=3 etcdctl snapshot restore --data-dir=/var/lib/etcd snapshotdb 

# output:
Deprecated: Use `etcdutl snapshot restore` instead.

2022-11-23T07:34:00Z    info    snapshot/v3_snapshot.go:251     restoring snapshot      {"path": "savedb", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap"}
2022-11-23T07:34:00Z    info    membership/store.go:119 Trimming membership information from the backend...
2022-11-23T07:34:00Z    info    membership/cluster.go:393       added member    {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2022-11-23T07:34:00Z    info    snapshot/v3_snapshot.go:272     restored snapshot       {"path": "savedb", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap"}

Step 7: Start etcd and kube-apiserver

The etcd server and the kube-apiserver start again automatically once the corresponding YAML files are moved back into the manifests directory:

mv etcd.yaml /etc/kubernetes/manifests/;
mv kube-apiserver.yaml /etc/kubernetes/manifests/

Step 8 (optional): Check the status of the test POD

If the restore was successful, you should see the test POD up and running again:

k get pod

# output, if you do not wait long enough:
The connection to the server was refused - did you specify the right host or port?

# output after 2 minutes or so:
test   1/1     Running   0          4m49s
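Instead of guessing how long to wait, a small retry loop can poll until the command succeeds. A generic sketch; the retry helper is hypothetical, and on the control plane you could call it as, for example, retry 60 5 kubectl get --raw /readyz:

```shell
# Hypothetical helper: retry CMD up to ATTEMPTS times, sleeping DELAY seconds
# between tries; succeeds as soon as CMD does.
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts"
  return 1
}

# Demo with a command that always succeeds; with kubectl you would poll
# the API server health endpoint instead, e.g.: retry 60 5 kubectl get --raw /readyz
retry 3 0 true   # prints "ready after 1 attempt(s)"
```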


