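With a PodDisruptionBudget (PDB) in Kubernetes, you can set the minimum number of Pods that must stay running in the cluster during operations such as draining a node with the kubectl drain command. For that reason, PDBs are commonly used as part of a strategy for improving the availability of services running on Kubernetes.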

The Kubernetes documentation distinguishes between "voluntary disruptions" and "involuntary disruptions". Draining a node, as mentioned above, is something an administrator does deliberately as part of operations, so it counts as a voluntary disruption. A kernel panic, by contrast, happens unexpectedly, so it counts as an involuntary disruption.

Prerequisite: the `kubectl drain` command

The basic flow of the kubectl drain command used here is summarized in the following article. It is worth reading alongside this post as background!

kakakakakku.hatenablog.com

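What I tested: `kubectl drain` command + PDB + insufficient resources

This time I test the behavior in a slightly complicated situation: a node is drained with the kubectl drain command, but the remaining nodes do not have enough resources to keep the minimum number of Pods required by the PodDisruptionBudget (PDB) running. Before testing, my guess was that it would either behave like a deadlock or behave like a timeout.

Test environment

I use kind to build a multi-node Kubernetes cluster for testing on a Mac. The version is v1.19.1. Each node is set up so that it can use 1 GiB of memory.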

$ kubectl get nodes
NAME                 STATUS   ROLES    AGE     VERSION
kind-control-plane   Ready    master   4m      v1.19.1
kind-worker          Ready    <none>   3m30s   v1.19.1
kind-worker2         Ready    <none>   3m30s   v1.19.1

$ kubectl describe nodes kind-worker | grep -A 6 Allocatable:
Allocatable:
  cpu:                2
  ephemeral-storage:  41021664Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             989764Ki
  pods:               110
  
$ kubectl describe nodes kind-worker2 | grep -A 6 Allocatable:
Allocatable:
  cpu:                2
  ephemeral-storage:  41021664Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             989764Ki
  pods:               110
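
For reference, a cluster with the same node layout could be described with a kind configuration like the one below. This is only a sketch: the file name kind-config.yaml is my own choice, and it does not reproduce the 1 GiB memory limit per node, since the original setup for that limit is not shown here.

```shell
# Write a kind cluster config with one control-plane node and two workers.
# With the default cluster name "kind", the nodes are named
# kind-control-plane, kind-worker and kind-worker2.
cat <<'EOF' > kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: kindest/node:v1.19.1
  - role: worker
    image: kindest/node:v1.19.1
  - role: worker
    image: kindest/node:v1.19.1
EOF

# Create the cluster from the config (requires kind and Docker):
# kind create cluster --config kind-config.yaml
```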

Next, I create six Pods through a Deployment, each requesting 300 MiB of memory (requests.memory: 300Mi). As the output of the kubectl get pods command below shows, three Pods are deployed on each node.

$ kubectl apply -f deployment.yaml
deployment.apps/sandbox-drain-timeout-nginx created

$ kubectl get pods -o custom-columns=Name:metadata.name,STATUS:status.phase,NODE:spec.nodeName | sort -k 3
Name                                          STATUS    NODE
sandbox-drain-timeout-nginx-f57cbc6cb-94z9s   Running   kind-worker
sandbox-drain-timeout-nginx-f57cbc6cb-gcvh8   Running   kind-worker
sandbox-drain-timeout-nginx-f57cbc6cb-kxw2l   Running   kind-worker
sandbox-drain-timeout-nginx-f57cbc6cb-8cxxt   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-vksh8   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-z5f6n   Running   kind-worker2
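
The contents of deployment.yaml are not shown above, so here is a minimal sketch of what it might look like. The Deployment name, the replica count, the memory request and the app: nginx label (used by the PDB selector later) come from this post. The container name and the nginx image are assumptions.

```shell
# Write a Deployment manifest: 6 replicas, each requesting 300Mi of memory.
cat <<'EOF' > deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sandbox-drain-timeout-nginx
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx          # assumed container name
          image: nginx         # assumed image
          resources:
            requests:
              memory: 300Mi    # three of these fit on a node with about 966Mi allocatable
EOF
```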

Roughly illustrated, the configuration looks like this.

[Figure: three Pods running on each of kind-worker and kind-worker2]

Test 1: running the `kubectl drain` command without a PDB

First, I run the kubectl drain command without a PodDisruptionBudget (PDB) to drain the kind-worker node. The three Pods running on kind-worker are evicted, and the node's status becomes SchedulingDisabled, so no new Pods are scheduled onto it. The Deployment recreates the three Pods, but kind-worker2 does not have enough free memory for them, so all three replacements stay Pending.

$ kubectl drain kind-worker --ignore-daemonsets
node/kind-worker cordoned
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-94z9s
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-kxw2l
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-gcvh8
pod/sandbox-drain-timeout-nginx-f57cbc6cb-gcvh8 evicted
pod/sandbox-drain-timeout-nginx-f57cbc6cb-kxw2l evicted
pod/sandbox-drain-timeout-nginx-f57cbc6cb-94z9s evicted
node/kind-worker evicted

$ kubectl get nodes
NAME                 STATUS                     ROLES    AGE   VERSION
kind-control-plane   Ready                      master   12m   v1.19.1
kind-worker          Ready,SchedulingDisabled   <none>   11m   v1.19.1
kind-worker2         Ready                      <none>   11m   v1.19.1

$ kubectl get pods -o custom-columns=Name:metadata.name,STATUS:status.phase,NODE:spec.nodeName | sort -k 3
Name                                          STATUS    NODE
sandbox-drain-timeout-nginx-f57cbc6cb-f6qth   Pending   <none>
sandbox-drain-timeout-nginx-f57cbc6cb-nvlkf   Pending   <none>
sandbox-drain-timeout-nginx-f57cbc6cb-nzpbt   Pending   <none>
sandbox-drain-timeout-nginx-f57cbc6cb-8cxxt   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-vksh8   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-z5f6n   Running   kind-worker2
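
Why the replacement Pods stay Pending can be checked with a little arithmetic on the Allocatable memory shown earlier. This is a rough sketch that ignores any memory requested by system Pods on the node, so the real headroom is at most the value printed here.

```shell
# Values taken from the output above.
allocatable_ki=989764            # memory Allocatable on each worker node
request_ki=$((300 * 1024))       # requests.memory: 300Mi = 307200Ki

# How many 300Mi Pods fit on one node, and how much memory is left over.
max_pods=$((allocatable_ki / request_ki))
leftover_ki=$((allocatable_ki - max_pods * request_ki))

echo "max_pods=${max_pods} leftover_ki=${leftover_ki}"
# prints: max_pods=3 leftover_ki=68164
```

kind-worker2 already runs three Pods, so a fourth 300Mi request cannot fit in the roughly 66 MiB that remain.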

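Roughly illustrated, the configuration looks like this.

[Figure: kind-worker drained, three Pods running on kind-worker2, three Pods Pending]

To prepare for the next test, I restore the node and the Deployment to the original configuration.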

$ kubectl uncordon kind-worker

$ kubectl delete -f deployment.yaml

$ kubectl apply -f deployment.yaml

$ kubectl get pods -o custom-columns=Name:metadata.name,STATUS:status.phase,NODE:spec.nodeName | sort -k 3
Name                                          STATUS    NODE
sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp   Running   kind-worker
sandbox-drain-timeout-nginx-f57cbc6cb-hxz7h   Running   kind-worker
sandbox-drain-timeout-nginx-f57cbc6cb-xdvc9   Running   kind-worker
sandbox-drain-timeout-nginx-f57cbc6cb-2kq98   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-m9cbb   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-vcwjv   Running   kind-worker2

Test 2: running the `kubectl drain` command with a PDB

Next, I run the kubectl drain command with a PodDisruptionBudget (PDB) in place to drain the kind-worker node. First, I write the PDB manifest. For the six Pods created through the Deployment, I set the minimum number of Pods that must remain available (minAvailable) to 4. A maxUnavailable setting is also available as an alternative.

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: sandbox-drain-timeout-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: nginx

Apply the manifest.

$ kubectl apply -f pdb.yaml
poddisruptionbudget.policy/sandbox-drain-timeout-pdb created

$ kubectl get poddisruptionbudgets
NAME                        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
sandbox-drain-timeout-pdb   4               N/A               2                     40s
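
As mentioned above, maxUnavailable can be used instead of minAvailable. The following is a sketch of what I would expect an equivalent PDB to look like for this six-replica Deployment: maxUnavailable: 2 should allow the same two disruptions as minAvailable: 4. The file name pdb-max-unavailable.yaml is hypothetical, and this manifest is meant as a replacement for the one above, not something to apply alongside it.

```shell
# Write an alternative PDB manifest that uses maxUnavailable instead of minAvailable.
cat <<'EOF' > pdb-max-unavailable.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: sandbox-drain-timeout-pdb
spec:
  maxUnavailable: 2    # with 6 replicas, at least 4 Pods must stay available
  selector:
    matchLabels:
      app: nginx
EOF
```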

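When I run the kubectl drain command, it tries to evict the three Pods running on kind-worker, just as in Test 1. This time, however, only two of them are evicted. Because the PDB sets minAvailable: 4 and the two replacement Pods stay Pending, evicting the remaining Pod (sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp here) would drop the number of available Pods below four, so the eviction is refused and retried automatically. In the output below, the Cannot evict pod message is printed over and over, and the terminal stays blocked.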

$ kubectl drain kind-worker --ignore-daemonsets
node/kind-worker cordoned
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-hxz7h
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-xdvc9
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/sandbox-drain-timeout-nginx-f57cbc6cb-hxz7h evicted
pod/sandbox-drain-timeout-nginx-f57cbc6cb-xdvc9 evicted
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

(omitted)

evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

(continues...)

$ kubectl get pods -o custom-columns=Name:metadata.name,STATUS:status.phase,NODE:spec.nodeName | sort -k 3
Name                                          STATUS    NODE
sandbox-drain-timeout-nginx-f57cbc6cb-fr9cq   Pending   <none>
sandbox-drain-timeout-nginx-f57cbc6cb-pntw8   Pending   <none>
sandbox-drain-timeout-nginx-f57cbc6cb-9d8qp   Running   kind-worker
sandbox-drain-timeout-nginx-f57cbc6cb-2kq98   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-m9cbb   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-vcwjv   Running   kind-worker2
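
The reason the third eviction is refused can be worked through with the numbers above. The helper below is a hypothetical illustration of the budget arithmetic (healthy Pods minus minAvailable, never below zero), not a call into Kubernetes itself.

```shell
# Hypothetical helper: number of evictions a minAvailable PDB still allows.
allowed_disruptions() {
  healthy=$1
  min_available=$2
  allowed=$((healthy - min_available))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

# Before the drain: 6 Pods Running, minAvailable: 4.
allowed_disruptions 6 4   # prints 2 (matches ALLOWED DISRUPTIONS above)

# After two evictions: 4 Pods Running, 2 replacements Pending (not healthy).
allowed_disruptions 4 4   # prints 0, so the third eviction is refused
```

The budget only recovers once a Pending replacement becomes Running, which cannot happen here because kind-worker2 has no memory left for it.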

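No matter how long I waited, the retries kept going, so I checked the documentation below. It turns out that the --timeout option of the kubectl drain command, which I had not set, defaults to 0s, which means it keeps retrying forever. I see!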

The length of time to wait before giving up, zero means infinite

Kubectl Reference Docs

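Roughly illustrated, the configuration looks like this.

[Figure: one Pod still running on kind-worker, three Pods running on kind-worker2, two Pods Pending]

To prepare for the next test, I restore the node and the Deployment to the original configuration.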

$ kubectl uncordon kind-worker

$ kubectl delete -f deployment.yaml

$ kubectl apply -f deployment.yaml

$ kubectl get pods -o custom-columns=Name:metadata.name,STATUS:status.phase,NODE:spec.nodeName | sort -k 3
Name                                          STATUS    NODE
sandbox-drain-timeout-nginx-f57cbc6cb-s4btz   Running   kind-worker
sandbox-drain-timeout-nginx-f57cbc6cb-ssscp   Running   kind-worker
sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh   Running   kind-worker
sandbox-drain-timeout-nginx-f57cbc6cb-cdkcp   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-g6v7p   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-hvgm2   Running   kind-worker2

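Test 3: running the `kubectl drain --timeout` command with a PDB

Having confirmed that the kubectl drain command keeps retrying by default, I finish by setting the --timeout option and checking the behavior. As a sample, I set it to 30 seconds. At the end, the message global timeout reached: 30s is printed and the kubectl drain command returns an error. The resulting Pod placement is the same as in Test 2. I see!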

$ kubectl drain kind-worker --ignore-daemonsets --timeout 30s
node/kind-worker cordoned
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-s4btz
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-ssscp
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/sandbox-drain-timeout-nginx-f57cbc6cb-s4btz evicted
pod/sandbox-drain-timeout-nginx-f57cbc6cb-ssscp evicted
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh
error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh
There are pending pods in node "kind-worker" when an error occurred: error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh": global timeout reached: 30s
pod/sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh
error: unable to drain node "kind-worker", aborting command...

There are pending nodes to be drained:
 kind-worker
error: error when evicting pod "sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh": global timeout reached: 30s

$ kubectl get pods -o custom-columns=Name:metadata.name,STATUS:status.phase,NODE:spec.nodeName | sort -k 3
Name                                          STATUS    NODE
sandbox-drain-timeout-nginx-f57cbc6cb-5xs7k   Pending   <none>
sandbox-drain-timeout-nginx-f57cbc6cb-jlckx   Pending   <none>
sandbox-drain-timeout-nginx-f57cbc6cb-tx6xh   Running   kind-worker
sandbox-drain-timeout-nginx-f57cbc6cb-cdkcp   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-g6v7p   Running   kind-worker2
sandbox-drain-timeout-nginx-f57cbc6cb-hvgm2   Running   kind-worker2
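
Since kubectl drain returns an error on timeout and the node stays cordoned, a script that drains nodes may want to react to that. The following is a minimal sketch, not something I ran as part of these tests: drain_or_uncordon is a hypothetical helper name, and uncordoning on failure is just one possible policy.

```shell
# Hypothetical helper: drain a node with a timeout, and uncordon it again
# if the drain fails (for example, because a PDB blocks an eviction).
drain_or_uncordon() {
  node=$1
  timeout=${2:-30s}   # default to the 30s used in this test

  if kubectl drain "$node" --ignore-daemonsets --timeout "$timeout"; then
    echo "drained ${node}"
  else
    echo "drain of ${node} failed; uncordoning" >&2
    kubectl uncordon "$node"
    return 1
  fi
}

# Usage (requires a cluster):
# drain_or_uncordon kind-worker 30s
```

Whether to uncordon automatically depends on the operation. In some cases it is better to leave the node cordoned and investigate why the replacement Pods are Pending.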

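Summary

While trying out the kubectl drain command, I got curious and tested the behavior in a slightly complicated situation: a node is drained with the kubectl drain command, but the remaining nodes do not have enough resources to keep the minimum number of Pods required by the PodDisruptionBudget (PDB) running. As the documentation says, the kubectl drain command keeps retrying by default, and setting the --timeout option puts a limit on how long it waits. I will keep actively testing things like this!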