kubernetes study: Calico simple policy tutorial, part 2
Continuing from last time. Starting from the state where the network policies default-deny and access-nginx exist.
$ kubectl get networkpolicy --namespace=policy-demo
NAME           POD-SELECTOR   AGE
access-nginx   run=nginx      9m54s
default-deny   <none>         15m
Trying allow-all
What happens with contradictory network policies? Does the order in which network policies are applied matter? Questions like these come up.
There is a lot I don't understand yet, but let's just try it. Create allow-all. That makes three policies: default-deny, access-nginx, and allow-all.
$ kubectl create -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-all
  namespace: policy-demo
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - {}
  egress:
  - {}
EOF
networkpolicy.networking.k8s.io/allow-all created
From pod:access
/ # echo $HOSTNAME
access-7c5df8f4c-b8p5b
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
From pod:cant-access. It could connect.
/ # echo $HOSTNAME
cant-access-7587658dc7-h5b7f
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
Deleting allow-all
Delete networkpolicy:allow-all. default-deny and access-nginx remain.
$ kubectl delete networkpolicy allow-all --namespace=policy-demo
networkpolicy.extensions "allow-all" deleted
From pod:access
/ # echo $HOSTNAME
access-7c5df8f4c-b8p5b
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
From pod:cant-access. Access is now blocked.
/ # echo $HOSTNAME
cant-access-7587658dc7-h5b7f
/ # wget -q --timeout=5 nginx -O -
wget: download timed out
Deleting default-deny
Delete networkpolicy:default-deny. Only access-nginx remains.
$ kubectl delete networkpolicy default-deny --namespace=policy-demo
networkpolicy.extensions "default-deny" deleted
From pod:access
/ # echo $HOSTNAME
access-7c5df8f4c-b8p5b
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
From pod:cant-access it still fails. Huh? So access-nginx by itself is a policy that allows only access->nginx? (That matches the Kubernetes model: as soon as any NetworkPolicy selects a pod, that pod is isolated and rejects all traffic not explicitly allowed, so default-deny was redundant for the nginx pods.)
/ # echo $HOSTNAME
cant-access-7587658dc7-h5b7f
/ # wget -q --timeout=5 nginx -O -
wget: download timed out
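A sanity check for that hypothesis: extending the policy's ingress sources should open nginx to cant-access as well. A hypothetical manifest (the name access-nginx-plus and this variation are mine, not from the tutorial):

```yaml
# Hypothetical: same shape as access-nginx, but with cant-access
# added as a second allowed ingress source.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: access-nginx-plus   # hypothetical name
  namespace: policy-demo
spec:
  podSelector:
    matchLabels:
      run: nginx
  ingress:
  - from:
    - podSelector:
        matchLabels:
          run: access
    - podSelector:
        matchLabels:
          run: cant-access
```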
Deleting access-nginx
Delete networkpolicy:access-nginx. Now no network policies are left.
$ kubectl delete networkpolicy access-nginx --namespace=policy-demo
networkpolicy.extensions "access-nginx" deleted
From pod:access
/ # echo $HOSTNAME
access-7c5df8f4c-b8p5b
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
From pod:cant-access. It works.
/ # echo $HOSTNAME
cant-access-7587658dc7-h5b7f
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
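The whole experiment is consistent with how NetworkPolicies combine: they are purely additive (a pod selected by any policy is isolated, and traffic is allowed if any policy allows it), and ordering never matters. A toy model in Python (my own sketch; it ignores namespaces, ports, egress, and policyTypes):

```python
def matches(selector, labels):
    """An empty selector {} matches every pod."""
    return all(labels.get(k) == v for k, v in selector.items())

def allowed(src, dst, policies):
    """Ingress decision: policies selecting dst isolate it; traffic is
    allowed if ANY selecting policy has a 'from' entry matching src.
    The list order of policies is irrelevant."""
    selecting = [p for p in policies if matches(p["podSelector"], dst)]
    if not selecting:
        return True  # no policy selects dst -> pod is non-isolated
    return any(matches(f, src) for p in selecting for f in p["ingress_from"])

nginx = {"run": "nginx"}
access = {"run": "access"}
cant_access = {"run": "cant-access"}

default_deny = {"podSelector": {}, "ingress_from": []}
access_nginx = {"podSelector": {"run": "nginx"},
                "ingress_from": [{"run": "access"}]}
allow_all = {"podSelector": {}, "ingress_from": [{}]}

# All three policies: allow-all wins by union, everything reaches nginx.
assert allowed(cant_access, nginx, [default_deny, access_nginx, allow_all])
# Drop allow-all: only access may reach nginx.
assert allowed(access, nginx, [default_deny, access_nginx])
assert not allowed(cant_access, nginx, [default_deny, access_nginx])
# Drop default-deny too: access-nginx alone still isolates the nginx pods.
assert not allowed(cant_access, nginx, [access_nginx])
# No policies at all: everything is allowed again.
assert allowed(cant_access, nginx, [])
```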
kubernetes study: Calico simple policy tutorial
Continuing with kubernetes. This time, the first session on network policies.
Calico's Simple policy tutorial
Configure namespaces
Before creating the namespace... check what's there.
$ kubectl get ns
NAME              STATUS   AGE
default           Active   22h
kube-node-lease   Active   22h
kube-public       Active   22h
kube-system       Active   22h
Create the namespace
$ kubectl create ns policy-demo
namespace/policy-demo created
Check
$ kubectl get ns
NAME              STATUS   AGE
default           Active   22h
kube-node-lease   Active   22h
kube-public       Active   22h
kube-system       Active   22h
policy-demo       Active   11s
Create demo Pods
Check before doing anything in NAMESPACE:policy-demo
$ kubectl get all --namespace=policy-demo
No resources found.
Create the deployment and friends. There is a deprecation warning, but ignore it.
$ kubectl run --namespace=policy-demo nginx --replicas=2 --image=nginx
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/nginx created
Check. A pod, deployment, and replicaset have been created.
$ kubectl get all --namespace=policy-demo
NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-7db9fccd9b-fgd66   1/1     Running   0          66s
pod/nginx-7db9fccd9b-kp2d5   1/1     Running   0          66s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   2/2     2            2           66s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-7db9fccd9b   2         2         2       66s
Check the pods with -o wide
$ kubectl get pod --namespace=policy-demo -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE   NOMINATED NODE   READINESS GATES
nginx-7db9fccd9b-fgd66   1/1     Running   0          5m26s   10.244.2.2   kb3    <none>           <none>
nginx-7db9fccd9b-kp2d5   1/1     Running   0          5m26s   10.244.1.2   kb2    <none>           <none>
Attach service:nginx to the deployment. This lets pods reach it by service name, as investigated before.
$ kubectl expose --namespace=policy-demo deployment nginx --port=80
service/nginx exposed
Run the access pod. A prompt appears.
$ kubectl run --namespace=policy-demo access --rm -ti --image busybox /bin/sh
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
/ #
From the access pod, access the nginx service. nginx responds.
/ # wget -q nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Enable isolation
Enable isolation in the policy-demo namespace. Calico will then prevent connections to pods in this namespace.
Reference for kind: NetworkPolicy — Network Policies: https://kubernetes.io/docs/concepts/services-networking/network-policies/
$ kubectl create -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
  namespace: policy-demo
spec:
  podSelector:
    matchLabels: {}
EOF
networkpolicy.networking.k8s.io/default-deny created
Try wget from the access pod again. It times out.
/ # wget -q nginx -O -
^C
/ # wget -q --timeout=5 nginx -O -
wget: download timed out
Allow access using a network policy
Use a NetworkPolicy to enable access to the nginx service. Incoming connections from the access pod are allowed, but connections from anywhere else are not.
Create a new NetworkPolicy: access-nginx. The previous one was default-deny.
kubectl create -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: access-nginx
  namespace: policy-demo
spec:
  podSelector:
    matchLabels:
      run: nginx
  ingress:
    - from:
      - podSelector:
          matchLabels:
            run: access
EOF
networkpolicy.networking.k8s.io/access-nginx created
wget from the access pod again. Access succeeds.
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
:
wget from pod:cant-access. It timed out.
$ kubectl run --namespace=policy-demo cant-access --rm -ti --image busybox /bin/sh
:
If you don't see a command prompt, try pressing enter.
/ # wget -q --timeout=5 nginx -O -
wget: download timed out
Poking around
Here the tutorial cleans up and ends, but let's look a little more.
$ kubectl get networkpolicy --namespace=policy-demo
NAME           POD-SELECTOR   AGE
access-nginx   run=nginx      9m54s
default-deny   <none>         15m
default-deny
$ kubectl describe networkpolicy default-deny --namespace=policy-demo
Name:         default-deny
Namespace:    policy-demo
Created on:   2019-04-18 19:28:37 +0900 JST
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
  Allowing ingress traffic:
    <none> (Selected pods are isolated for ingress connectivity)
  Allowing egress traffic:
    <none> (Selected pods are isolated for egress connectivity)
  Policy Types: Ingress
access-nginx
$ kubectl describe networkpolicy access-nginx --namespace=policy-demo
Name:         access-nginx
Namespace:    policy-demo
Created on:   2019-04-18 19:34:31 +0900 JST
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     run=nginx
  Allowing ingress traffic:
    To Port: <any> (traffic allowed to all ports)
    From:
      PodSelector: run=access
  Allowing egress traffic:
    <none> (Selected pods are isolated for egress connectivity)
  Policy Types: Ingress
nomad tutorial: web UI
https://www.nomadproject.io/intro/getting-started/ui.html
Continuing from last time. server, client1, and client2 are up, and job:example is running.
Opening the Web UI
Access http://localhost:4646. It displays as in the tutorial. example is running.
Inspecting a Job
Drill down in the browser to see the structure.
jobs
jobs -> example -> taskgroup -> cache -> 3 allocations -> tasks -> redis
clients. You can check which allocs are assigned to each client.
clients -> client1 -> allocations
servers
servers -> nomad.global -> tags
End of the tutorial
So task groups get assigned to nodes — that much is clear. As an application developer, there's a pile of things I'll have to investigate.
Read the guides next? Or do something else. https://www.nomadproject.io/guides/index.html
nomad tutorial: Clustering
https://www.nomadproject.io/intro/getting-started/cluster.html
Starting the Server
Create the server configuration file server.hcl
# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/server1"

# Enable the server
server {
  enabled = true

  # Self-elect, should be 3 or 5 for production
  bootstrap_expect = 1
}
Start a new agent with server.hcl
vagrant@nomad:~$ nomad agent -config server.hcl
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from server.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 10.0.2.15:4646; RPC: 10.0.2.15:4647; Serf: 10.0.2.15:4648
            Bind Addrs: HTTP: 0.0.0.0:4646; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
                Client: false
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: true
               Version: 0.8.6

==> Nomad agent started! Log data will stream in below
:
Starting the Clients
Create the client configuration files client1.hcl and client2.hcl, plus the directories /tmp/client1 and /tmp/client2.
# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/client1"    # <--- /tmp/client2 in client2.hcl

# Give the agent a unique name. Defaults to hostname
name = "client1"             # <--- client2

# Enable the client
client {
  enabled = true

  # For demo assume we are talking to server1. For production,
  # this should be like "nomad.service.consul:4647" and a system
  # like Consul used for service discovery.
  servers = ["127.0.0.1:4647"]
}

# Modify our port to avoid a collision with server1
ports {
  http = 5656    # <--- 5657 in client2.hcl
}
Start the client1 agent
$ mkdir /tmp/client1
$ sudo nomad agent -config client1.hcl
==> Loaded configuration from client1.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 10.0.2.15:5656
            Bind Addrs: HTTP: 0.0.0.0:5656
                Client: true
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: false
               Version: 0.8.6

==> Nomad agent started! Log data will stream in below:
:
Start the client2 agent
$ mkdir /tmp/client2
$ sudo nomad agent -config client2.hcl
Check the server
$ nomad server members
Name          Address    Port  Status  Leader  Protocol  Build  Datacenter  Region
nomad.global  10.0.2.15  4648  alive   true    2         0.8.6  dc1         global
Check the nodes
$ nomad node status
ID        DC   Name     Class   Drain  Eligibility  Status
fecc79d6  dc1  client2  <none>  false  eligible     ready
28eb0853  dc1  client1  <none>  false  eligible     ready
Submit a Job
Reuse the example.nomad from before; it had count = 3 and image = redis:4.0. The tutorial goes straight to run, but try plan first. The initial index was 0.
$ nomad job status
No running jobs

$ nomad job plan example.nomad
+ Job: "example"
+ Task Group: "cache" (3 create)
  + Task: "redis" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 0
To submit the job with version verification run:

nomad job run -check-index 0 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
nomad job run. You can see the allocations were placed on different nodes.
$ nomad job run -check-index 0 example.nomad
==> Monitoring evaluation "23b0ee56"
    Evaluation triggered by job "example"
    Allocation "fa2fc9df" created: node "28eb0853", group "cache"   <--- alloc on a different node
    Allocation "09d9ae5d" created: node "fecc79d6", group "cache"   <--- alloc on a different node
    Allocation "f77a8705" created: node "fecc79d6", group "cache"
    Evaluation within deployment: "f0e91e62"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "23b0ee56" finished with status "complete"
Check job:example
$ nomad job status
ID       Type     Priority  Status   Submit Date
example  service  50        running  2019-04-10T10:59:50Z

$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-10T10:59:50Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         3        0       0         0

Latest Deployment
ID          = f0e91e62
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       3        3       3        0          2019-04-10T11:10:20Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
09d9ae5d  fecc79d6  cache       0        run      running  1m47s ago  1m17s ago
f77a8705  fecc79d6  cache       0        run      running  1m47s ago  1m21s ago
fa2fc9df  28eb0853  cache       0        run      running  1m47s ago  1m25s ago
nomad trial: job refresher
I had completely forgotten everything, so here's a refresher.
https://www.nomadproject.io/intro/getting-started/jobs.html
Starting and stopping the agent
Start the agent in dev mode
$ sudo nomad agent -dev
Stop the agent
ctrl-c
Check agent status
$ nomad agent-info
Checking the cluster state?
I don't really understand this area yet.
$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
b62b7daf  dc1  nomad  <none>  false  eligible     ready
$ nomad server members
Name          Address    Port  Status  Leader  Protocol  Build  Datacenter  Region
nomad.global  127.0.0.1  4648  alive   true    2         0.8.6  dc1         global
job, taskgroup, task
The structure was job -> taskgroup -> task.
- job: a bundle of task groups <--- example
- taskgroup: the unit of scheduling; runs on one node <--- cache, alloc
- task: the smallest unit of execution in nomad <--- redis
Running a job
Check jobs. Nothing is running.
$ nomad job status
No running jobs
Generate the job file. It won't overwrite an existing one, so delete it and regenerate.
$ nomad job init
Job 'example.nomad' already exists
$ rm example.nomad
$ nomad job init
Example job file written to example.nomad
Run the job
$ nomad job run example.nomad
==> Monitoring evaluation "8a175ecb"
    Evaluation triggered by job "example"
    Allocation "dfee8772" created: node "b62b7daf", group "cache"
    Evaluation within deployment: "6c7ed692"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "8a175ecb" finished with status "complete"
List jobs
$ nomad job status
ID       Type     Priority  Status   Submit Date
example  service  50        running  2019-04-09T11:09:58Z
Job details. taskgroup:cache has become alloc:dfee8772 on node:b62b7daf.
$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-09T11:09:58Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         1        0       0         0

Latest Deployment
ID          = 6c7ed692
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       1        1       1        0          2019-04-09T11:20:25Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
dfee8772  b62b7daf  cache       0        run      running  3m11s ago  2m44s ago
Subcommands of nomad job
Subcommands:
    deployments    List deployments for a job
    dispatch       Dispatch an instance of a parameterized job
    eval           Force an evaluation for the job
    history        Display all tracked versions of a job
    init           Create an example job file
    inspect        Inspect a submitted job
    plan           Dry-run a job update to determine its effects
    promote        Promote a job's canaries
    revert         Revert to a prior version of the job
    run            Run a new job or update an existing job
    status         Display status information about a job
    stop           Stop a running job
    validate       Checks if a given job specification is valid
alloc
Check alloc:dfee8772. The structure was example(job).cache(group).redis(task).
$ nomad alloc status dfee8772
ID                  = dfee8772
Eval ID             = 8a175ecb
Name                = example.cache[0]
Node ID             = b62b7daf
Job ID              = example
Job Version         = 0
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 6m32s ago
Modified            = 6m5s ago
Deployment ID       = 6c7ed692
Deployment Health   = healthy

Task "redis" is "running"
Task Resources
CPU        Memory             Disk     IOPS  Addresses
3/500 MHz  1000 KiB/256 MiB   300 MiB  0     db: 127.0.0.1:28263

Task Events:
Started At     = 2019-04-09T11:10:07Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-04-09T11:10:07Z  Started     Task started by client
2019-04-09T11:09:58Z  Driver      Downloading image redis:3.2
2019-04-09T11:09:58Z  Task Setup  Building Task Directory
2019-04-09T11:09:58Z  Received    Task received by client
Subcommands of nomad alloc. logs was covered last time.
Subcommands:
    fs        Inspect the contents of an allocation directory
    logs      Streams the logs of a task.
    status    Display allocation status information and metadata
Take a look at fs
$ nomad alloc fs dfee8772
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2019-04-09T11:09:58Z  alloc/
drwxrwxrwx  4.0 KiB  2019-04-09T11:10:06Z  redis/
Scaling 1 -> 3
Edit example.nomad: change count = 1 in group:cache to 3
$ vi example.nomad
↓
count = 1 <--- 3
Check the change plan. I didn't notice at first, but this is plan, not run.
$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (2 create, 1 in-place update)
  +/- Count: "1" => "3" (forces create)
      Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 27
To submit the job with version verification run:

nomad job run -check-index 27 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
Apply the plan
$ nomad job run -check-index 27 example.nomad
==> Monitoring evaluation "432f4471"
    Evaluation triggered by job "example"
    Allocation "47dff45a" created: node "b62b7daf", group "cache"
    Allocation "baf43930" created: node "b62b7daf", group "cache"
    Allocation "dfee8772" modified: node "b62b7daf", group "cache"
    Evaluation within deployment: "ae570b04"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "432f4471" finished with status "complete"
Check the job details. 2 allocs were added, making 3.
$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-09T11:31:50Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         3        0       0         0

Latest Deployment
ID          = ae570b04
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       3        3       3        0          2019-04-09T11:42:10Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
47dff45a  b62b7daf  cache       1        run      running  1m36s ago   1m16s ago
baf43930  b62b7daf  cache       1        run      running  1m36s ago   1m21s ago
dfee8772  b62b7daf  cache       1        run      running  23m28s ago  1m25s ago
■ Image change redis:3.2 -> redis:4.0
Change the image of task "redis" from redis:3.2 to redis:4.0.
$ vi example.nomad
↓
image = "redis:3.2" <--- redis:4.0
Check the plan. Or rather, this generates the plan.
$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (1 create/destroy update, 2 ignore)
  +/- Task: "redis" (forces create/destroy update)
    +/- Config {
      +/- image:           "redis:3.2" => "redis:4.0"
          port_map[0][db]: "6379"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 58
To submit the job with version verification run:

nomad job run -check-index 58 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
At this point the allocs are not affected yet.
$ nomad status example
↓
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
47dff45a  b62b7daf  cache       1        run      running  8m33s ago   8m13s ago
baf43930  b62b7daf  cache       1        run      running  8m33s ago   8m18s ago
dfee8772  b62b7daf  cache       1        run      running  30m25s ago  8m22s ago
Apply the plan. Perhaps one plan has multiple Evaluations hanging off it, with the check-index advancing per Evaluation? A new alloc:33eac15d was created. That only 1 was created here rather than 3 is because the image change is rolled out one task at a time — once one succeeds, the next is replaced (example.nomad's update stanza sets max_parallel = 1).
$ nomad job run -check-index 58 example.nomad
==> Monitoring evaluation "e226b835"
    Evaluation triggered by job "example"
    Allocation "33eac15d" created: node "b62b7daf", group "cache"
    Evaluation within deployment: "03079e18"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "e226b835" finished with status "complete"
Poll the status of job:example. You can watch it switching over to version:2.
$ nomad status example

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
33eac15d  b62b7daf  cache       2        run      running   10s ago     5s ago
47dff45a  b62b7daf  cache       1        run      running   10m27s ago  10m7s ago
baf43930  b62b7daf  cache       1        run      running   10m27s ago  10m12s ago
dfee8772  b62b7daf  cache       1        stop     complete  32m19s ago  5s ago
↓
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
bca76393  b62b7daf  cache       2        run      pending   3s ago      3s ago
33eac15d  b62b7daf  cache       2        run      running   28s ago     4s ago
47dff45a  b62b7daf  cache       1        run      running   10m45s ago  10m25s ago
baf43930  b62b7daf  cache       1        stop     running   10m45s ago  3s ago
dfee8772  b62b7daf  cache       1        stop     complete  32m37s ago  23s ago
↓
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
f624164c  b62b7daf  cache       2        run      pending   0s ago      0s ago
bca76393  b62b7daf  cache       2        run      running   25s ago     1s ago
33eac15d  b62b7daf  cache       2        run      running   50s ago     26s ago
47dff45a  b62b7daf  cache       1        stop     running   11m7s ago   0s ago
baf43930  b62b7daf  cache       1        stop     complete  11m7s ago   20s ago
dfee8772  b62b7daf  cache       1        stop     complete  32m59s ago  45s ago
Stopping the job
Stop
$ nomad job stop example
==> Monitoring evaluation "2c30cd46"
    Evaluation triggered by job "example"
    Evaluation within deployment: "03079e18"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "2c30cd46" finished with status "complete"
Check
$ nomad job status
ID       Type     Priority  Status          Submit Date
example  service  50        dead (stopped)  2019-04-09T11:42:07Z
Restarting the job
Restart
vagrant@nomad:~$ nomad job run example.nomad
==> Monitoring evaluation "332362e1"
    Evaluation triggered by job "example"
    Allocation "1f19159f" created: node "b62b7daf", group "cache"
    Allocation "381f28af" created: node "b62b7daf", group "cache"
    Allocation "513949b9" created: node "b62b7daf", group "cache"
    Evaluation within deployment: "705c27d6"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "332362e1" finished with status "complete"
Check
ID       Type     Priority  Status   Submit Date
example  service  50        running  2019-04-09T11:49:04Z
Check the details. Is version:3 missing because the stop itself also counted as a version?
$ nomad status example
↓
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
1f19159f  b62b7daf  cache       4        run      running   13s ago     1s ago
381f28af  b62b7daf  cache       4        run      running   13s ago     11s ago
513949b9  b62b7daf  cache       4        run      running   13s ago     11s ago
f624164c  b62b7daf  cache       2        stop     complete  6m20s ago   1m40s ago
bca76393  b62b7daf  cache       2        stop     complete  6m45s ago   1m40s ago
33eac15d  b62b7daf  cache       2        stop     complete  7m10s ago   1m39s ago
47dff45a  b62b7daf  cache       1        stop     complete  17m27s ago  1m44s ago
baf43930  b62b7daf  cache       1        stop     complete  17m27s ago  1m44s ago
dfee8772  b62b7daf  cache       1        stop     complete  39m19s ago  1m44s ago
Next
A little more of the nomad tutorial to go. The things I wanted to look into with kubernetes aren't finished either. I want to do kafka too.
nomad trial: jobs
https://www.nomadproject.io/intro/getting-started/jobs.html
Job and TaskGroup
A job is a specification, provided by the user, that declares a workload for Nomad. A job is a form of desired state: the user expresses that the job should be running, but not where it should run. Nomad's responsibility is to make sure the actual state matches the user's desired state. A job is composed of one or more task groups.
Task groups
A task group is a set of tasks that must be run together. For example, a web server may require that a log-shipping co-process always be running as well. A task group is the unit of scheduling, meaning the entire group must run on the same client node and cannot be split.
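As a sketch of that idea, a hypothetical group (not from the tutorial; the job layout follows example.nomad, but the images and task names are my own assumptions) that pins a web server and its log-shipper sidecar to the same node:

```hcl
# Hypothetical task group: both tasks are always scheduled
# onto the same client node and scale together.
group "web" {
  count = 1

  task "server" {
    driver = "docker"
    config {
      image = "nginx:1.15"              # hypothetical image
    }
  }

  task "log-shipper" {
    driver = "docker"
    config {
      image = "fluent/fluentd:v1.3"     # hypothetical sidecar image
    }
  }
}
```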
Tasks
A task is the smallest unit of work in Nomad. Tasks are executed by drivers, which lets Nomad be flexible about the types of tasks it supports. A task specifies its driver, driver configuration, constraints, and the resources it needs.
Drivers
A driver represents the basic means of executing a task. Examples of drivers include Docker, Qemu, Java, and static binaries.
- job
- taskgroup <--- the unit of scheduling; runs on one node
- task <--- the smallest unit of work
Running a Job
Generate a skeleton job file with the job init command. An example.nomad file appears.
$ nomad job init
Example job file written to example.nomad
Look at example.nomad; after cutting the mass of comments, this is what remains. What is this notation that isn't quite JSON? (It's HCL, HashiCorp Configuration Language.) The structure is job:example -> group:cache -> task:redis. It appears to docker run redis:3.2 via the Docker driver.
job "example" {
  datacenters = ["dc1"]
  type = "service"

  update {
    max_parallel = 1
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    progress_deadline = "10m"
    auto_revert = false
    canary = 0
  }

  migrate {
    max_parallel = 1
    health_check = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }

  group "cache" {
    count = 1

    restart {
      attempts = 2
      interval = "30m"
      delay = "15s"
      mode = "fail"
    }

    ephemeral_disk {
      size = 300
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB
        network {
          mbits = 10
          port "db" {}
        }
      }

      service {
        name = "redis-cache"
        tags = ["global", "cache"]
        port = "db"
        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
Register — or run? — job:example with nomad job run
vagrant@nomad:~$ nomad job run example.nomad
==> Monitoring evaluation "fc8f4a76"
    Evaluation triggered by job "example"
    Allocation "4666e64b" created: node "e84125ed", group "cache"
    Evaluation within deployment: "45ceab07"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "fc8f4a76" finished with status "complete"
Check job:example with nomad status. Times are in UTC.
$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-05T11:10:18Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         1        0       0         0

Latest Deployment
ID          = 45ceab07
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       1        1       1        0          2019-04-05T11:20:47Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
4666e64b  e84125ed  cache       0        run      running  3m55s ago  3m25s ago
This supposedly ran on the local node — check node status. Its id:e84125ed matches the Allocations' Node ID e84125ed.
$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
e84125ed  dc1  nomad  <none>  false  eligible     ready
Check the task group placed on the node with alloc status?
$ nomad alloc status 4666e64b
ID                  = 4666e64b
Eval ID             = fc8f4a76
Name                = example.cache[0]
Node ID             = e84125ed
Job ID              = example
Job Version         = 0
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 11m58s ago
Modified            = 11m28s ago
Deployment ID       = 45ceab07
Deployment Health   = healthy

Task "redis" is "running"
Task Resources
CPU        Memory             Disk     IOPS  Addresses
4/500 MHz  1008 KiB/256 MiB   300 MiB  0     db: 127.0.0.1:26775

Task Events:
Started At     = 2019-04-05T11:10:35Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-04-05T11:10:35Z  Started     Task started by client
2019-04-05T11:10:18Z  Driver      Downloading image redis:3.2
2019-04-05T11:10:18Z  Task Setup  Building Task Directory
2019-04-05T11:10:18Z  Received    Task received by client
Check the logs of task:redis in alloc:4666e64b with alloc logs
$ nomad alloc logs 4666e64b redis
1:C 05 Apr 11:10:35.349 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 3.2.12 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

1:M 05 Apr 11:10:35.350 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 Apr 11:10:35.350 # Server started, Redis version 3.2.12
1:M 05 Apr 11:10:35.350 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 05 Apr 11:10:35.350 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 05 Apr 11:10:35.350 * The server is now ready to accept connections on port 6379
Modifying a Job - count=1 -> 3
Now to modify the Job. Change count = 1 to 3 with vi example.nomad
group "cache" {
  count = 1 <--- change to 3
Hand it to nomad. It says "2 create, 1 in-place update". Two get created, and one is left as is?
$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (2 create, 1 in-place update)
  +/- Count: "1" => "3" (forces create)
      Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 24
To submit the job with version verification run:

nomad job run -check-index 24 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
It also says "nomad job run -check-index 24 example.nomad". Follow the tutorial and run it. alloc:94f5f11c and fac747cb are created; 4666e64b is modified.
$ nomad job run -check-index 24 example.nomad
==> Monitoring evaluation "481d337c"
    Evaluation triggered by job "example"
    Allocation "94f5f11c" created: node "e84125ed", group "cache"
    Allocation "fac747cb" created: node "e84125ed", group "cache"
    Allocation "4666e64b" modified: node "e84125ed", group "cache"
    Evaluation within deployment: "724043af"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "481d337c" finished with status "complete"
Check the new alloc:94f5f11c. Name: example.cache[1]
$ nomad alloc status 94f5f11c
ID                  = 94f5f11c
Eval ID             = 481d337c
Name                = example.cache[1]
Node ID             = e84125ed
Job ID              = example
Job Version         = 1
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 5m22s ago
Modified            = 5m3s ago
Deployment ID       = 724043af
Deployment Health   = healthy
:
Check the other new alloc:fac747cb too. Name: example.cache[2]
$ nomad alloc status fac747cb
ID                  = fac747cb
Eval ID             = 481d337c
Name                = example.cache[2]
Node ID             = e84125ed
Job ID              = example
Job Version         = 1
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 6m40s ago
Modified            = 6m20s ago
Deployment ID       = 724043af
Deployment Health   = healthy
:
Modifying a Job - image=redis:3.2 -> redis:4.0
Next, change the image.
vi example.nomad
task "redis" {
  driver = "docker"

  config {
    image = "redis:3.2" <--- change to redis:4.0
Hand it to nomad. I felt something was off when reading the tutorial, and sure enough it says "1 create/destroy update, 2 ignore". Shouldn't it be 3 updates?
$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (1 create/destroy update, 2 ignore)
  +/- Task: "redis" (forces create/destroy update)
    +/- Config {
      +/- image:           "redis:3.2" => "redis:4.0"
          port_map[0][db]: "6379"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 62
To submit the job with version verification run:

nomad job run -check-index 62 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
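The "2 ignore" comes from the rolling update: the example.nomad generated by nomad job init (shown earlier) contains an update stanza with max_parallel = 1, so each step replaces only one allocation and ignores the rest until it is healthy:

```hcl
update {
  max_parallel     = 1      # replace one allocation at a time
  min_healthy_time = "10s"  # a new alloc must stay healthy this long
  healthy_deadline = "3m"
  progress_deadline = "10m"
  auto_revert      = false
  canary           = 0
}
```

Once a replacement allocation passes min_healthy_time, the scheduler moves on to the next one, which matches the one-by-one create/destroy behavior observed in the following steps.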
What is going on here? Running it with -check-index, it says Allocation "97dbfc56" created.
$ nomad job run -check-index 62 example.nomad
==> Monitoring evaluation "e52687f2"
    Evaluation triggered by job "example"
    Allocation "97dbfc56" created: node "e84125ed", group "cache"
    Evaluation within deployment: "3188c6a1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "e52687f2" finished with status "complete"
Look at the status and logs of alloc:97dbfc56.
$ nomad alloc status 97dbfc56
ID                  = 97dbfc56
Eval ID             = e52687f2
Name                = example.cache[0]
Node ID             = e84125ed
Job ID              = example
Job Version         = 2
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 2m33s ago
Modified            = 1m59s ago
Deployment ID       = 3188c6a1
Deployment Health   = healthy
:
$ nomad alloc logs 97dbfc56 redis
1:C 05 Apr 11:53:20.759 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 05 Apr 11:53:20.759 # Redis version=4.0.14, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 05 Apr 11:53:20.759 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 05 Apr 11:53:20.763 * Running mode=standalone, port=6379.
1:M 05 Apr 11:53:20.763 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 Apr 11:53:20.763 # Server initialized
1:M 05 Apr 11:53:20.763 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 05 Apr 11:53:20.763 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 05 Apr 11:53:20.763 * Ready to accept connections
Check the first alloc:4666e64b. Its Client Status is "complete", so it has finished.
$ nomad alloc status 4666e64b
ID            = 4666e64b
Eval ID       = 481d337c
Name          = example.cache[0]
Node ID       = e84125ed
Job ID        = example
Job Version   = 1
Client Status = complete
:
Recent Events:
Time                  Type        Description
2019-04-05T11:52:59Z  Killed      Task successfully killed
2019-04-05T11:52:58Z  Killing     Sent interrupt. Waiting 5s before force killing
2019-04-05T11:10:35Z  Started     Task started by client
2019-04-05T11:10:18Z  Driver      Downloading image redis:3.2
2019-04-05T11:10:18Z  Task Setup  Building Task Directory
2019-04-05T11:10:18Z  Received    Task received by client

$ nomad alloc logs 4666e64b redis
:
1:M 05 Apr 11:10:35.350 * The server is now ready to accept connections on port 6379
1:signal-handler (1554465179) Received SIGTERM scheduling shutdown...
1:M 05 Apr 11:52:59.138 # User requested shutdown...
1:M 05 Apr 11:52:59.138 * Saving the final RDB snapshot before exiting.
1:M 05 Apr 11:52:59.142 * DB saved on disk
1:M 05 Apr 11:52:59.142 # Redis is now ready to exit, bye bye...
What about the other two allocs? They have completed as well.
$ nomad alloc status fac747cb
ID            = fac747cb
:
Client Status = complete

$ nomad alloc status 4666e64b
ID            = 4666e64b
:
Client Status = complete
Recapping what was done:
- nomad job run example.nomad
- edit example.nomad: count 1 -> 3
- nomad job plan example.nomad
- nomad job run -check-index 24 example.nomad
- edit example.nomad: image redis:3.2 -> redis:4
- nomad job run -check-index 62 example.nomad
■ Stopping a Job
Some things still haven't quite clicked, but let's stop the job.
$ nomad job stop example
==> Monitoring evaluation "8fd141b1"
    Evaluation triggered by job "example"
    Evaluation within deployment: "3188c6a1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "8fd141b1" finished with status "complete"
Check job:example. Huh, there are two allocs I don't recognize: 72113ea3 and 062ce877.
$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-05T11:52:58Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = dead (stopped)
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         0        0       6         0

Latest Deployment
ID          = 3188c6a1
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       3        3       3        0          2019-04-05T12:04:13Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
72113ea3  e84125ed  cache       2        stop     complete  26m33s ago  34s ago
062ce877  e84125ed  cache       2        stop     complete  26m53s ago  34s ago
97dbfc56  e84125ed  cache       2        stop     complete  27m28s ago  34s ago
94f5f11c  e84125ed  cache       1        stop     complete  41m56s ago  39s ago
fac747cb  e84125ed  cache       1        stop     complete  41m56s ago  39s ago
4666e64b  e84125ed  cache       1        stop     complete  1h10m ago   39s ago
To restart the job, use nomad job run again.
$ nomad job run example.nomad
==> Monitoring evaluation "0459a7f5"
    Evaluation triggered by job "example"
    Allocation "805f4360" created: node "e84125ed", group "cache"
    Allocation "b49722f6" created: node "e84125ed", group "cache"
    Allocation "f4f093db" created: node "e84125ed", group "cache"
    Evaluation within deployment: "0debd07e"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "0459a7f5" finished with status "complete"
There are still things that don't quite click, but the structure is becoming clear: job -> group (alloc) -> task (redis).
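To make that layering concrete, here is a trimmed sketch of what example.nomad presumably looks like at this point, based on the getting-started template plus the count and image edits from the recap (resources, ports, and update stanzas omitted):

```hcl
job "example" {              # job: the top-level unit submitted with `nomad job run`
  datacenters = ["dc1"]

  group "cache" {            # group: each placed instance becomes one allocation
    count = 3                # the count 1 -> 3 edit => three allocs

    task "redis" {           # task: the actual workload inside each alloc
      driver = "docker"
      config {
        image = "redis:4"    # the redis:3.2 -> redis:4 edit
      }
    }
  }
}
```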
nomad trial: running
https://www.nomadproject.io/intro/getting-started/running.html
What is an agent?
Client - A Nomad client is a machine on which tasks can be run. Every client runs the Nomad agent. The agent is responsible for registering with the servers, watching for any work to be assigned, and executing tasks. The Nomad agent is a long-lived process which interfaces with the servers. https://www.nomadproject.io/docs/internals/architecture.html
Starting the agent
Using Vagrant for the first time in a while; it took me a moment to remember the password. Start the agent in dev mode(?), exactly as the tutorial does.
$ vagrant ssh
:
:
$ sudo nomad agent -dev
:
:
Check the cluster nodes
$ vagrant ssh
:
$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
ed6136c5  dc1  nomad  <none>  false  eligible     ready
$ nomad server members
Name          Address    Port  Status  Leader  Protocol  Build  Datacenter  Region
nomad.global  127.0.0.1  4648  alive   true    2         0.8.6  dc1         global
Stopping the agent
Stop the agent with Ctrl-C. It produced the log below, which apparently shows the agent gracefully leaving the cluster.
^C==> Caught signal: interrupt
[DEBUG] http: Shutting down http server
[INFO] agent: requesting shutdown
[INFO] client: shutting down
[INFO] nomad: shutting down server
serf: Shutdown without a Leave
[ERR] nomad: "Node.GetClientAllocs" RPC failed to server 127.0.0.1:4647: rpc error: EOF
[DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks
[INFO] agent: shutdown complete
Check the cluster nodes, take 2
Both requests now fail, but the error messages are informative: they confirm that the CLI talks to port 4646, and reveal the two HTTP endpoints behind these commands.
$ nomad node status
Error querying node status: Get http://127.0.0.1:4646/v1/nodes: dial tcp 127.0.0.1:4646: connect: connection refused

$ nomad server members
Error querying servers: Get http://127.0.0.1:4646/v1/agent/members: dial tcp 127.0.0.1:4646: connect: connection refused
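Under the hood, "connection refused" is just a TCP connect failing against port 4646. As a side note, here is a minimal Python sketch of the same reachability check (a hypothetical helper, not part of the tutorial):

```python
import socket

def agent_reachable(host="127.0.0.1", port=4646, timeout=1.0):
    """Return True if something accepts TCP connections on the Nomad HTTP port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unreachable, ...
        return False

# With the agent stopped, as above, this reports False.
print(agent_reachable())
```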
Start the agent again and curl the endpoints
Installed jq:
sudo apt -y install jq
Tried curl:
$ curl http://127.0.0.1:4646/v1/nodes | jq
[
  {
    "Address": "127.0.0.1",
    "CreateIndex": 6,
    "Datacenter": "dc1",
    "Drain": false,
    "Drivers": {
      :
    },
    "ID": "cfebff0f-e21e-fabb-c777-17f9d9ae8439",
    "ModifyIndex": 9,
    "Name": "nomad",
    "NodeClass": "",
    "SchedulingEligibility": "eligible",
    "Status": "ready",
    "StatusDescription": "",
    "Version": "0.8.6"
  }
]
$ curl http://127.0.0.1:4646/v1/agent/members | jq
{
  "Members": [
    {
      "Addr": "127.0.0.1",
      "DelegateCur": 4,
      "DelegateMax": 5,
      "DelegateMin": 2,
      "Name": "nomad.global",
      "Port": 4648,
      "ProtocolCur": 2,
      "ProtocolMax": 5,
      "ProtocolMin": 1,
      "Status": "alive",
      "Tags": {
        "mvn": "1",
        "bootstrap": "1",
        "region": "global",
        "id": "eef87cd3-56e2-018d-b52a-6efc1952ce05",
        "rpc_addr": "127.0.0.1",
        "role": "nomad",
        "raft_vsn": "2",
        "port": "4647",
        "dc": "dc1",
        "vsn": "1",
        "build": "0.8.6"
      }
    }
  ],
  "ServerDC": "dc1",
  "ServerName": "nomad",
  "ServerRegion": "global"
}
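The same JSON can of course be consumed from a script instead of jq. A minimal Python sketch that reconstructs roughly what `nomad server members` prints; it embeds a trimmed copy of the response above so it runs without a live agent (with one running, you would fetch from http://127.0.0.1:4646/v1/agent/members instead):

```python
import json

# Trimmed copy of the /v1/agent/members response shown above.
raw = '''
{
  "Members": [
    {"Addr": "127.0.0.1", "Name": "nomad.global", "Port": 4648, "Status": "alive",
     "Tags": {"region": "global", "dc": "dc1", "build": "0.8.6"}}
  ],
  "ServerDC": "dc1",
  "ServerName": "nomad",
  "ServerRegion": "global"
}
'''

data = json.loads(raw)
for m in data["Members"]:
    # Roughly the columns of `nomad server members`:
    # Name, Address, Port, Status, Build, Datacenter, Region
    print(m["Name"], m["Addr"], m["Port"], m["Status"],
          m["Tags"]["build"], m["Tags"]["dc"], m["Tags"]["region"])
# -> nomad.global 127.0.0.1 4648 alive 0.8.6 dc1 global
```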