tjtjtj's memo

Notes for myself.

Learning kubernetes: calico simple policy tutorial, part 2

Continuing from last time. Starting from a state with the network policies default-deny and access-nginx in place.

$ kubectl get networkpolicy --namespace=policy-demo
NAME           POD-SELECTOR   AGE
access-nginx   run=nginx      9m54s
default-deny   <none>         15m

Trying allow-all

What happens with conflicting network policies? Does the order they are applied in matter? Questions like that come up.

Plenty I don't understand yet, but let's experiment. Create allow-all; its empty "- {}" ingress/egress rules match all traffic. That makes three policies: default-deny, access-nginx, and allow-all.

$ kubectl create -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-all
  namespace: policy-demo
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - {}
  egress:
  - {}
EOF
networkpolicy.networking.k8s.io/allow-all created

From pod:access

/ # echo $HOSTNAME
access-7c5df8f4c-b8p5b
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>

From pod:cant-access. Access succeeded. NetworkPolicies are additive: traffic is allowed if any policy allows it, so allow-all wins and ordering doesn't matter.

/ # echo $HOSTNAME
cant-access-7587658dc7-h5b7f
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>

Deleting allow-all

Delete networkpolicy:allow-all. default-deny and access-nginx remain.

$ kubectl delete networkpolicy allow-all --namespace=policy-demo
networkpolicy.extensions "allow-all" deleted

From pod:access

/ # echo $HOSTNAME
access-7c5df8f4c-b8p5b
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>

From pod:cant-access. Access no longer works.

/ # echo $HOSTNAME
cant-access-7587658dc7-h5b7f
/ # wget -q --timeout=5 nginx -O -
wget: download timed out

Deleting default-deny

Delete networkpolicy:default-deny. access-nginx remains.

$ kubectl delete networkpolicy default-deny --namespace=policy-demo
networkpolicy.extensions "default-deny" deleted

From pod:access

/ # echo $HOSTNAME
access-7c5df8f4c-b8p5b
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>

From pod:cant-access: no luck. Huh? So access-nginx was a policy allowing only access -> nginx? (Right: access-nginx selects the run=nginx pods, which isolates them except for ingress from run=access, even with default-deny gone.)

/ # echo $HOSTNAME
cant-access-7587658dc7-h5b7f
/ # wget -q --timeout=5 nginx -O -
wget: download timed out
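
So access-nginx admits only pods labeled run=access. As a hedged sketch, admitting cant-access as well would just mean a second entry in the from list (assuming kubectl run labeled that pod run=cant-access, as it did for the others):

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: access-nginx
  namespace: policy-demo
spec:
  podSelector:
    matchLabels:
      run: nginx
  ingress:
    - from:
      # multiple entries under one "from" are ORed:
      # either label is allowed to connect
      - podSelector:
          matchLabels:
            run: access
      - podSelector:
          matchLabels:
            run: cant-access
```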

Deleting access-nginx

Delete networkpolicy:access-nginx. Now no network policies are left.

$ kubectl delete networkpolicy access-nginx --namespace=policy-demo
networkpolicy.extensions "access-nginx" deleted

From pod:access

/ # echo $HOSTNAME
access-7c5df8f4c-b8p5b
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>

From pod:cant-access. It works.

/ # echo $HOSTNAME
cant-access-7587658dc7-h5b7f
/ # wget -q --timeout=5 nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>

Learning kubernetes: calico simple policy tutorial

Continuing to learn kubernetes. This time, a first look at network policies.

Calico's Simple policy tutorial

Configure namespaces

Before creating the namespace, check what's there.

$ kubectl get ns 
NAME              STATUS   AGE
default           Active   22h
kube-node-lease   Active   22h
kube-public       Active   22h
kube-system       Active   22h

Create the namespace

$ kubectl create ns policy-demo
namespace/policy-demo created

Check

$ kubectl get ns
NAME              STATUS   AGE
default           Active   22h
kube-node-lease   Active   22h
kube-public       Active   22h
kube-system       Active   22h
policy-demo       Active   11s

Create demo Pods

Check before doing anything in NAMESPACE:policy-demo

$ kubectl get all --namespace=policy-demo
No resources found.

Create the deployment and friends. There's a deprecation warning, but ignore it.

$ kubectl run --namespace=policy-demo nginx --replicas=2 --image=nginx
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/nginx created

Check: pods, a deployment, and a replicaset have been created.

$ kubectl get all --namespace=policy-demo
NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-7db9fccd9b-fgd66   1/1     Running   0          66s
pod/nginx-7db9fccd9b-kp2d5   1/1     Running   0          66s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   2/2     2            2           66s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-7db9fccd9b   2         2         2       66s

Check the pods with -o wide

$ kubectl get pod --namespace=policy-demo -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE   NOMINATED NODE   READINESS GATES
nginx-7db9fccd9b-fgd66   1/1     Running   0          5m26s   10.244.2.2   kb3    <none>           <none>
nginx-7db9fccd9b-kp2d5   1/1     Running   0          5m26s   10.244.1.2   kb2    <none>           <none>

Attach service:nginx to the deployment. This lets pods reach it by service name, as I looked into before.

$ kubectl expose --namespace=policy-demo deployment nginx --port=80
service/nginx exposed
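
As a side note, the expose command above should be roughly equivalent to a Service manifest like this (a sketch; the selector relies on the run=nginx label that kubectl run applied):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: policy-demo
spec:
  selector:
    run: nginx       # route to the deployment's pods
  ports:
  - port: 80         # service port -> container port 80
```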

Run the access pod. A prompt appears.

$ kubectl run --namespace=policy-demo access --rm -ti --image busybox /bin/sh
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
/ #

From the access pod, hit the nginx service. nginx responds.

/ # wget -q nginx -O -
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Enable isolation

Enable isolation in the policy-demo namespace. Calico will then block connections to pods in this namespace. The empty podSelector below selects every pod in the namespace, and since no ingress rules are listed, all inbound traffic is denied.

kind: NetworkPolicy

Network Policies https://kubernetes.io/docs/concepts/services-networking/network-policies/

$ kubectl create -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
  namespace: policy-demo
spec:
  podSelector:
    matchLabels: {}
EOF
networkpolicy.networking.k8s.io/default-deny created
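
The spec above omits policyTypes, so it defaults to Ingress only: inbound traffic is denied, but the pods can still connect out. A sketch of a variant that isolates egress too (the name default-deny-all is made up here):

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny-all
  namespace: policy-demo
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```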

Try wget from the access pod again. It times out.

/ # wget -q nginx -O -
^C
/ # wget -q --timeout=5 nginx -O -
wget: download timed out

Allow access using a network policy

Enable access to the nginx service using a NetworkPolicy. Incoming connections from the access pod are allowed, but connections from anywhere else are not.

Create the new NetworkPolicy access-nginx. (The previous one was default-deny.)

$ kubectl create -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: access-nginx
  namespace: policy-demo
spec:
  podSelector:
    matchLabels:
      run: nginx
  ingress:
    - from:
      - podSelector:
          matchLabels:
            run: access
EOF
networkpolicy.networking.k8s.io/access-nginx created

wget from the access pod again. Access works.

<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
:

Try wget from pod:cant-access. It timed out.

$ kubectl run --namespace=policy-demo cant-access --rm -ti --image busybox /bin/sh
:
If you don't see a command prompt, try pressing enter.
/ # wget -q --timeout=5 nginx -O -
wget: download timed out

Poking around a bit more

The tutorial cleans up and ends here, but let's look around a little more.

$ kubectl get networkpolicy --namespace=policy-demo
NAME           POD-SELECTOR   AGE
access-nginx   run=nginx      9m54s
default-deny   <none>         15m

default-deny

$ kubectl describe networkpolicy default-deny --namespace=policy-demo
Name:         default-deny
Namespace:    policy-demo
Created on:   2019-04-18 19:28:37 +0900 JST
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
  Allowing ingress traffic:
    <none> (Selected pods are isolated for ingress connectivity)
  Allowing egress traffic:
    <none> (Selected pods are isolated for egress connectivity)
  Policy Types: Ingress

access-nginx

$ kubectl describe networkpolicy access-nginx --namespace=policy-demo
Name:         access-nginx
Namespace:    policy-demo
Created on:   2019-04-18 19:34:31 +0900 JST
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     run=nginx
  Allowing ingress traffic:
    To Port: <any> (traffic allowed to all ports)
    From:
      PodSelector: run=access
  Allowing egress traffic:
    <none> (Selected pods are isolated for egress connectivity)
  Policy Types: Ingress

nomad tutorial: Web UI

https://www.nomadproject.io/intro/getting-started/ui.html

Continuing from last time. server, client1, and client2 are running, with job:example in flight.

Opening the Web UI

Open http://localhost:4646. It shows up just like in the tutorial, with example running.

Inspecting a Job

Drill down in the browser and check the structure.

jobs

jobs -> example -> task group -> cache -> 3 allocations -> tasks -> redis

clients. You can see which allocs are assigned to each client.

clients -> client1 -> allocations

servers

servers -> nomad.global -> tags

End of the tutorial

So task groups get allocated to nodes; that much is clear. As an application developer there's a pile of things I still need to dig into.

Read the guides next, or do something else: https://www.nomadproject.io/guides/index.html

nomad tutorial: Clustering

https://www.nomadproject.io/intro/getting-started/cluster.html

Starting the Server

Create the server config file, server.hcl

# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/server1"

# Enable the server
server {
    enabled = true

    # Self-elect, should be 3 or 5 for production
    bootstrap_expect = 1
}

Start a new agent with server.hcl

vagrant@nomad:~$ nomad agent -config server.hcl
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from server.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 10.0.2.15:4646; RPC: 10.0.2.15:4647; Serf: 10.0.2.15:4648
            Bind Addrs: HTTP: 0.0.0.0:4646; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
                Client: false
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: true
               Version: 0.8.6

==> Nomad agent started! Log data will stream in below
:

Starting the Clients

Create the client config files client1.hcl and client2.hcl, and the directories /tmp/client1 and /tmp/client2.

# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/client1"          <--- client2

# Give the agent a unique name. Defaults to hostname
name = "client1"                   <--- client2

# Enable the client
client {
    enabled = true

    # For demo assume we are talking to server1. For production,
    # this should be like "nomad.service.consul:4647" and a system
    # like Consul used for service discovery.
    servers = ["127.0.0.1:4647"]
}

# Modify our port to avoid a collision with server1
ports {
    http = 5656                    <--- 5657
}
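
The arrows mark the only values that differ for client2, so instead of editing by hand, client2.hcl can be derived from client1.hcl (a sketch using sed; the heredoc recreates a minimal client1.hcl so the snippet runs standalone):

```shell
# Minimal client1.hcl, as in the tutorial (comments stripped).
cat > client1.hcl <<'EOF'
log_level = "DEBUG"
data_dir = "/tmp/client1"
name = "client1"
client {
    enabled = true
    servers = ["127.0.0.1:4647"]
}
ports {
    http = 5656
}
EOF

# Swap the three differing values: data_dir, name, and the HTTP port.
sed -e 's/client1/client2/g' -e 's/5656/5657/' client1.hcl > client2.hcl
grep -E 'data_dir|name|http' client2.hcl
```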

Start the client1 agent

$ mkdir /tmp/client1
$ sudo nomad agent -config client1.hcl
==> Loaded configuration from client1.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 10.0.2.15:5656
            Bind Addrs: HTTP: 0.0.0.0:5656
                Client: true
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: false
               Version: 0.8.6

==> Nomad agent started! Log data will stream in below:
:

Start the client2 agent

$ mkdir /tmp/client2
$ sudo nomad agent -config client2.hcl

Check the server

$ nomad server members
Name          Address    Port  Status  Leader  Protocol  Build  Datacenter  Region
nomad.global  10.0.2.15  4648  alive   true    2         0.8.6  dc1         global

Check the nodes

$ nomad node status
ID        DC   Name     Class   Drain  Eligibility  Status
fecc79d6  dc1  client2  <none>  false  eligible     ready
28eb0853  dc1  client1  <none>  false  eligible     ready

Submit a Job

Reuse the example.nomad from before (count=3, image=redis:4.0). The tutorial goes straight to run, but try plan first. The initial index was 0.

$ nomad job status
No running jobs

$ nomad job plan example.nomad
+ Job: "example"
+ Task Group: "cache" (3 create)
  + Task: "redis" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 0
To submit the job with version verification run:

nomad job run -check-index 0 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

nomad job run. You can see allocs landed on different nodes.

$ nomad job run -check-index 0 example.nomad
==> Monitoring evaluation "23b0ee56"
    Evaluation triggered by job "example"
    Allocation "fa2fc9df" created: node "28eb0853", group "cache"      <--- alloc on a different node
    Allocation "09d9ae5d" created: node "fecc79d6", group "cache"      <--- alloc on a different node
    Allocation "f77a8705" created: node "fecc79d6", group "cache"
    Evaluation within deployment: "f0e91e62"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "23b0ee56" finished with status "complete"

Check job:example

$ nomad job status
ID       Type     Priority  Status   Submit Date
example  service  50        running  2019-04-10T10:59:50Z

$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-10T10:59:50Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         3        0       0         0

Latest Deployment
ID          = f0e91e62
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       3        3       3        0          2019-04-10T11:10:20Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
09d9ae5d  fecc79d6  cache       0        run      running  1m47s ago  1m17s ago
f77a8705  fecc79d6  cache       0        run      running  1m47s ago  1m21s ago
fa2fc9df  28eb0853  cache       0        run      running  1m47s ago  1m25s ago

nomad tutorial: jobs recap

I'd completely forgotten it, so here's a recap.

https://www.nomadproject.io/intro/getting-started/jobs.html

Starting and stopping the agent

Start the agent in dev mode

$ sudo nomad agent -dev

Stop the agent

ctrl-c

Check agent status

$ nomad agent-info

Check the cluster state?

I don't fully understand this area yet. (nomad node status lists the client nodes; nomad server members lists the servers in the gossip pool.)

$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
b62b7daf  dc1  nomad  <none>  false  eligible     ready
$ nomad server members
Name          Address    Port  Status  Leader  Protocol  Build  Datacenter  Region
nomad.global  127.0.0.1  4648  alive   true    2         0.8.6  dc1         global

job, taskgroup, task

The structure was job -> taskgroup -> task.

  • job: a collection of taskgroups <--- example
  • taskgroup: the unit of scheduling; runs on one node <--- cache, alloc
  • task: the smallest unit of work in nomad <--- redis
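
As a minimal sketch (names taken from the tutorial's example job), that structure maps onto a job file like this:

```hcl
job "example" {          # job: a collection of task groups
  datacenters = ["dc1"]
  group "cache" {        # taskgroup: unit of scheduling, one node
    count = 1
    task "redis" {       # task: the smallest unit of work
      driver = "docker"
      config {
        image = "redis:3.2"
      }
    }
  }
}
```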

Running a job

Check jobs. Nothing there.

$ nomad job status
No running jobs

Generate the job file. It won't overwrite an existing one, so delete it and regenerate.

$ nomad job init
Job 'example.nomad' already exists
$ rm example.nomad
$ nomad job init
Example job file written to example.nomad

Run the job

$ nomad job run example.nomad
==> Monitoring evaluation "8a175ecb"
    Evaluation triggered by job "example"
    Allocation "dfee8772" created: node "b62b7daf", group "cache"
    Evaluation within deployment: "6c7ed692"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "8a175ecb" finished with status "complete"

List jobs

$ nomad job status
ID       Type     Priority  Status   Submit Date
example  service  50        running  2019-04-09T11:09:58Z

Job details. taskgroup:cache became alloc:dfee8772 on node:b62b7daf.

$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-09T11:09:58Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         1        0       0         0

Latest Deployment
ID          = 6c7ed692
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       1        1       1        0          2019-04-09T11:20:25Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
dfee8772  b62b7daf  cache       0        run      running  3m11s ago  2m44s ago

Subcommands of nomad job

Subcommands:
    deployments    List deployments for a job
    dispatch       Dispatch an instance of a parameterized job
    eval           Force an evaluation for the job
    history        Display all tracked versions of a job
    init           Create an example job file
    inspect        Inspect a submitted job
    plan           Dry-run a job update to determine its effects
    promote        Promote a job's canaries
    revert         Revert to a prior version of the job
    run            Run a new job or update an existing job
    status         Display status information about a job
    stop           Stop a running job
    validate       Checks if a given job specification is valid

alloc

Check alloc:dfee8772. The structure was example(job).cache(group).redis(task).

$ nomad alloc status dfee8772
ID                  = dfee8772
Eval ID             = 8a175ecb
Name                = example.cache[0]
Node ID             = b62b7daf
Job ID              = example
Job Version         = 0
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 6m32s ago
Modified            = 6m5s ago
Deployment ID       = 6c7ed692
Deployment Health   = healthy


Task "redis" is "running"
Task Resources
CPU        Memory            Disk     IOPS  Addresses
3/500 MHz  1000 KiB/256 MiB  300 MiB  0     db: 127.0.0.1:28263

Task Events:
Started At     = 2019-04-09T11:10:07Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-04-09T11:10:07Z  Started     Task started by client
2019-04-09T11:09:58Z  Driver      Downloading image redis:3.2
2019-04-09T11:09:58Z  Task Setup  Building Task Directory
2019-04-09T11:09:58Z  Received    Task received by client

Subcommands of nomad alloc. logs was covered last time.

Subcommands:
    fs        Inspect the contents of an allocation directory
    logs      Streams the logs of a task.
    status    Display allocation status information and metadata

Take a look at fs

$ nomad alloc fs dfee8772
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2019-04-09T11:09:58Z  alloc/
drwxrwxrwx  4.0 KiB  2019-04-09T11:10:06Z  redis/

Scaling 1 -> 3

Edit example.nomad: change count=1 in group:cache to 3

$ vi example.nomad
↓
count=1 <--- 3

Check the change plan. I didn't notice at first, but this is plan, not run.

$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (2 create, 1 in-place update)
  +/- Count: "1" => "3" (forces create)
      Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 27
To submit the job with version verification run:

nomad job run -check-index 27 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

Apply the plan

$ nomad job run -check-index 27 example.nomad
==> Monitoring evaluation "432f4471"
    Evaluation triggered by job "example"
    Allocation "47dff45a" created: node "b62b7daf", group "cache"
    Allocation "baf43930" created: node "b62b7daf", group "cache"
    Allocation "dfee8772" modified: node "b62b7daf", group "cache"
    Evaluation within deployment: "ae570b04"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "432f4471" finished with status "complete"

Check the job details. Two allocs were added, making three.

$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-09T11:31:50Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         3        0       0         0

Latest Deployment
ID          = ae570b04
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       3        3       3        0          2019-04-09T11:42:10Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
47dff45a  b62b7daf  cache       1        run      running  1m36s ago   1m16s ago
baf43930  b62b7daf  cache       1        run      running  1m36s ago   1m21s ago
dfee8772  b62b7daf  cache       1        run      running  23m28s ago  1m25s ago

Changing the image: redis:3.2 -> redis:4.0

Change the image of task "redis" from redis:3.2 to redis:4.0.

$ vi example.nomad
↓
image = "redis:3.2" <--- redis:4.0

Check the plan. Or rather, generate the plan.

$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (1 create/destroy update, 2 ignore)
  +/- Task: "redis" (forces create/destroy update)
    +/- Config {
      +/- image:           "redis:3.2" => "redis:4.0"
          port_map[0][db]: "6379"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 58
To submit the job with version verification run:

nomad job run -check-index 58 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

At this point the allocs are not affected yet.

$ nomad status example
↓
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
47dff45a  b62b7daf  cache       1        run      running  8m33s ago   8m13s ago
baf43930  b62b7daf  cache       1        run      running  8m33s ago   8m18s ago
dfee8772  b62b7daf  cache       1        run      running  30m25s ago  8m22s ago

Apply the plan. Maybe several evaluations hang off a single plan, with the check-index advancing per evaluation? A new alloc, 33eac15d, was created. It's one rather than three presumably because the image change is rolled out one at a time: once one succeeds, the next one starts.

$ nomad job run -check-index 58 example.nomad
==> Monitoring evaluation "e226b835"
    Evaluation triggered by job "example"
    Allocation "33eac15d" created: node "b62b7daf", group "cache"
    Evaluation within deployment: "03079e18"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "e226b835" finished with status "complete"
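
The one-at-a-time rollout matches the update stanza that nomad job init writes into example.nomad; max_parallel caps how many allocations are replaced at once:

```hcl
  update {
    max_parallel = 1          # replace one allocation at a time
    min_healthy_time = "10s"  # an alloc must stay healthy this long
    healthy_deadline = "3m"
    progress_deadline = "10m"
    auto_revert = false
    canary = 0
  }
```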

Poll the status of job:example. You can watch it switch over to version 2.

$ nomad status example
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
33eac15d  b62b7daf  cache       2        run      running   10s ago     5s ago
47dff45a  b62b7daf  cache       1        run      running   10m27s ago  10m7s ago
baf43930  b62b7daf  cache       1        run      running   10m27s ago  10m12s ago
dfee8772  b62b7daf  cache       1        stop     complete  32m19s ago  5s ago
↓
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
bca76393  b62b7daf  cache       2        run      pending   3s ago      3s ago
33eac15d  b62b7daf  cache       2        run      running   28s ago     4s ago
47dff45a  b62b7daf  cache       1        run      running   10m45s ago  10m25s ago
baf43930  b62b7daf  cache       1        stop     running   10m45s ago  3s ago
dfee8772  b62b7daf  cache       1        stop     complete  32m37s ago  23s ago
↓
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
f624164c  b62b7daf  cache       2        run      pending   0s ago      0s ago
bca76393  b62b7daf  cache       2        run      running   25s ago     1s ago
33eac15d  b62b7daf  cache       2        run      running   50s ago     26s ago
47dff45a  b62b7daf  cache       1        stop     running   11m7s ago   0s ago
baf43930  b62b7daf  cache       1        stop     complete  11m7s ago   20s ago
dfee8772  b62b7daf  cache       1        stop     complete  32m59s ago  45s ago

Stopping the job

Stop

$ nomad job stop example
==> Monitoring evaluation "2c30cd46"
    Evaluation triggered by job "example"
    Evaluation within deployment: "03079e18"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "2c30cd46" finished with status "complete"

Check

$ nomad job status
ID       Type     Priority  Status          Submit Date
example  service  50        dead (stopped)  2019-04-09T11:42:07Z

Restarting the job

Restart

vagrant@nomad:~$ nomad job run example.nomad
==> Monitoring evaluation "332362e1"
    Evaluation triggered by job "example"
    Allocation "1f19159f" created: node "b62b7daf", group "cache"
    Allocation "381f28af" created: node "b62b7daf", group "cache"
    Allocation "513949b9" created: node "b62b7daf", group "cache"
    Evaluation within deployment: "705c27d6"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "332362e1" finished with status "complete"

Check

ID       Type     Priority  Status   Submit Date
example  service  50        running  2019-04-09T11:49:04Z

Check the details. Version 3 is missing, likely because the stop itself was recorded as a version.

$ nomad status example
↓
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
1f19159f  b62b7daf  cache       4        run      running   13s ago     1s ago
381f28af  b62b7daf  cache       4        run      running   13s ago     11s ago
513949b9  b62b7daf  cache       4        run      running   13s ago     11s ago
f624164c  b62b7daf  cache       2        stop     complete  6m20s ago   1m40s ago
bca76393  b62b7daf  cache       2        stop     complete  6m45s ago   1m40s ago
33eac15d  b62b7daf  cache       2        stop     complete  7m10s ago   1m39s ago
47dff45a  b62b7daf  cache       1        stop     complete  17m27s ago  1m44s ago
baf43930  b62b7daf  cache       1        stop     complete  17m27s ago  1m44s ago
dfee8772  b62b7daf  cache       1        stop     complete  39m19s ago  1m44s ago

A bit more of the nomad tutorial to go. I haven't finished what I wanted to look into with kubernetes, and I want to do kafka too.

nomad tutorial: jobs

https://www.nomadproject.io/intro/getting-started/jobs.html

Job and TaskGroup

A job is a specification, provided by the user, that declares a workload for Nomad. A job is a form of desired state: the user expresses that the job should be running, but not where it should run. Nomad's responsibility is to make sure the actual state matches the desired state. A job is composed of one or more task groups.

Task groups

A task group is a set of tasks that must run together. For example, a web server may require that a log-shipping co-process always run alongside it. A task group is the unit of scheduling: the whole group must run on the same client node and cannot be split.

Tasks

A task is the smallest unit of work in Nomad. Tasks are executed by drivers, which lets Nomad be flexible in the types of tasks it supports. A task specifies its driver, driver configuration, constraints, and required resources.

Drivers

A driver represents the basic means of executing a task. Examples of drivers include Docker, Qemu, Java, and static binaries.

- job         <--- desired state; one or more taskgroups
  - taskgroup <--- unit of scheduling; runs on one node
    - task    <--- the smallest unit of work

Running a Job

Generate a skeleton job file with the job init command. An example.nomad file appeared.

$ nomad job init
Example job file written to example.nomad

Check example.nomad. After cutting the mass of comments, this is what remains. The notation that isn't quite JSON is HCL (HashiCorp Configuration Language). The structure is job:example -> group:cache -> task:redis. It seems to use the Docker driver to docker run redis:3.2.

job "example" {
  datacenters = ["dc1"]
  type = "service"
  update {
    max_parallel = 1
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    progress_deadline = "10m"
    auto_revert = false
    canary = 0
  }
  migrate {
    max_parallel = 1
    health_check = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }
  group "cache" {
    count = 1
    restart {
      attempts = 2
      interval = "30m"
      delay = "15s"
      mode = "fail"
    }
    ephemeral_disk {
      size = 300
    }
    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"
        port_map {
          db = 6379
        }
      }
      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB
        network {
          mbits = 10
          port "db" {}
        }
      }
      service {
        name = "redis-cache"
        tags = ["global", "cache"]
        port = "db"
        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

Register and run job:example with nomad job run

vagrant@nomad:~$ nomad job run example.nomad
==> Monitoring evaluation "fc8f4a76"
    Evaluation triggered by job "example"
    Allocation "4666e64b" created: node "e84125ed", group "cache"
    Evaluation within deployment: "45ceab07"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "fc8f4a76" finished with status "complete"

Check job:example with nomad status. Times are UTC.

$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-05T11:10:18Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         1        0       0         0

Latest Deployment
ID          = 45ceab07
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       1        1       1        0          2019-04-05T11:20:47Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
4666e64b  e84125ed  cache       0        run      running  3m55s ago  3m25s ago

This supposedly ran on the local node. Check node status: the node ID e84125ed matches the Node ID e84125ed under Allocations.

$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
e84125ed  dc1  nomad  <none>  false  eligible     ready

Check the task group placed on the node with alloc status

$ nomad alloc status 4666e64b
ID                  = 4666e64b
Eval ID             = fc8f4a76
Name                = example.cache[0]
Node ID             = e84125ed
Job ID              = example
Job Version         = 0
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 11m58s ago
Modified            = 11m28s ago
Deployment ID       = 45ceab07
Deployment Health   = healthy

Task "redis" is "running"
Task Resources
CPU        Memory            Disk     IOPS  Addresses
4/500 MHz  1008 KiB/256 MiB  300 MiB  0     db: 127.0.0.1:26775

Task Events:
Started At     = 2019-04-05T11:10:35Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-04-05T11:10:35Z  Started     Task started by client
2019-04-05T11:10:18Z  Driver      Downloading image redis:3.2
2019-04-05T11:10:18Z  Task Setup  Building Task Directory
2019-04-05T11:10:18Z  Received    Task received by client

Check the logs of task:redis in alloc:4666e64b with alloc logs

$ nomad alloc logs 4666e64b redis
1:C 05 Apr 11:10:35.349 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 3.2.12 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

1:M 05 Apr 11:10:35.350 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 Apr 11:10:35.350 # Server started, Redis version 3.2.12
1:M 05 Apr 11:10:35.350 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 05 Apr 11:10:35.350 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 05 Apr 11:10:35.350 * The server is now ready to accept connections on port 6379

Modifying a Job - count=1 -> 3

Next we modify the job. Edit example.nomad with vi and change count from 1 to 3.

  group "cache" {
    count = 1 <--- change to 3

Feed it to nomad. The plan says "2 create, 1 in-place update" — so two allocations get created and one stays as-is?

$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (2 create, 1 in-place update)
  +/- Count: "1" => "3" (forces create)
      Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 24
To submit the job with version verification run:

nomad job run -check-index 24 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
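The plan/run handshake above can be scripted. A minimal sketch — the awk pattern assumes the exact "Job Modify Index:" line shown in the plan output:

```shell
# Grab the modify index from the plan output, then submit with it so the
# run is rejected if someone else changed the job in the meantime.
IDX=$(nomad job plan example.nomad | awk '/^Job Modify Index:/ {print $4}')
nomad job run -check-index "$IDX" example.nomad
```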

It also prints "nomad job run -check-index 24 example.nomad", so run that just as the tutorial does. Allocations 94f5f11c and fac747cb are created, 4666e64b is modified.

$ nomad job run -check-index 24 example.nomad
==> Monitoring evaluation "481d337c"
    Evaluation triggered by job "example"
    Allocation "94f5f11c" created: node "e84125ed", group "cache"
    Allocation "fac747cb" created: node "e84125ed", group "cache"
    Allocation "4666e64b" modified: node "e84125ed", group "cache"
    Evaluation within deployment: "724043af"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "481d337c" finished with status "complete"

Check the new alloc:94f5f11c. Name: example.cache[1].

$ nomad alloc status 94f5f11c
ID                  = 94f5f11c
Eval ID             = 481d337c
Name                = example.cache[1]
Node ID             = e84125ed
Job ID              = example
Job Version         = 1
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 5m22s ago
Modified            = 5m3s ago
Deployment ID       = 724043af
Deployment Health   = healthy
:

Check the other new alloc:fac747cb too. Name: example.cache[2].

$ nomad alloc status fac747cb
ID                  = fac747cb
Eval ID             = 481d337c
Name                = example.cache[2]
Node ID             = e84125ed
Job ID              = example
Job Version         = 1
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 6m40s ago
Modified            = 6m20s ago
Deployment ID       = 724043af
Deployment Health   = healthy
:

Modifying a Job - image=redis:3.2 -> redis:4.0

Next, change the image.

vi example.nomad

    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"  <--- change to redis:4.0

Feed it to nomad. Something felt off when I read the tutorial, and sure enough the plan says "1 create/destroy update, 2 ignore". Shouldn't it be 3 updates?

$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (1 create/destroy update, 2 ignore)
  +/- Task: "redis" (forces create/destroy update)
    +/- Config {
      +/- image:           "redis:3.2" => "redis:4.0"
          port_map[0][db]: "6379"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 62
To submit the job with version verification run:

nomad job run -check-index 62 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
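Why only one allocation gets replaced per round: the getting-started example.nomad carries an update stanza along these lines (the values here are what I believe the tutorial ships with — check your own file), and max_parallel caps how many allocations are destroyed and recreated at a time:

```hcl
  update {
    max_parallel     = 1      # replace one allocation at a time
    min_healthy_time = "10s"  # how long a new alloc must stay healthy
    healthy_deadline = "3m"   # give up on an alloc after this long
    auto_revert      = false
  }
```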

What is going on? Running with -check-index, it says Allocation "97dbfc56" created.

$ nomad job run -check-index 62 example.nomad
==> Monitoring evaluation "e52687f2"
    Evaluation triggered by job "example"
    Allocation "97dbfc56" created: node "e84125ed", group "cache"
    Evaluation within deployment: "3188c6a1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "e52687f2" finished with status "complete"

Look at the status and logs of alloc:97dbfc56.

$ nomad alloc status 97dbfc56
ID                  = 97dbfc56
Eval ID             = e52687f2
Name                = example.cache[0]
Node ID             = e84125ed
Job ID              = example
Job Version         = 2
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 2m33s ago
Modified            = 1m59s ago
Deployment ID       = 3188c6a1
Deployment Health   = healthy
:
$ nomad alloc logs 97dbfc56 redis
1:C 05 Apr 11:53:20.759 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 05 Apr 11:53:20.759 # Redis version=4.0.14, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 05 Apr 11:53:20.759 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 05 Apr 11:53:20.763 * Running mode=standalone, port=6379.
1:M 05 Apr 11:53:20.763 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 Apr 11:53:20.763 # Server initialized
1:M 05 Apr 11:53:20.763 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 05 Apr 11:53:20.763 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 05 Apr 11:53:20.763 * Ready to accept connections

Check the first alloc:4666e64b. "Client Status: complete" — it has finished.

$ nomad alloc status 4666e64b
ID                   = 4666e64b
Eval ID              = 481d337c
Name                 = example.cache[0]
Node ID              = e84125ed
Job ID               = example
Job Version          = 1
Client Status        = complete
:
Recent Events:
Time                  Type        Description
2019-04-05T11:52:59Z  Killed      Task successfully killed
2019-04-05T11:52:58Z  Killing     Sent interrupt. Waiting 5s before force killing
2019-04-05T11:10:35Z  Started     Task started by client
2019-04-05T11:10:18Z  Driver      Downloading image redis:3.2
2019-04-05T11:10:18Z  Task Setup  Building Task Directory
2019-04-05T11:10:18Z  Received    Task received by client
$ nomad alloc logs 4666e64b redis
:
1:M 05 Apr 11:10:35.350 * The server is now ready to accept connections on port 6379
1:signal-handler (1554465179) Received SIGTERM scheduling shutdown...
1:M 05 Apr 11:52:59.138 # User requested shutdown...
1:M 05 Apr 11:52:59.138 * Saving the final RDB snapshot before exiting.
1:M 05 Apr 11:52:59.142 * DB saved on disk
1:M 05 Apr 11:52:59.142 # Redis is now ready to exit, bye bye...

What about the other two? They have completed as well.

$ nomad alloc status fac747cb
ID                   = fac747cb
:
Client Status        = complete

$ nomad alloc status 4666e64b
ID                   = 4666e64b
:
Client Status        = complete

Double-checked what was done.

■ Stopping a Job

Still not fully convinced, but stop the job.

$ nomad job stop example
==> Monitoring evaluation "8fd141b1"
    Evaluation triggered by job "example"
    Evaluation within deployment: "3188c6a1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "8fd141b1" finished with status "complete"

Check job:example. Huh, there are unfamiliar allocs 72113ea3 and 062ce877. Presumably these are the version-2 replacements the rolling image update created for the remaining version-1 allocations.

$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-05T11:52:58Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = dead (stopped)
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         0        0       6         0

Latest Deployment
ID          = 3188c6a1
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       3        3       3        0          2019-04-05T12:04:13Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
72113ea3  e84125ed  cache       2        stop     complete  26m33s ago  34s ago
062ce877  e84125ed  cache       2        stop     complete  26m53s ago  34s ago
97dbfc56  e84125ed  cache       2        stop     complete  27m28s ago  34s ago
94f5f11c  e84125ed  cache       1        stop     complete  41m56s ago  39s ago
fac747cb  e84125ed  cache       1        stop     complete  41m56s ago  39s ago
4666e64b  e84125ed  cache       1        stop     complete  1h10m ago   39s ago

To resume the job, nomad job run again.

$ nomad job run example.nomad
==> Monitoring evaluation "0459a7f5"
    Evaluation triggered by job "example"
    Allocation "805f4360" created: node "e84125ed", group "cache"
    Allocation "b49722f6" created: node "e84125ed", group "cache"
    Allocation "f4f093db" created: node "e84125ed", group "cache"
    Evaluation within deployment: "0debd07e"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "0459a7f5" finished with status "complete"

Some things still don't quite click, but the structure is becoming clear: job -> group (alloc) -> task (redis).
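That three-level structure in file form — a stripped-down sketch, not the full tutorial file:

```hcl
job "example" {          # job: the unit you plan / run / stop
  datacenters = ["dc1"]

  group "cache" {        # group: each instance becomes one allocation
    count = 3

    task "redis" {       # task: what actually runs inside the alloc
      driver = "docker"
      config {
        image = "redis:3.2"
      }
    }
  }
}
```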

nomad tutorial: running

https://www.nomadproject.io/intro/getting-started/running.html

What is an agent?

Client - a Nomad client is a machine that can execute tasks. Every client runs the Nomad agent. The agent is responsible for registering with the servers, watching for work to be assigned, and executing tasks. The Nomad agent is a long-lived process that interfaces with the servers. https://www.nomadproject.io/docs/internals/architecture.html

Starting the agent

Using vagrant for the first time in a while — it took a moment to remember the password. Start the agent in dev mode (?) just as the vagrant tutorial says.

$ vagrant ssh
:
:

$ sudo nomad agent -dev
:
:

Check the cluster nodes

$ vagrant ssh
:

$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
ed6136c5  dc1  nomad  <none>  false  eligible     ready
$ nomad server members
Name          Address    Port  Status  Leader  Protocol  Build  Datacenter  Region
nomad.global  127.0.0.1  4648  alive   true    2         0.8.6  dc1         global

Stopping the agent

Stop the agent with ctrl-c. Got the log below; apparently this shows the agent leaving the cluster.

^C==> Caught signal: interrupt
    [DEBUG] http: Shutting down http server
    [INFO] agent: requesting shutdown
    [INFO] client: shutting down
    [INFO] nomad: shutting down server
    serf: Shutdown without a Leave
    [ERR] nomad: "Node.GetClientAllocs" RPC failed to server 127.0.0.1:4647: rpc error: EOF
    [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks
    [INFO] agent: shutdown complete

Check the cluster nodes 2

Requests now fail. This confirmed port 4646 and the two endpoints.

$ nomad node status
Error querying node status: Get http://127.0.0.1:4646/v1/nodes: dial tcp 127.0.0.1:4646: connect: connection refused

$ nomad server members
Error querying servers: Get http://127.0.0.1:4646/v1/agent/members: dial tcp 127.0.0.1:4646: connect: connection refused

Start the agent and curl

Installed jq.

sudo apt -y install jq

Tried curl.

$ curl http://127.0.0.1:4646/v1/nodes | jq
[
  {
    "Address": "127.0.0.1",
    "CreateIndex": 6,
    "Datacenter": "dc1",
    "Drain": false,
    "Drivers": {
:
    },
    "ID": "cfebff0f-e21e-fabb-c777-17f9d9ae8439",
    "ModifyIndex": 9,
    "Name": "nomad",
    "NodeClass": "",
    "SchedulingEligibility": "eligible",
    "Status": "ready",
    "StatusDescription": "",
    "Version": "0.8.6"
  }
]
$ curl http://127.0.0.1:4646/v1/agent/members | jq
{
  "Members": [
    {
      "Addr": "127.0.0.1",
      "DelegateCur": 4,
      "DelegateMax": 5,
      "DelegateMin": 2,
      "Name": "nomad.global",
      "Port": 4648,
      "ProtocolCur": 2,
      "ProtocolMax": 5,
      "ProtocolMin": 1,
      "Status": "alive",
      "Tags": {
        "mvn": "1",
        "bootstrap": "1",
        "region": "global",
        "id": "eef87cd3-56e2-018d-b52a-6efc1952ce05",
        "rpc_addr": "127.0.0.1",
        "role": "nomad",
        "raft_vsn": "2",
        "port": "4647",
        "dc": "dc1",
        "vsn": "1",
        "build": "0.8.6"
      }
    }
  ],
  "ServerDC": "dc1",
  "ServerName": "nomad",
  "ServerRegion": "global"
}
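Some follow-up jq filters to try next — the field names are taken from the responses above, endpoints as before:

```shell
# one line per node: Name and Status
curl -s http://127.0.0.1:4646/v1/nodes | jq -r '.[] | "\(.Name)\t\(.Status)"'

# the server member's RPC port, dug out of the Tags map
curl -s http://127.0.0.1:4646/v1/agent/members | jq -r '.Members[0].Tags.port'
```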