nomad trial: jobs
https://www.nomadproject.io/intro/getting-started/jobs.html
Job and TaskGroup
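A job is a specification provided by users that declares a workload for Nomad. A job is a form of desired state: the user expresses that the job should be running, but not where it should be run. Nomad's responsibility is to make sure the actual state matches the user's desired state. A job is composed of one or more task groups.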
Task group
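A task group is a set of tasks that must be run together. For example, a web server may require that a log-shipping co-process is always running alongside it. A task group is the unit of scheduling, meaning the entire group must run on the same client node and cannot be split.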
Task
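A task is the smallest unit of work in Nomad. Tasks are executed by drivers, which lets Nomad be flexible in the types of tasks it supports. A task specifies its driver, the driver's configuration, constraints, and the resources it requires.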
Driver
A driver represents the basic means of executing a task. Example drivers include Docker, Qemu, Java, and static binaries.
- job
  - task group <--- unit of scheduling, runs on a single node
    - task <--- the smallest unit of work
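The hierarchy above can be sketched as plain data types. This is a minimal Python model, not Nomad's code. `Task`, `TaskGroup`, `Job`, and `place` are hypothetical names. The round-robin `place` only illustrates one rule: a task group instance is placed whole on a single node.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """Smallest unit of work: a driver plus its config and resource needs."""
    name: str
    driver: str          # e.g. "docker"
    cpu_mhz: int
    memory_mb: int
    config: Dict[str, str] = field(default_factory=dict)


@dataclass
class TaskGroup:
    """Unit of scheduling: all tasks in one instance share a node."""
    name: str
    count: int
    tasks: List[Task]


@dataclass
class Job:
    """Desired state: what should run, not where."""
    name: str
    datacenters: List[str]
    groups: List[TaskGroup]


def place(job: Job, nodes: List[str]) -> Dict[str, str]:
    """Toy placement returning {allocation name: node}.

    Each group instance goes, whole, to exactly one node. Round-robin is only
    for illustration; Nomad's scheduler also weighs resources and constraints.
    """
    placements: Dict[str, str] = {}
    slot = 0
    for group in job.groups:
        for index in range(group.count):
            placements[f"{job.name}.{group.name}[{index}]"] = nodes[slot % len(nodes)]
            slot += 1
    return placements


if __name__ == "__main__":
    redis = Task("redis", "docker", 500, 256, {"image": "redis:3.2"})
    job = Job("example", ["dc1"], [TaskGroup("cache", 1, [redis])])
    print(place(job, ["e84125ed"]))  # {'example.cache[0]': 'e84125ed'}
```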
Running a Job
Generate a skeleton job file with the job init command. This creates example.nomad.
$ nomad job init
Example job file written to example.nomad
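Look at example.nomad. With the many comments cut out, it looks like this. What is this notation? It isn't JSON. The structure is job:example -> group:cache -> task:redis, and it appears to use the Docker driver to do the equivalent of docker run redis:3.2.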
job "example" {
  datacenters = ["dc1"]
  type = "service"

  update {
    max_parallel = 1
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    progress_deadline = "10m"
    auto_revert = false
    canary = 0
  }

  migrate {
    max_parallel = 1
    health_check = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }

  group "cache" {
    count = 1

    restart {
      attempts = 2
      interval = "30m"
      delay = "15s"
      mode = "fail"
    }

    ephemeral_disk {
      size = 300
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB
        network {
          mbits = 10
          port "db" {}
        }
      }

      service {
        name = "redis-cache"
        tags = ["global", "cache"]
        port = "db"
        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
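The notation is HCL (HashiCorp Configuration Language). Nomad's HTTP API takes jobs as JSON. Seeing the same structure in JSON may make it more familiar, so the sketch below builds a JSON-style payload covering part of example.nomad. The field names (`Job`, `TaskGroups`, `Tasks`, `Resources`, `MemoryMB`) reflect my understanding of Nomad's JSON job format and should be verified against the API docs. `example_job_payload` is a hypothetical helper. The list form of `port_map` matches the `port_map[0][db]` path that `nomad job plan` prints later in this post.

```python
import json


def example_job_payload(count: int = 1, image: str = "redis:3.2") -> dict:
    """Build a JSON-style payload mirroring part of example.nomad.

    Only a subset of fields is included; field names are an assumption
    about Nomad's JSON job format, not copied from its documentation.
    """
    redis_task = {
        "Name": "redis",
        "Driver": "docker",
        "Config": {"image": image, "port_map": [{"db": 6379}]},
        "Resources": {"CPU": 500, "MemoryMB": 256},
    }
    cache_group = {"Name": "cache", "Count": count, "Tasks": [redis_task]}
    return {
        "Job": {
            "ID": "example",
            "Name": "example",
            "Type": "service",
            "Datacenters": ["dc1"],
            "TaskGroups": [cache_group],
        }
    }


if __name__ == "__main__":
    print(json.dumps(example_job_payload(), indent=2))
```

The nesting follows the same job -> group -> task path as the HCL file.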
Register (or is it run?) job:example with nomad job run.
vagrant@nomad:~$ nomad job run example.nomad
==> Monitoring evaluation "fc8f4a76"
    Evaluation triggered by job "example"
    Allocation "4666e64b" created: node "e84125ed", group "cache"
    Evaluation within deployment: "45ceab07"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "fc8f4a76" finished with status "complete"
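The monitor output above is line-oriented, so the allocation IDs it reports can be pulled out with a regular expression. Below is a minimal sketch. It assumes the line format shown in this post stays stable. `parse_allocations` is a hypothetical helper, and the HTTP API is a sturdier source than scraping CLI text.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   Allocation "4666e64b" created: node "e84125ed", group "cache"
ALLOC_LINE = re.compile(
    r'Allocation "(?P<alloc>[0-9a-f]+)" (?P<action>\w+): '
    r'node "(?P<node>[0-9a-f]+)", group "(?P<group>[^"]+)"'
)


def parse_allocations(monitor_output: str) -> List[Tuple[str, str, str, str]]:
    """Return (alloc_id, action, node_id, group) for each allocation line."""
    return [
        (m["alloc"], m["action"], m["node"], m["group"])
        for m in ALLOC_LINE.finditer(monitor_output)
    ]
```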
Check job:example with nomad status. Times are in UTC.
$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-05T11:10:18Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         1        0       0         0

Latest Deployment
ID          = 45ceab07
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       1        1       1        0          2019-04-05T11:20:47Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
4666e64b  e84125ed  cache       0        run      running  3m55s ago  3m25s ago
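The `Allocations` section of `nomad status` is a whitespace-separated table. Below is a rough Python parser for it. It assumes the column order shown above and that `Created` and `Modified` each read `<duration> ago`. `parse_alloc_table` is a hypothetical name.

```python
from typing import Dict, List


def parse_alloc_table(status_output: str) -> List[Dict[str, str]]:
    """Parse the 'Allocations' section of `nomad status <job>` output.

    Assumes rows are whitespace-separated and that Created/Modified each
    look like '<duration> ago' (two tokens apiece).
    """
    lines = [line.strip() for line in status_output.splitlines()]
    try:
        start = lines.index("Allocations")
    except ValueError:
        return []
    rows = []
    for line in lines[start + 2:]:          # skip the section title and header
        parts = line.split()
        if len(parts) < 10:                 # blank line or end of table
            break
        rows.append({
            "id": parts[0],
            "node_id": parts[1],
            "task_group": parts[2],
            "version": parts[3],
            "desired": parts[4],
            "status": parts[5],
            "created": " ".join(parts[6:8]),
            "modified": " ".join(parts[8:10]),
        })
    return rows
```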
This apparently ran it on the local node? Check node status: the node ID e84125ed matches the Node ID e84125ed under Allocations.
$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
e84125ed  dc1  nomad  <none>  false  eligible     ready
Use alloc status to check the task group that was placed on the node?
$ nomad alloc status 4666e64b
ID                  = 4666e64b
Eval ID             = fc8f4a76
Name                = example.cache[0]
Node ID             = e84125ed
Job ID              = example
Job Version         = 0
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 11m58s ago
Modified            = 11m28s ago
Deployment ID       = 45ceab07
Deployment Health   = healthy

Task "redis" is "running"
Task Resources
CPU        Memory            Disk     IOPS  Addresses
4/500 MHz  1008 KiB/256 MiB  300 MiB  0     db: 127.0.0.1:26775

Task Events:
Started At     = 2019-04-05T11:10:35Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-04-05T11:10:35Z  Started     Task started by client
2019-04-05T11:10:18Z  Driver      Downloading image redis:3.2
2019-04-05T11:10:18Z  Task Setup  Building Task Directory
2019-04-05T11:10:18Z  Received    Task received by client
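The `Task Resources` row appears to show usage against the reservation from the job file (`cpu = 500`, `memory = 256`). In my interpretation, the first number is current usage and the second is the reservation. On that reading, utilization can be computed as below. `cpu_utilization` and `memory_utilization` are hypothetical helpers that only understand the formats shown here.

```python
import re

_BYTES = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def cpu_utilization(cell: str) -> float:
    """'4/500 MHz' -> fraction of the reserved CPU in use."""
    m = re.fullmatch(r"(\d+)/(\d+) MHz", cell)
    if m is None:
        raise ValueError(f"unexpected CPU cell: {cell!r}")
    return int(m.group(1)) / int(m.group(2))


def memory_utilization(cell: str) -> float:
    """'1008 KiB/256 MiB' -> fraction of the reserved memory in use."""
    m = re.fullmatch(r"(\d+) (\w+)/(\d+) (\w+)", cell)
    if m is None:
        raise ValueError(f"unexpected memory cell: {cell!r}")
    used = int(m.group(1)) * _BYTES[m.group(2)]
    reserved = int(m.group(3)) * _BYTES[m.group(4)]
    return used / reserved
```

For this allocation, that works out to 0.8% of the CPU reservation and roughly 0.4% of the memory reservation.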
Check the logs of alloc:4666e64b, task:redis, with alloc logs.
$ nomad alloc logs 4666e64b redis
1:C 05 Apr 11:10:35.349 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
[Redis ASCII-art startup banner: Redis 3.2.12 (00000000/0) 64 bit / Running in standalone mode / Port: 6379 / PID: 1 / http://redis.io]
1:M 05 Apr 11:10:35.350 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 Apr 11:10:35.350 # Server started, Redis version 3.2.12
1:M 05 Apr 11:10:35.350 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 05 Apr 11:10:35.350 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 05 Apr 11:10:35.350 * The server is now ready to accept connections on port 6379
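Each Redis log line starts with `pid:role`, a timestamp, and a one-character level marker. In Redis's log format, `#` means warning, `*` notice, `-` verbose, and `.` debug. The sketch below filters out the `#` lines, assuming that format. Redis also logs some routine messages such as `Server started` at this level. `warning_messages` is a hypothetical helper.

```python
import re
from typing import List

# pid:role, timestamp, level marker, message, e.g.
#   1:M 05 Apr 11:10:35.350 # WARNING overcommit_memory is set to 0! ...
LOG_LINE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[A-Z]) (?P<ts>\d{2} \w{3} [\d:.]+) "
    r"(?P<level>[.\-*#]) (?P<msg>.*)$"
)
LEVELS = {".": "debug", "-": "verbose", "*": "notice", "#": "warning"}


def warning_messages(log_text: str) -> List[str]:
    """Return the message part of every warning-level ('#') Redis log line."""
    out = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if m and LEVELS[m["level"]] == "warning":
            out.append(m["msg"])
    return out
```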
Modifying a Job - count=1 -> 3
Next, the tutorial modifies the job. Open example.nomad with vi and change count = 1 to 3.
group "cache" {
  count = 1   <--- change to 3
Hand it to nomad with nomad job plan. It reports "2 create, 1 in-place update". So two are created and one is kept as-is?
$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (2 create, 1 in-place update)
  +/- Count: "1" => "3" (forces create)
      Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 24
To submit the job with version verification run:

nomad job run -check-index 24 example.nomad

When running the job with the check-index flag, the job will only be run if the server side version matches the job modify index returned. If the index has changed, another user has modified the job and the plan's results are potentially invalid.
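One way to read "2 create, 1 in-place update": the existing allocation keeps running and moves to the new job version, and only the difference in `count` is created. The toy model below shows that arithmetic. It is an assumption, not Nomad's scheduler logic. `plan_count_change` is hypothetical, and its scale-down branch is a guess that this post never exercises.

```python
from typing import Dict


def plan_count_change(old_count: int, new_count: int) -> Dict[str, int]:
    """Toy version of the per-group counts `nomad job plan` prints.

    Assumption: a count-only change leaves existing allocations running
    (updated in place to the new job version) and creates or stops the
    difference. Destructive changes such as a new image are not modelled.
    """
    return {
        "create": max(new_count - old_count, 0),
        "in-place update": min(old_count, new_count),
        "stop": max(old_count - new_count, 0),
    }


if __name__ == "__main__":
    print(plan_count_change(1, 3))  # {'create': 2, 'in-place update': 1, 'stop': 0}
```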
The plan also suggests "nomad job run -check-index 24 example.nomad". Try it as the tutorial does: alloc:94f5f11c and fac747cb are created, and 4666e64b is modified.
$ nomad job run -check-index 24 example.nomad
==> Monitoring evaluation "481d337c"
    Evaluation triggered by job "example"
    Allocation "94f5f11c" created: node "e84125ed", group "cache"
    Allocation "fac747cb" created: node "e84125ed", group "cache"
    Allocation "4666e64b" modified: node "e84125ed", group "cache"
    Evaluation within deployment: "724043af"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "481d337c" finished with status "complete"
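`-check-index` works as a compare-and-set. The run is accepted only if the job's modify index on the server still equals the one `nomad job plan` printed. The toy in-memory version below shows that check. `JobStore` is a hypothetical class. Unlike this sketch, Nomad's modify indexes are Raft indexes and can jump by more than 1 (24, then 62, in this post).

```python
from typing import Optional


class JobStore:
    """Toy job registry guarded by a modify index (optimistic concurrency)."""

    def __init__(self) -> None:
        self.spec: Optional[dict] = None
        self.modify_index = 0

    def register(self, spec: dict, check_index: Optional[int] = None) -> int:
        """Store spec and return the new index.

        If check_index is given, it must equal the current index; otherwise
        someone changed the job after the plan and the write is rejected.
        """
        if check_index is not None and check_index != self.modify_index:
            raise RuntimeError(
                f"index is {self.modify_index}, expected {check_index}: job changed since plan"
            )
        self.spec = spec
        self.modify_index += 1   # real Nomad indexes come from Raft and jump further
        return self.modify_index


if __name__ == "__main__":
    store = JobStore()
    store.register({"count": 1})
    planned = store.modify_index                       # what a plan would report
    store.register({"count": 3}, check_index=planned)  # accepted
    try:
        store.register({"count": 5}, check_index=planned)  # stale index, rejected
    except RuntimeError as err:
        print(err)
```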
Check the new alloc:94f5f11c: Name is example.cache[1].
$ nomad alloc status 94f5f11c
ID                  = 94f5f11c
Eval ID             = 481d337c
Name                = example.cache[1]
Node ID             = e84125ed
Job ID              = example
Job Version         = 1
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 5m22s ago
Modified            = 5m3s ago
Deployment ID       = 724043af
Deployment Health   = healthy
:
Also check the other new alloc, fac747cb: Name is example.cache[2].
$ nomad alloc status fac747cb
ID                  = fac747cb
Eval ID             = 481d337c
Name                = example.cache[2]
Node ID             = e84125ed
Job ID              = example
Job Version         = 1
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 6m40s ago
Modified            = 6m20s ago
Deployment ID       = 724043af
Deployment Health   = healthy
:
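The `Name` field follows the pattern `<job>.<group>[<index>]`. After scaling to 3, the original allocation is `[0]` and the new ones took `[1]` and `[2]`. Below are a parser for that pattern and a helper that lists unused indexes. Both are hypothetical illustrations, and I am assuming group names contain no dots or brackets.

```python
import re
from typing import List, NamedTuple


class AllocName(NamedTuple):
    job: str
    group: str
    index: int


# job may itself contain dots; the group is the last dot-separated segment.
_NAME = re.compile(r"^(?P<job>.+)\.(?P<group>[^.\[\]]+)\[(?P<index>\d+)\]$")


def parse_alloc_name(name: str) -> AllocName:
    """'example.cache[2]' -> AllocName(job='example', group='cache', index=2)."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected allocation name: {name!r}")
    return AllocName(m["job"], m["group"], int(m["index"]))


def free_indexes(names: List[str], count: int) -> List[int]:
    """Indexes in range(count) not used by any of the given allocation names."""
    used = {parse_alloc_name(n).index for n in names}
    return [i for i in range(count) if i not in used]
```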
Modifying a Job - image=redis:3.2 -> redis:4.0
Next, change the image.
vi example.nomad
task "redis" {
  driver = "docker"
  config {
    image = "redis:3.2"   <--- change to redis:4.0
Hand it to nomad. Something had felt off when I read the tutorial, and sure enough the plan says "1 create/destroy update, 2 ignore". Shouldn't all 3 be updated?
$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (1 create/destroy update, 2 ignore)
  +/- Task: "redis" (forces create/destroy update)
    +/- Config {
      +/- image:           "redis:3.2" => "redis:4.0"
          port_map[0][db]: "6379"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 62
To submit the job with version verification run:

nomad job run -check-index 62 example.nomad

When running the job with the check-index flag, the job will only be run if the server side version matches the job modify index returned. If the index has changed, another user has modified the job and the plan's results are potentially invalid.
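My reading of "1 create/destroy update, 2 ignore" is as follows. The job's `update` stanza sets `max_parallel = 1`, so a destructive change like a new image is rolled out one allocation at a time. The plan shows only the first batch. The other two are ignored for now and replaced later in the deployment, once the new allocation is healthy (`min_healthy_time = "10s"`). The toy model below shows the batching. `rolling_update` and `plan_summary` are hypothetical helpers.

```python
from typing import Dict, List


def rolling_update(allocs: List[str], max_parallel: int) -> List[List[str]]:
    """Split allocations needing a destructive update into batches of max_parallel."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be >= 1")
    return [allocs[i:i + max_parallel] for i in range(0, len(allocs), max_parallel)]


def plan_summary(allocs: List[str], max_parallel: int) -> Dict[str, int]:
    """Counts for the first batch only, as a single plan would report them."""
    batches = rolling_update(allocs, max_parallel)
    first = batches[0] if batches else []
    return {"create/destroy update": len(first), "ignore": len(allocs) - len(first)}


if __name__ == "__main__":
    current = ["4666e64b", "94f5f11c", "fac747cb"]
    print(plan_summary(current, max_parallel=1))  # {'create/destroy update': 1, 'ignore': 2}
```

With `max_parallel = 3`, the same model would report all three as create/destroy updates in one batch.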
What is going on here? Running it with -check-index reports only Allocation "97dbfc56" created.
$ nomad job run -check-index 62 example.nomad
==> Monitoring evaluation "e52687f2"
    Evaluation triggered by job "example"
    Allocation "97dbfc56" created: node "e84125ed", group "cache"
    Evaluation within deployment: "3188c6a1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "e52687f2" finished with status "complete"
Look at the status and logs of alloc:97dbfc56.
$ nomad alloc status 97dbfc56
ID                  = 97dbfc56
Eval ID             = e52687f2
Name                = example.cache[0]
Node ID             = e84125ed
Job ID              = example
Job Version         = 2
Client Status       = running
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 2m33s ago
Modified            = 1m59s ago
Deployment ID       = 3188c6a1
Deployment Health   = healthy
:
$ nomad alloc logs 97dbfc56 redis
1:C 05 Apr 11:53:20.759 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 05 Apr 11:53:20.759 # Redis version=4.0.14, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 05 Apr 11:53:20.759 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 05 Apr 11:53:20.763 * Running mode=standalone, port=6379.
1:M 05 Apr 11:53:20.763 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 Apr 11:53:20.763 # Server initialized
1:M 05 Apr 11:53:20.763 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 05 Apr 11:53:20.763 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 05 Apr 11:53:20.763 * Ready to accept connections
Check the original alloc:4666e64b: "Client Status = complete". So it has finished.
$ nomad alloc status 4666e64b
ID            = 4666e64b
Eval ID       = 481d337c
Name          = example.cache[0]
Node ID       = e84125ed
Job ID        = example
Job Version   = 1
Client Status = complete
:
Recent Events:
Time                  Type        Description
2019-04-05T11:52:59Z  Killed      Task successfully killed
2019-04-05T11:52:58Z  Killing     Sent interrupt. Waiting 5s before force killing
2019-04-05T11:10:35Z  Started     Task started by client
2019-04-05T11:10:18Z  Driver      Downloading image redis:3.2
2019-04-05T11:10:18Z  Task Setup  Building Task Directory
2019-04-05T11:10:18Z  Received    Task received by client

$ nomad alloc logs 4666e64b redis
:
1:M 05 Apr 11:10:35.350 * The server is now ready to accept connections on port 6379
1:signal-handler (1554465179) Received SIGTERM scheduling shutdown...
1:M 05 Apr 11:52:59.138 # User requested shutdown...
1:M 05 Apr 11:52:59.138 * Saving the final RDB snapshot before exiting.
1:M 05 Apr 11:52:59.142 * DB saved on disk
1:M 05 Apr 11:52:59.142 # Redis is now ready to exit, bye bye...
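The `Killing` event says Nomad sent an interrupt and would wait 5s before force killing. The Redis log shows it received SIGTERM and saved an RDB snapshot before exiting. The sketch below shows the same terminate-then-kill pattern using Python's `subprocess`. The 5-second default comes from the event text, and as far as I know Nomad exposes this per task as `kill_timeout`. `stop_gracefully` is a hypothetical helper, not how Nomad's Docker driver is implemented.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, kill_timeout: float = 5.0) -> int:
    """Send SIGTERM, wait up to kill_timeout seconds, then SIGKILL.

    Returns the process's exit status (negative signal number on POSIX).
    """
    proc.terminate()                      # SIGTERM on POSIX
    try:
        return proc.wait(timeout=kill_timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL on POSIX
        return proc.wait()


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_gracefully(child))
```

A process that handles SIGTERM, as Redis does, gets the full timeout to clean up. Anything still running afterwards is killed outright.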
What happened to the other two? They are complete too.
$ nomad alloc status fac747cb
ID            = fac747cb
:
Client Status = complete

$ nomad alloc status 4666e64b
ID            = 4666e64b
:
Client Status = complete
Recap of what I did:
- nomad job run example.nomad
- count1 -> count3
- nomad job plan example.nomad
- nomad job run -check-index 24 example.nomad
- redis3 -> redis4
- nomad job run -check-index 62 example.nomad
■ Stopping a Job
It still doesn't quite click, but... stop the job.
$ nomad job stop example
==> Monitoring evaluation "8fd141b1"
    Evaluation triggered by job "example"
    Evaluation within deployment: "3188c6a1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "8fd141b1" finished with status "complete"
Check job:example. Huh, there are allocs I don't recognize: 72113ea3 and 062ce877.
$ nomad status example
ID            = example
Name          = example
Submit Date   = 2019-04-05T11:52:58Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = dead (stopped)
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         0        0       6         0

Latest Deployment
ID          = 3188c6a1
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       3        3       3        0          2019-04-05T12:04:13Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
72113ea3  e84125ed  cache       2        stop     complete  26m33s ago  34s ago
062ce877  e84125ed  cache       2        stop     complete  26m53s ago  34s ago
97dbfc56  e84125ed  cache       2        stop     complete  27m28s ago  34s ago
94f5f11c  e84125ed  cache       1        stop     complete  41m56s ago  39s ago
fac747cb  e84125ed  cache       1        stop     complete  41m56s ago  39s ago
4666e64b  e84125ed  cache       1        stop     complete  1h10m ago   39s ago
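The two unfamiliar allocations, 72113ea3 and 062ce877, are version 2 like 97dbfc56. This looks like the rest of the rolling image update, with three version-2 allocations replacing three version-1 ones. Their `Created` ages are staggered, which fits replacement one at a time. The small sketch below converts the ages to seconds and measures the gaps. `age_seconds` and `creation_gaps` are hypothetical helpers, and the ages are only as precise as the CLI prints them.

```python
import re
from typing import List, Tuple

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def age_seconds(age: str) -> int:
    """'1h10m ago', '26m33s ago', '34s' -> seconds."""
    return sum(int(v) * _UNIT_SECONDS[u] for v, u in re.findall(r"(\d+)([hms])", age))


def creation_gaps(rows: List[Tuple[str, str]]) -> List[int]:
    """Given (alloc_id, created_age) pairs, return seconds between
    consecutive creations, oldest first (oldest = largest age)."""
    ages = sorted((age_seconds(age) for _, age in rows), reverse=True)
    return [earlier - later for earlier, later in zip(ages, ages[1:])]


# Version-2 allocations copied from the `nomad status example` output above.
version2 = [("72113ea3", "26m33s ago"), ("062ce877", "26m53s ago"), ("97dbfc56", "27m28s ago")]
print(creation_gaps(version2))  # [35, 20]
```

Both gaps are longer than the job's `min_healthy_time = "10s"`, which is consistent with each replacement waiting for the previous one to turn healthy.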
To start the job again, run nomad job run.
$ nomad job run example.nomad
==> Monitoring evaluation "0459a7f5"
    Evaluation triggered by job "example"
    Allocation "805f4360" created: node "e84125ed", group "cache"
    Allocation "b49722f6" created: node "e84125ed", group "cache"
    Allocation "f4f093db" created: node "e84125ed", group "cache"
    Evaluation within deployment: "0debd07e"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "0459a7f5" finished with status "complete"
Some parts still don't click, but I'm starting to see the structure: job -> group (alloc) -> task (redis).