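Certificate Maintenance for OpenShift Components: Master, Node, Etcd, Router, Registry

While an OpenShift cluster is running, its components (Master, Node, Etcd, Router and Registry) constantly communicate with one another, and all of that traffic is TLS-encrypted. TLS certificates have a limited validity period, so what happens when one of them suddenly expires? Does the cluster stop working?
Let's look at how to keep the cluster's certificates valid:

Set a very long certificate validity at install time (is 100 years long enough?)
Quickly check the expiry dates of all certificates in the cluster
Renew certificates once they have expired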

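Specifying certificate validity at install time

By default, the etcd CA and OpenShift CA certificates are valid for 5 years (1825 days), while the master, kubelet (node), private registry and router certificates are valid for 2 years (730 days). The validity periods, in days, can be set at install time with the following parameters in the Ansible inventory (ansible/hosts):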

[OSEv3:vars]
openshift_hosted_registry_cert_expire_days=730
openshift_ca_cert_expire_days=1825
openshift_node_cert_expire_days=730
openshift_master_cert_expire_days=730
etcd_ca_default_days=1825
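To try the "100 years" idea from the outline, each value has to be given in days. The hypothetical helper below is not part of openshift-ansible. It prints the five inventory variables above for a given number of years, assuming 365 days per year and ignoring leap days. Whether every component accepts such long lifetimes has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper (not part of openshift-ansible): print the
# [OSEv3:vars] certificate-lifetime settings for N years.
# Assumes 365 days/year and ignores leap days, so 100 years -> 36500 days.
cert_expiry_vars() {
  years=$1
  days=$((years * 365))
  for var in openshift_hosted_registry_cert_expire_days \
             openshift_ca_cert_expire_days \
             openshift_node_cert_expire_days \
             openshift_master_cert_expire_days \
             etcd_ca_default_days; do
    printf '%s=%s\n' "$var" "$days"
  done
}

# Usage: paste the output under [OSEv3:vars] in the inventory.
#   cert_expiry_vars 100
```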

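Checking the expiry dates of all certificates in the cluster

Make sure the Ansible inventory (ansible/hosts) contains the following:

openshift_is_atomic=false
ansible_distribution=centos

Run the check:

$ ansible-playbook playbooks/openshift-checks/certificate_expiry/easy-mode.yaml
# When the playbook finishes, an HTML report is written to the path set by
# openshift_certificate_expiry_html_report_path in
# roles/openshift_certificate_expiry/defaults/main.yml
# (/tmp/cert-expiry-report.html by default); open it to see every certificate's expiry date.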

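The report lists the expiry dates of the master (oc) certificates, etcd certificates, kube certificates, the router's default certificate and the private registry certificate.

Figure: partial view of the certificate expiry report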
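The playbook is the complete check. To spot-check a single certificate file directly on a host, a minimal openssl-based sketch like the one below could be used. The function name `check_cert` is hypothetical. The path in the usage line, `/etc/origin/master/ca.crt`, is assumed to be the usual master CA location on OpenShift 3.x.

```shell
#!/bin/sh
# Hypothetical helper: check one certificate file with openssl.
# Prints "OK" or "EXPIRING", the file path and its notAfter date.
# Returns 0 if the cert is still valid DAYS days from now, 1 if not,
# and 2 if the file cannot be read as a certificate.
check_cert() {
  cert=$1
  days=${2:-30}
  end=$(openssl x509 -noout -enddate -in "$cert") || return 2
  # -checkend N exits 0 if the certificate will NOT expire within N seconds.
  if openssl x509 -noout -checkend $((days * 86400)) -in "$cert" >/dev/null; then
    echo "OK $cert $end"
    return 0
  else
    echo "EXPIRING $cert $end"
    return 1
  fi
}

# Usage (path is an assumption, adjust to your hosts):
#   check_cert /etc/origin/master/ca.crt 30
```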

Renewing certificates

You can renew a single type of certificate (master/oc, etcd, kube, router default or private registry) or all of them at once.

Make sure the Ansible inventory contains the following:

openshift_master_cluster_hostname=master.example.com
openshift_master_cluster_public_hostname=master.example.com

Redeploying certificates

Renew all certificates at once

$ ansible-playbook playbooks/redeploy-certificates.yml

Master CA certificate only

$ ansible-playbook playbooks/openshift-master/redeploy-openshift-ca.yml

Etcd CA certificate only

$ ansible-playbook playbooks/openshift-etcd/redeploy-ca.yml

Master certificates only

$ ansible-playbook playbooks/openshift-master/redeploy-certificates.yml

Etcd certificates only

$ ansible-playbook playbooks/openshift-etcd/redeploy-certificates.yml

Node certificates only (OpenShift versions before 3.10)

$ ansible-playbook playbooks/openshift-node/redeploy-certificates.yml

Private registry certificates only

$ ansible-playbook playbooks/openshift-hosted/redeploy-registry-certificates.yml

Router certificates only

$ ansible-playbook playbooks/openshift-hosted/redeploy-router-certificates.yml
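The playbooks listed above can be wrapped in a small dispatcher so that the right one is picked by component name. This is a hypothetical convenience sketch, not part of openshift-ansible. It assumes it is run from the root of the openshift-ansible checkout, and setting `DRY_RUN=1` only prints the command.

```shell
#!/bin/sh
# Hypothetical wrapper: map a component name to the redeploy playbook
# listed above. Run from the openshift-ansible checkout.
redeploy_playbook() {
  case "$1" in
    all)       echo playbooks/redeploy-certificates.yml ;;
    master-ca) echo playbooks/openshift-master/redeploy-openshift-ca.yml ;;
    etcd-ca)   echo playbooks/openshift-etcd/redeploy-ca.yml ;;
    master)    echo playbooks/openshift-master/redeploy-certificates.yml ;;
    etcd)      echo playbooks/openshift-etcd/redeploy-certificates.yml ;;
    # Removed in OpenShift 3.10 and later.
    node)      echo playbooks/openshift-node/redeploy-certificates.yml ;;
    registry)  echo playbooks/openshift-hosted/redeploy-registry-certificates.yml ;;
    router)    echo playbooks/openshift-hosted/redeploy-router-certificates.yml ;;
    *)         echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# Run (or, with DRY_RUN=1, just print) the playbook for one component.
redeploy() {
  pb=$(redeploy_playbook "$1") || return 1
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "ansible-playbook $pb"
  else
    ansible-playbook "$pb"
  fi
}

# Usage:
#   DRY_RUN=1 redeploy router
#   redeploy etcd-ca
```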

Using a custom master CA certificate

Using a custom certificate at install time

Specify the certificate in the inventory variables:

$ cat /etc/ansible/hosts
...
[OSEv3:vars]
...
openshift_master_ca_certificate={'certfile': '</path/to/ca.crt>', 'keyfile': '</path/to/ca.key>'}
...
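Before pointing `openshift_master_ca_certificate` at a custom CA, it may be worth confirming that the certfile and keyfile actually belong together. The sketch below does this by comparing the RSA modulus of the certificate with that of the private key. It only works for RSA keys, and the function name `ca_pair_matches` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: does the private key match the certificate?
# Works for RSA keys only: compares the modulus printed by openssl.
# Returns 0 and prints "ok" on a match, 1 on a mismatch, 2 on read errors.
ca_pair_matches() {
  crt=$1
  key=$2
  m_crt=$(openssl x509 -noout -modulus -in "$crt") || return 2
  m_key=$(openssl rsa -noout -modulus -in "$key" 2>/dev/null) || return 2
  if [ "$m_crt" != "$m_key" ]; then
    echo "key does not match certificate" >&2
    return 1
  fi
  echo "ok"
}

# Usage (paths as they would appear in the inventory):
#   ca_pair_matches /path/to/ca.crt /path/to/ca.key
```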

Then run the normal deployment:

$ ansible-playbook playbooks/deploy_cluster.yml

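Updating to a custom certificate on a running cluster
1. As for installation, specify the certificate in the inventory variables.
2. Run the playbook that redeploys the master CA certificate:
  $ ansible-playbook playbooks/openshift-master/redeploy-openshift-ca.yml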

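Problems encountered after renewal

On an all-in-one cluster, redeploying all certificates hung at the step that restarts Docker.
The router kept failing on restart. Fix: delete the router-crt secret so that the certificate is regenerated automatically.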
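Before deleting the router's secret, you might want to see when the certificate stored in it expires. Kubernetes TLS secrets keep the certificate base64-encoded under the `tls.crt` key. The helper below is hypothetical and decodes such a value from stdin. The usage lines assume that the router runs in the `default` project and that the secret is named `router-crt` as in the text; adjust both to your cluster.

```shell
#!/bin/sh
# Hypothetical helper: read a base64-encoded PEM certificate (e.g. the
# tls.crt value of a TLS secret) from stdin and print its notAfter date.
secret_cert_enddate() {
  base64 -d | openssl x509 -noout -enddate
}

# Usage on the cluster (project and secret name are assumptions):
#   oc get secret router-crt -n default -o jsonpath='{.data.tls\.crt}' | secret_cert_enddate
#   oc delete secret router-crt -n default
```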

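EFK certificate renewal

The EFK (Elasticsearch, Fluentd, Kibana) certificates are renewed by redeploying EFK.

Delete the old certificates:
$ rm -r /etc/origin/logging
Make sure the EFK certificate settings are configured in the inventory file.

Run the EFK redeploy playbook:

$ cd openshift-ansible
$ ansible-playbook playbooks/openshift-logging/config.yml

The playbook fails with the following error: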

RUNNING HANDLER [openshift_logging_elasticsearch : Checking current health for {{ _es_node }} cluster] ***
Friday 14 December 2018 07:53:44 +0000 (0:00:01.571) 0:05:01.710 *******
[WARNING]: Consider using the get_url or uri module rather than running curl.
If you need to use command because get_url or uri is insufficient you can add
warn=False to this command task or set command_warnings=False in ansible.cfg to
get rid of this message.

fatal: [ec2-34-207-171-49.compute-1.amazonaws.com]: FAILED! => {"changed": true, "cmd": ["curl", "-s", "-k", "--cert", "/tmp/openshift-logging-ansible-3v1NOI/admin-cert", "--key", "/tmp/openshift-logging-ansible-3v1NOI/admin-key", "https://logging-es.openshift-logging.svc:9200/_cluster/health?pretty"], "delta": "0:00:01.024054", "end": "2018-12-14 02:53:33.467642", "msg": "non-zero return code", "rc": 7, "start": "2018-12-14 02:53:32.443588", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
RUNNING HANDLER [openshift_logging_elasticsearch : Set Logging message to manually restart] ***
Friday 14 December 2018 07:53:46 +0000 (0:00:01.557) 0:05:03.268 *******

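The Elasticsearch health check fails with curl exit code 7 (failed to connect to host). Delete all pods in the openshift-logging project so that they restart and pick up the new keys:
$ oc delete pod --all -n openshift-logging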
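Once the pods are back, you can check Elasticsearch health again to confirm the new certificates work. The sketch below extracts the `status` field (green, yellow or red) from `_cluster/health?pretty` JSON on stdin. It uses a crude sed match rather than a JSON parser, and `es_health_status` is a hypothetical name. The usage lines assume the OpenShift 3.11 logging image, with its `es_util` tool, an `elasticsearch` container and the `component=es` pod label. Those details are assumptions, so verify them on your version.

```shell
#!/bin/sh
# Hypothetical helper: print the "status" value from Elasticsearch
# _cluster/health JSON read on stdin. Crude sed match, not a JSON parser.
es_health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Usage (3.11 logging image assumed):
#   pod=$(oc get pods -n openshift-logging -l component=es -o name | head -n 1)
#   oc exec -n openshift-logging -c elasticsearch "${pod#pod/}" -- \
#     es_util --query='_cluster/health?pretty' | es_health_status
```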

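Notes

Starting with OpenShift 3.10, the standalone playbook for redeploying only node certificates was removed.
Before OpenShift 3.11, redeploying certificates restarted Docker; later 3.11 releases streamlined the process so that Docker is no longer restarted.
What if the cluster certificates have already expired?
Once the cluster certificates expire, communication between OpenShift components fails, and so does the web console, so the certificates must be renewed promptly. However, if you simply run the redeploy playbook at this point, it first checks certificate expiry and aborts with an error when certificates have expired. In that case, add the variable openshift_certificate_expiry_fail_on_warn=false to the Ansible inventory and rerun the playbook. If the CA certificate also needs to be renewed, also add openshift_redeploy_openshift_ca=true and rerun the playbook.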
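Passing the same two variables as Ansible extra vars (`-e`) is a possible alternative to editing the inventory, because extra vars take precedence over inventory variables. Values passed this way arrive as strings; if in doubt, use the inventory as described. The wrapper below is a hypothetical sketch. Pass `ca` to include the CA redeploy; `DRY_RUN=1` only prints the command.

```shell
#!/bin/sh
# Hypothetical wrapper: rerun the full certificate redeploy on a cluster
# whose certificates have already expired, using extra vars instead of
# inventory edits. Run from the openshift-ansible checkout.
redeploy_expired() {
  with_ca=$1
  set -- ansible-playbook -e openshift_certificate_expiry_fail_on_warn=false
  if [ "$with_ca" = ca ]; then
    # Also redeploy the OpenShift CA certificate.
    set -- "$@" -e openshift_redeploy_openshift_ca=true
  fi
  set -- "$@" playbooks/redeploy-certificates.yml
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage:
#   DRY_RUN=1 redeploy_expired ca
#   redeploy_expired
```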