Every compute node failed to start, with the following errors:

Jan 05 00:05:10 node1.example.com origin-node[4307]: I0105 00:05:10.895622    4307 feature_gate.go:226] feature gates: &{{} map[RotateKubeletServerCertificate:true RotateKubeletClientCertificate:true]}
Jan 05 00:05:10 node1.example.com origin-node[4307]: I0105 00:05:10.902964 4307 mount_linux.go:211] Detected OS with systemd
Jan 05 00:05:10 node1.example.com origin-node[4307]: I0105 00:05:10.908967 4307 server.go:383] Version: v1.10.0+b81c8f8
Jan 05 00:05:10 node1.example.com origin-node[4307]: I0105 00:05:10.909036 4307 feature_gate.go:226] feature gates: &{{} map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
Jan 05 00:05:10 node1.example.com origin-node[4307]: I0105 00:05:10.909150 4307 plugins.go:89] No cloud provider specified.
Jan 05 00:05:10 node1.example.com origin-node[4307]: I0105 00:05:10.909162 4307 server.go:499] No cloud provider specified: "" from the config file: ""
Jan 05 00:05:10 node1.example.com origin-node[4307]: E0105 00:05:10.931121 4307 bootstrap.go:198] Part of the existing bootstrap client certificate is expired: 2020-01-04 07:20:00 +0000 UTC
Jan 05 00:05:10 node1.example.com origin-node[4307]: I0105 00:05:10.931145 4307 bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
Jan 05 00:05:10 node1.example.com origin-node[4307]: I0105 00:05:10.932606 4307 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
Jan 05 00:05:10 node1.example.com origin-node[4307]: I0105 00:05:10.959131 4307 csr.go:105] csr for this node already exists, reusing
Jan 05 00:05:10 node1.example.com origin-node[4307]: I0105 00:05:10.967338 4307 csr.go:113] csr for this node is still valid

I. After the certificates are renewed, /etc/origin/node/cerxx**/client-current.(server).

If there are any CSRs (CertificateSigningRequest), they need to be approved:

oc get csr -o name | xargs oc adm certificate approve
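
The command above approves every CSR it finds, including ones that are already approved. A more selective variant could filter for Pending requests first. The sketch below is only an illustration: the helper name `pending_csr_names` is hypothetical, and it assumes that `oc get csr --no-headers` prints the CSR name in the first column and the condition in the last column.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `oc get csr --no-headers` style output on stdin
# and prints the name (first column) of every CSR whose last column
# (assumed to be CONDITION) is exactly "Pending".
pending_csr_names() {
  awk '$NF == "Pending" { print $1 }'
}

# Intended usage on a master (not executed here):
#   oc get csr --no-headers | pending_csr_names | xargs -r oc adm certificate approve
```

Reviewing the list printed by `pending_csr_names` before piping it into `oc adm certificate approve` keeps a human in the loop, which matters because approving a CSR grants the requester a valid client certificate.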

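Two things need to be investigated:

  1. Why was the kubelet certificate renewed automatically on January 4?
    Because in production the kubelet certificate has a default validity of one year and is rotated automatically when it expires. The relevant setting on the compute nodes is kubeletArguments.rotate-certificates: ['true']
  2. Why was the CSR Pending instead of approved?
    This is a case on OpenShift 3.11 where the certificate had just expired on the Master node while the bootstrap token was still valid, so the Node submits a certificate request (CSR) to the Master. In OpenShift, CSRs have to be approved manually. Monitoring and alerting should therefore be set up for this, to make sure production certificates do not expire.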
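
One possible building block for such monitoring is a check based on `openssl x509 -checkend`. The sketch below is an assumption about how an alert could be wired up, not part of the original setup: the function name `cert_expires_within_days` and the 30-day threshold in the usage note are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 (true) if the certificate in file $1
# expires within $2 days, and 1 otherwise.
# `openssl x509 -checkend N` exits 0 when the certificate is still valid
# N seconds from now, so the result is inverted here.
# Note: an unreadable or missing file also makes openssl fail, which this
# function deliberately reports as "expiring" so that it triggers an alert.
cert_expires_within_days() {
  local cert="$1" days="$2"
  ! openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert" >/dev/null 2>&1
}
```

On a node, this could be called from a cron job or a monitoring agent, for example `cert_expires_within_days /etc/origin/node/certificates/kubelet-client-current.pem 30 && echo "kubelet client certificate expires soon"`.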

Related articles:
https://access.redhat.com/solutions/3716861
https://access.redhat.com/solutions/4565991

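II. Database issue
The database uses the image centos/mysql-57-centos7.
The MySQL root password had been changed inside the database, but common.sh, which checks the database status, assumes by default that the root password in this image is empty. The script therefore has to be modified to pass the root password: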

# Line 54
mysql_flags="-u root -p$MYSQL_ROOT_PASSWORD --socket=/tmp/mysql.sock"
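
If `MYSQL_ROOT_PASSWORD` may be unset in some deployments, a bare `-p` would make the client prompt for a password. A slightly more defensive variant, shown here only as a sketch, adds the option only when the variable is non-empty. The function name `build_mysql_flags` is hypothetical and is not part of the image's common.sh.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the flag string for the mysql client.
# ${VAR:+word} expands to `word` only when VAR is set and non-empty,
# so the -p option is omitted entirely when no root password is configured.
build_mysql_flags() {
  echo "-u root ${MYSQL_ROOT_PASSWORD:+-p$MYSQL_ROOT_PASSWORD }--socket=/tmp/mysql.sock"
}

# Possible use inside the script:
#   mysql_flags="$(build_mysql_flags)"
```

This keeps the original behaviour for an empty root password while supporting a changed one. Passwords containing spaces or shell metacharacters would need extra care, since the flags are later expanded unquoted.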