
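Today I attended a Red Hat key-customer exchange event, where one session had a Red Hat colleague walking through OpenShift troubleshooting techniques. It is well worth keeping as a reference, so I am posting the content here so that more people can benefit from it. Of course, what the Red Hat colleague listed is only a subset, and some troubleshooting details were not spelled out in the slides; I will not expand on them in this post. The content of the slides follows.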

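OpenShift Troubleshooting Tips

  1. Collecting basic environment information
  2. Log levels
  3. Applications
  4. OC client troubleshooting
  5. Image registry
  6. Networking
  7. Routing
  8. Installer
  9. DNS
  10. Etcd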

Log Levels

OpenShift service log:
/etc/origin/master/master.env # applies to both the API and the Controllers
DEBUG_LOGLEVEL=4

/etc/sysconfig/atomic-openshift-node
OPTIONS=--loglevel=4

Log level values

  • 0 - Errors and warnings only
  • 2 - Normal information
  • 4 - Debugging-level information
  • 6 - API-level debugging information (request / response)
  • 8 - Body-level API debugging information
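Changing the level means editing a KEY=VALUE line in one of the files above and then restarting the affected service. Here is a minimal sketch of that edit. The helper name set_kv and the key EXAMPLE_KEY are made up for illustration, and the demo runs on a scratch file rather than the real master.env.

```shell
#!/bin/sh
# set_kv FILE KEY VALUE
# Sets KEY=VALUE in a plain KEY=VALUE file: rewrites the line if KEY is
# already present, appends it otherwise. Hypothetical helper, not an
# OpenShift tool. VALUE must not contain the "|" character.
set_kv() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        # Write to a temp file and move it into place, so the sketch
        # does not depend on a particular "sed -i" dialect.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.tmp" &&
            mv "${file}.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Try it on a scratch file.
scratch=$(mktemp)
printf 'DEBUG_LOGLEVEL=2\n' > "$scratch"
set_kv "$scratch" DEBUG_LOGLEVEL 4      # existing key: rewritten
set_kv "$scratch" EXAMPLE_KEY demo      # missing key: appended
cat "$scratch"
```

On a real master the call would presumably look like set_kv /etc/origin/master/master.env DEBUG_LOGLEVEL 4. Keeping a backup copy of the file first is a sensible precaution.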

Docker Log level
/etc/sysconfig/docker --log-level=debug

Etcd Log level
# source /etc/etcd/etcd.conf
# curl --cert $ETCD_PEER_CERT_FILE --key $ETCD_PEER_KEY_FILE --cacert $ETCD_TRUSTED_CA_FILE $ETCD_ADVERTISE_CLIENT_URLS/config/local/log -XPUT -d '{"Level":"DEBUG"}'

Get Log
# /usr/local/bin/master-logs etcd etcd > $(hostname)-etcd.log 2>&1

OpenShift Builder Pod Logs
BUILD_LOGLEVEL in BC/env ## set in the BuildConfig's environment variables

Application Logs

Three distinct stages: build errors, deployment errors, and application errors

Build Errors
# oc logs bc/

Deployment Errors
# oc get status -o wide -n
# oc get events -o wide -n

Application Errors
# oc logs pod/ -p
# oc debug pod/
This deploys a copy of the pod without liveness and readiness probes and with the entrypoint set to a shell.

OC Client

oc client log level
# oc whoami --loglevel=8

Setting this value between 6 and 8 will provide extensive logging

API requests being sent (loglevel 6)
headers (loglevel 7)
responses received (loglevel 8)
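At these levels the output gets long quickly, so it can help to reduce it to one line per API request. The sketch below assumes request lines have the shape "prefix time pid file:line] METHOD URL STATUS ...". The sample lines are made up in that shape and are not captured from a real cluster, so adjust the field numbers if your output differs.

```shell
#!/bin/sh
# summarize_requests: read client debug output on stdin and print
# "METHOD STATUS URL" for each line that looks like a logged request.
# ASSUMPTION: field 5 is the HTTP method, field 6 the URL and field 7
# the status code.
summarize_requests() {
    awk '$5 ~ /^(GET|POST|PUT|PATCH|DELETE)$/ { print $5, $7, $6 }'
}

# Hypothetical sample lines; host, file names and values are made up.
summarize_requests <<'EOF'
I0101 10:00:00.000001    4242 loader.go:359] Config loaded from file /root/.kube/config
I0101 10:00:00.000002    4242 round_trippers.go:405] GET https://master.example.com:8443/apis/user.openshift.io/v1/users/~ 200 OK in 12 milliseconds
EOF
```

If the debug output goes to stderr, as is usual for this kind of logging, redirect it first, for example oc whoami --loglevel=6 2>&1 | summarize_requests.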

OpenShift Registry

Health check
A basic health check confirms that the registry is running and responds at its service address

# RegistryAddr=$(oc get svc docker-registry -n default -o jsonpath='{.spec.clusterIP}:{.spec.ports[0].port}')

# curl -vk https://$RegistryAddr/healthz
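For repeated checks, the curl call can be wrapped so that it prints a one-word verdict instead of the verbose output. This is a sketch only: the function names are hypothetical, the address in the demo is a placeholder, and treating HTTP 200 as "healthy" is an assumption about the endpoint.

```shell
#!/bin/sh
# healthz_url ADDR: build the health-check URL for a registry reachable
# at ADDR (ip:port), matching the curl call above.
healthz_url() {
    printf 'https://%s/healthz\n' "$1"
}

# classify_status CODE: map an HTTP status code to a short verdict.
# ASSUMPTION: 200 means the registry is healthy.
classify_status() {
    case "$1" in
        200) echo healthy ;;
        000) echo unreachable ;;   # curl reports 000 when no response arrived
        *)   echo "unhealthy (HTTP $1)" ;;
    esac
}

# check_registry ADDR: query the endpoint and print the verdict.
# -k skips certificate verification, as in the command above.
check_registry() {
    code=$(curl -sk -o /dev/null -w '%{http_code}' "$(healthz_url "$1")")
    classify_status "$code"
}

# On a cluster node (not run here):  check_registry "$RegistryAddr"
healthz_url 172.30.0.10:5000
classify_status 200
```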

Test the image registry
docker login -u openshift -p $(oc whoami -t) :
docker pull/tag/push

If storage is used
# oc rsh $(oc get pods -o name -l docker-registry -n default)

OpenShift Networking

Debugging External Access to an HTTP Service
Debugging Node to Node Networking
Debugging Local Networking

Use the Networking Diagnostics Tool to check the state of the network
https://docs.openshift.com/container-platform/3.11/admin_guide/sdn_troubleshooting.html

OpenShift Routing

Check segment by segment to locate the problem: curl pod / svc

$ oc logs dc/router -n default
$ oc get dc/router -o yaml -n default
$ oc get route -n
$ oc get endpoints --all-namespaces
$ oc exec -it $ROUTER_POD -- ls -la
$ oc exec -it $ROUTER_POD -- find /var/lib/haproxy -regex ".*\(.map\|config.*\|.json\)" -print -exec cat {} \; > haproxy_configs_and_maps
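The segment-by-segment idea (curl the pod, then the service, then the route) can be sketched as a small loop that stops at the first hop that does not answer. The function names and the addresses in the usage comment are hypothetical. Pod and service IPs are usually only reachable from inside the cluster, so this would be run from a node.

```shell
#!/bin/sh
# probe URL: succeed if URL answers within 5 seconds with a non-error
# HTTP status (-f makes curl fail on HTTP 4xx/5xx responses).
probe() {
    curl -sf -o /dev/null --max-time 5 "$1"
}

# first_failing_segment POD_URL SVC_URL ROUTE_URL
# Probes pod, then service, then route. Prints the name of the first
# segment that fails and returns 1, or prints "none" if all three answer.
first_failing_segment() {
    for pair in "pod=$1" "service=$2" "route=$3"; do
        name=${pair%%=*}    # text before the first "="
        url=${pair#*=}      # text after the first "="
        if ! probe "$url"; then
            echo "$name"
            return 1
        fi
    done
    echo none
}

# Usage on a cluster node (placeholder addresses, not run here):
#   first_failing_segment http://10.128.0.15:8080 http://172.30.20.5:8080 http://app.example.com
```

If the pod answers but the service does not, the endpoints are the next thing to look at; if only the route fails, the router logs and the HAProxy configuration are.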

router log

Check router health status
http://admin:@:1936/haproxy_stats

enable access log to syslog server

OpenShift Installer

OpenShift Ansible Playbooks

# ansible-playbook -vvv | tee ansible.logs
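With -vvv the log file gets large, so a quick way to find which task failed is useful. The sketch below scans ansible-playbook output for "fatal:" lines and prints the header of the task they belong to, plus the "task path:" line when one is present. The role, path and host names in the sample are made up.

```shell
#!/bin/sh
# failed_tasks: read ansible-playbook output on stdin and, for every
# "fatal:" result, print the task header, the task path (if one was
# logged) and the fatal line itself.
failed_tasks() {
    awk '
        # Remember the most recent task header without its trailing asterisks.
        /^TASK \[/     { task = $0; sub(/ \*+$/, "", task); path = "" }
        /^task path: / { path = $0 }
        /^fatal: /     { print task; if (path != "") print path; print $0 }
    '
}

# Hypothetical sample log.
failed_tasks <<'EOF'
TASK [example_role : install example package] **********************
task path: /tmp/playbooks/roles/example_role/tasks/main.yml:2
ok: [node1.example.com] => {"changed": false}
TASK [example_role : start example service] ************************
task path: /tmp/playbooks/roles/example_role/tasks/main.yml:12
fatal: [node2.example.com]: FAILED! => {"msg": "example failure"}
EOF
```

Running failed_tasks < ansible.logs on the real log gives the role and task name to search for in the openshift-ansible repository.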

If the run fails on a particular task, look up that task in the source on GitHub to see exactly what it does:
GitHub install repo: openshift/openshift-ansible

OpenShift DNS

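Dnsmasq is a lightweight caching DNS server. It answers DNS queries from its cache or forwards them to the real upstream DNS servers, and it is installed on every node.

SkyDNS is a DNS server built on top of etcd. It is embedded in the node process and is mainly responsible for resolving internal services.

NetworkManager launches the origin dispatcher script /etc/NetworkManager/dispatcher.d/99-origin-dns.sh to configure /etc/resolv.conf and some other files

NetworkManager
Make sure the NetworkManager service is running
Make sure /etc/NetworkManager/dispatcher.d/99-origin-dns.sh is executable
Make sure /etc/resolv.conf contains the host's private IP and the correct search domains. /etc/resolv.conf is generated by the NetworkManager service

Check that the dnsmasq service is OK
systemctl status dnsmasq -l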
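The resolv.conf check can be scripted. The sketch below verifies that a file has a nameserver line with the node's IP and a search line containing a given domain. The helper names are hypothetical, and the IP and the domain cluster.local in the demo are assumptions for illustration; pass the values that apply to your cluster.

```shell
#!/bin/sh
# has_entry FILE KEYWORD VALUE: succeed if FILE has a line whose first
# word is KEYWORD and whose remaining words include VALUE.
has_entry() {
    awk -v k="$2" -v v="$3" '
        $1 == k { for (i = 2; i <= NF; i++) if ($i == v) found = 1 }
        END { exit !found }
    ' "$1"
}

# check_resolv FILE NODE_IP SEARCH_DOMAIN
# Prints one line per check and returns non-zero if any check fails.
check_resolv() {
    rc=0
    if has_entry "$1" nameserver "$2"; then
        echo "ok: nameserver $2"
    else
        echo "missing: nameserver $2"; rc=1
    fi
    if has_entry "$1" search "$3"; then
        echo "ok: search $3"
    else
        echo "missing: search $3"; rc=1
    fi
    return $rc
}

# Demo on a scratch file with made-up values.
scratch=$(mktemp)
printf 'search cluster.local example.com\nnameserver 192.168.1.10\n' > "$scratch"
check_resolv "$scratch" 192.168.1.10 cluster.local
```

On a node the first argument would be /etc/resolv.conf.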

OpenShift Etcd

Set the etcd variables
# source /etc/etcd/etcd.conf
# export ETCDCTL_API=3

Set endpoint variable to include all etcd endpoints
# ETCD_ALL_ENDPOINTS=$(etcdctl --cert=$ETCD_PEER_CERT_FILE --key $ETCD_PEER_KEY_FILE --cacert $ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS --write-out=fields member list | awk '/ClientURL/{printf "%s%s", sep, $3; sep=","}')
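To see what the awk expression in that command does, here it is applied to canned input. The sample lines are a made-up stand-in for "member list" field output, and real output may differ in spacing or quoting. The tr step is my own precaution to drop quotes around the URLs; it is not part of the original command.

```shell
#!/bin/sh
# join_client_urls: read member-list field output on stdin and print the
# third word of every ClientURL line as one comma-separated string.
join_client_urls() {
    awk '/ClientURL/ { printf "%s%s", sep, $3; sep = "," } END { print "" }' |
        tr -d '"'    # precaution: drop quotes if the URLs are printed quoted
}

# Hypothetical sample; names and addresses are made up.
join_client_urls <<'EOF'
"Name" : "master1.example.com"
"PeerURL" : "https://10.0.0.1:2380"
"ClientURL" : "https://10.0.0.1:2379"
"Name" : "master2.example.com"
"PeerURL" : "https://10.0.0.2:2380"
"ClientURL" : "https://10.0.0.2:2379"
EOF
```

The variable sep is empty for the first match and a comma afterwards, which is what keeps the list free of a leading or trailing comma.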

Check the health of etcd
# etcdctl --cert=$ETCD_PEER_CERT_FILE --key $ETCD_PEER_KEY_FILE --cacert $ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS --write-out=table endpoint status

# etcdctl --cert=$ETCD_PEER_CERT_FILE --key $ETCD_PEER_KEY_FILE --cacert $ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS --write-out=table endpoint health

Best Practices

Recommended
Red Hat OpenShift Container Platform Life Cycle Policy
https://access.redhat.com/support/policy/updates/openshift

OpenShift Container Platform Tested Integrations supported configuration
https://access.redhat.com/articles/2176281

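Not recommended

  1. Running master and infra roles on the same nodes
  2. Running the external load balancer on OpenShift nodes
  3. Upgrading the version of a single component on its own
  4. service ip

Recommended troubleshooting guides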

Troubleshooting OpenShift Container Platform: Cluster Metrics
https://access.redhat.com/articles/2448341

Troubleshooting OpenShift Container Platform 3.x: Aggregating Container Logging
https://access.redhat.com/articles/3136551

Troubleshooting OpenShift Container Platform: Middleware Containers
https://access.redhat.com/articles/3135421