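In some setups, PVE virtual machines shut down on their own, and I never managed to track down the actual cause. So I decided to sidestep the problem entirely: monitor each VM, and as soon as one is found to be down, bring it back up with qm start.
Monitoring script: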
#!/usr/bin/env bash

# Ping the VM; if it does not answer, force a stop and then start it again.
function check_and_restart() {
    local vm_id="${1}"
    local vm_ip="${2}"
    # curl --connect-timeout 5 -sSL "${vm_ip}" > /dev/null
    if ! ping -c 1 "${vm_ip}" > /dev/null; then
        local now
        now=$(timedatectl status | grep 'Local time' | awk -F"Local time: " '{ print $2 }')
        echo "[${now}] [NO] id = ${vm_id}, ip = ${vm_ip}"
        /usr/sbin/qm stop "${vm_id}"
        /usr/sbin/qm start "${vm_id}"
    else
        echo "VM ${vm_id} is running!"
    fi
}

# Split each vm_id:vm_ip entry and check the VM it describes.
function main() {
    local vm_list=${1}
    local each vm_id vm_ip
    for each in ${vm_list}; do
        vm_id=$(echo "${each}" | awk -F: '{ print $1 }')
        vm_ip=$(echo "${each}" | awk -F: '{ print $2 }')
        check_and_restart "${vm_id}" "${vm_ip}"
    done
}

# VMs to check, one per line, in the form vm_id:vm_ip
vm_list="
100:192.168.1.2
101:192.168.1.3
103:192.168.1.4
102:192.168.1.5
"

# Print the time of this run
timedatectl status | grep 'Local time' | awk -F"Local time: " '{ print $2 }'

main "${vm_list}"
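The script above treats a failed ping as "down". If you would rather react only to the stopped state that PVE itself reports, a rough sketch along the following lines might work. It relies on `qm status <vmid>` printing a line such as `status: running` or `status: stopped`. The function names `parse_qm_status` and `start_if_stopped`, and the `QM` override variable, are made up for this illustration and are not part of the original script.

```shell
#!/usr/bin/env bash

# Extract the state word from a line such as "status: running".
parse_qm_status() {
    awk '/^status:/ { print $2 }'
}

# Start the VM only when qm reports it as stopped.
# QM is a hypothetical override hook, so the function can be tried
# without a PVE host; by default it points at /usr/sbin/qm.
start_if_stopped() {
    local vm_id="${1}"
    local qm_cmd="${QM:-/usr/sbin/qm}"
    local state
    state=$("${qm_cmd}" status "${vm_id}" | parse_qm_status)
    if [[ "${state}" == "stopped" ]]; then
        echo "VM ${vm_id} is stopped, starting it"
        "${qm_cmd}" start "${vm_id}"
    else
        echo "VM ${vm_id} state: ${state:-unknown}"
    fi
}
```

Calling `start_if_stopped 100` from the loop in `main` would replace the ping check for that VM. Note the trade-off: this catches VMs that have powered off, but not guests that are still "running" from PVE's point of view while being unreachable on the network, which the ping check does catch.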
Save the script as /root/check, then run crontab -e and add this entry to the crontab:
*/10 * * * * bash /root/check >> /root/log.txt
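Note: if some of your VMs boot slowly, you need to lengthen the check interval by hand. Otherwise a VM that is still booting will fail the liveness check, get force-stopped and started again, and end up in an endless restart loop.
Sample log output: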
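Besides lengthening the cron interval, one way to avoid that loop might be to remember when each VM was last restarted and skip further restarts for a while. The following is only a sketch under assumptions: the state directory `/tmp/vm-check`, the 300-second grace period, and the helper names `mark_restarted` and `in_grace_period` are all hypothetical and would need tuning for your own VMs.

```shell
#!/usr/bin/env bash

# Hypothetical settings: where timestamps are kept and how long a boot may take.
STATE_DIR="${STATE_DIR:-/tmp/vm-check}"
GRACE_SECONDS="${GRACE_SECONDS:-300}"

# Remember when a restart was issued for this VM.
# The optional second argument overrides the current time (epoch seconds).
mark_restarted() {
    local vm_id="${1}"
    local now="${2:-$(date +%s)}"
    mkdir -p "${STATE_DIR}"
    echo "${now}" > "${STATE_DIR}/${vm_id}.last_restart"
}

# Succeed (exit status 0) if the VM was restarted less than GRACE_SECONDS ago.
in_grace_period() {
    local vm_id="${1}"
    local now="${2:-$(date +%s)}"
    local stamp_file="${STATE_DIR}/${vm_id}.last_restart"
    [[ -f "${stamp_file}" ]] || return 1
    local last
    last=$(< "${stamp_file}")
    (( now - last < GRACE_SECONDS ))
}
```

Inside `check_and_restart`, this could be wired in by returning early when `in_grace_period "${vm_id}"` succeeds, and by calling `mark_restarted "${vm_id}"` right after `qm start`. A VM that was just restarted then gets the full grace period to come up before it can be restarted again.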
Tue 2024-02-06 09:50:01 CST
VM 100 is running!
VM 101 is running!
VM 103 is running!
VM 102 is running!
Tue 2024-02-06 10:00:01 CST
VM 100 is running!
VM 101 is running!
VM 103 is running!
VM 102 is running!
Tue 2024-02-06 10:10:01 CST
VM 100 is running!
VM 101 is running!
VM 103 is running!
VM 102 is running!