VPS Monitoring and Alerting Utility - Troubleshooting Production VPS

20 Apr 2024

Dev Ops

Objective

Facilitate root cause analysis and improve response to a slowing production VPS.

Technical Goals/Overview

Develop a VPS monitoring and alerting utility to monitor VPS health and send alerts when certain thresholds are exceeded.

Summary: Technical Implementation

Monitor a VPS and send an email alert if any of the following exceeds a specified threshold:
- Load average
- CPU usage
- Memory usage
- Disk usage
- Number of processes
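
The threshold check described above can be sketched roughly as follows. This is a minimal illustration, not the utility's actual code: the post does not show the implementation, and the threshold values, metric names, and function names here are hypothetical. Only load average and disk usage are sampled, to keep the sketch within the standard library. CPU, memory, and process counts would follow the same pattern.

```python
import os
import shutil

# Hypothetical limits; the post does not state the real threshold values.
THRESHOLDS = {
    "load_avg_1m": 4.0,
    "disk_used_pct": 90.0,
}


def collect_metrics(path="/"):
    """Sample two of the monitored metrics using only the standard library."""
    load_1m, _, _ = os.getloadavg()  # available on Unix-like systems only
    disk = shutil.disk_usage(path)
    return {
        "load_avg_1m": load_1m,
        "disk_used_pct": 100.0 * disk.used / disk.total,
    }


def find_breaches(metrics, thresholds):
    """Return {name: (value, limit)} for every metric strictly above its limit."""
    return {
        name: (value, thresholds[name])
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    }
```

Calling `find_breaches(collect_metrics(), THRESHOLDS)` on each monitoring run would yield an empty dict when the VPS is healthy and the offending metrics otherwise.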
Alert emails are throttled to one per hour (or a specified interval).
- If an alert condition is still present, another “VPS - Alert” email will be sent after an hour.
- If an alert condition is no longer present, a “VPS - OK” email will be sent to indicate recovery.
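
One way to express this throttling is a small state machine, sketched below. The function name, the shape of the `state` dict, and the choice to reset the timer after a “VPS - OK” email are assumptions made for illustration. The post does not describe how the original utility stores its state.

```python
def next_email(breached, now, state, interval=3600.0):
    """Decide which email subject (if any) to send on this monitoring run.

    breached: True if any threshold is currently exceeded.
    now:      current time in seconds (e.g. from time.time()).
    state:    dict with keys "alerting" (bool) and "last_alert" (float or None);
              it is updated in place. This layout is hypothetical.
    """
    if breached:
        state["alerting"] = True
        last = state["last_alert"]
        # Send at most one alert per interval while the condition persists.
        if last is None or now - last >= interval:
            state["last_alert"] = now
            return "VPS - Alert"
        return None
    if state["alerting"]:
        # The condition cleared: send a single recovery email and reset.
        state["alerting"] = False
        state["last_alert"] = None
        return "VPS - OK"
    return None
```

Resetting `last_alert` on recovery means a new incident alerts immediately rather than waiting out the previous hour. A stricter throttle would keep the timestamp instead.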
A “VPS Log” email with a summary is sent once a day (or at a specified interval).
- The mean, median, minimum, and maximum are calculated and included for each parameter.
- The monitor log file is removed after sending.
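
The daily summary statistics could be computed along these lines. This is a sketch under assumptions: it treats the log as a list of per-sample dicts, and the `summarize` function and metric names are hypothetical. The actual log format is not given in the post.

```python
import statistics


def summarize(samples):
    """Compute mean, median, min and max per metric.

    samples: list of dicts mapping a metric name to a numeric value,
             one dict per monitoring run (assumed log representation).
    """
    summary = {}
    names = sorted({name for row in samples for name in row})
    for name in names:
        values = [row[name] for row in samples if name in row]
        summary[name] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    return summary
```

Reporting the median alongside the mean helps separate a sustained rise from a few short spikes, which matters when diagnosing a gradual slowdown.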

Overall, the implementation centered on monitoring the health and performance metrics of the VPS and alerting on them. The aim was to shorten response time and identify the root cause of the production VPS slowdown. Analysis of additional VPS application logs ruled out external factors (such as a DoS attack) and identified a specific application as the cause. Additional custom monitoring and alerting were then implemented to track that application's performance. The slowdown was resolved once the application-specific issue was fixed.

Skills Footprint:

Category	Technical Specifics
general	Troubleshooting * Monitoring * Observability * CI/CD * Automated Deployment
standards	Source Control
tools	VS Code * Git * Bash * SSH
concepts	DevOps * Containerization * Troubleshooting
packages	Docker CLI
