VPS Monitoring and Alerting Utility - Troubleshooting Production VPS

DevOps


Objective

Facilitate root cause analysis of a slowing production VPS and improve the response time to it.

Technical Goals/Overview

Develop a utility that monitors VPS health and sends alerts when specified thresholds are exceeded.

Summary: Technical Implementation

  1. Monitor a VPS and send an email alert if any of the following exceeds its specified threshold:
    • Load average
    • CPU usage
    • Memory usage
    • Disk usage
    • Number of processes
  2. Alert emails are throttled to one per hour (or a specified interval).
    • If an alert condition is still present, another “VPS - Alert” email is sent after an hour.
    • If an alert condition is no longer present, a “VPS - OK” email is sent.
  3. A “VPS Log” summary email is sent once a day (or at a specified interval).
    • The mean, median, min, and max are calculated and included for each parameter.
    • The monitor log file is removed after sending.
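
The three steps above can be sketched roughly as follows. This is a minimal illustration and not the utility's actual code: the metric names, the threshold values, and the helpers breaches, AlertState, next_email, and summarize are all hypothetical. Collecting samples and sending email are left out. The sketch also assumes that the “VPS - OK” email is sent on the first clear sample and is not itself throttled.

```python
import statistics
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the values actually used are not stated in this write-up.
THRESHOLDS = {
    "load_avg": 4.0,
    "cpu_pct": 90.0,
    "mem_pct": 90.0,
    "disk_pct": 85.0,
    "num_procs": 400,
}


def breaches(sample, thresholds=THRESHOLDS):
    """Return the metrics in `sample` whose value is above its threshold."""
    return {k: v for k, v in sample.items()
            if k in thresholds and v > thresholds[k]}


@dataclass
class AlertState:
    alerting: bool = False              # True once an alert email has gone out
    last_sent: Optional[float] = None   # time (seconds) of the last alert email


def next_email(state, breached, now, throttle_s=3600):
    """Decide which email subject to send (or None) and update `state`."""
    if breached:
        # Send an alert only if none was sent within the throttle window.
        if state.last_sent is None or now - state.last_sent >= throttle_s:
            state.alerting = True
            state.last_sent = now
            return "VPS - Alert"
        return None
    if state.alerting:
        # The condition cleared after an alert was sent: report recovery once.
        state.alerting = False
        return "VPS - OK"
    return None


def summarize(samples):
    """Compute mean, median, min, and max per metric for the daily summary."""
    summary = {}
    for metric in samples[0]:
        values = [s[metric] for s in samples]
        summary[metric] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    return summary
```

In this sketch a monitoring loop would call breaches on each new sample, pass the result to next_email along with the current time, and append the sample to a log that summarize reads once a day before the log is removed.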

Overall, the implementation centered on monitoring and alerting on the health and performance metrics of the VPS, with the aim of improving response time and identifying the root cause of the production slowdown. Analysis of additional VPS application logs ruled out external factors (such as a DoS attack) and identified a specific application as the cause. Additional custom monitoring and alerting were then implemented to track that application's performance. The slowdown was resolved once the application-specific issue was identified and fixed.

Skills Footprint:

Category     Technical Specifics
general      Troubleshooting * Monitoring * Observability * CI/CD * Automated Deployment
standards    Source Control
tools        VSCode * GIT * BASH * SSH
concepts     DevOps * Containerization * Troubleshooting
packages     Docker CLI