Running a VPS means dealing with a few recurring problems. Below I walk through the common issues you’ll encounter and give clear, practical fixes you can apply right away.
Quick checklist before you start
- Check your provider’s status page for outages.
- Take a snapshot or backup before major changes.
- Have console access (the serial or web console in your provider’s control panel) in case SSH fails.
Common VPS problems and how to solve them
1. High CPU or memory usage
Symptoms: slow responses, processes stuck, or sudden spikes in load.
How to diagnose:
- Run top or htop to see which processes use CPU/RAM.
- Use ps aux --sort=-%mem | head to list the top memory consumers.
How to fix:
- Restart or stop the runaway process: systemctl restart SERVICE, or kill PID. Fall back to kill -9 PID only if the process ignores a normal termination signal.
- Tune services (PHP-FPM, database) to use fewer workers or smaller buffers.
- Consider caching, adding swap (careful with SSD wear), or upgrading the plan if usage is sustained.
- Use resource limits: set MemoryMax= and CPUQuota= in the service’s systemd unit, or configure cgroups directly, to contain hungry processes.
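If you want the top memory consumers in a script (for a cron report, say), here is a minimal Python sketch that parses ps aux output. It assumes the procps column layout (USER PID %CPU %MEM ... COMMAND). The name top_memory is just illustrative.

```python
# Minimal sketch: rank processes by %MEM from `ps aux` output.
# Assumes the procps column layout:
#   USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND


def top_memory(ps_output: str, n: int = 5) -> list[tuple[int, float, float, str]]:
    """Return up to n (pid, %cpu, %mem, command) tuples, highest %MEM first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        # maxsplit=10 keeps the COMMAND column (which may contain spaces) intact
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        rows.append((int(fields[1]), float(fields[2]), float(fields[3]), fields[10]))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]
```

On the server you would feed it live output, e.g. top_memory(subprocess.run(["ps", "aux", "--sort=-%mem"], capture_output=True, text=True).stdout).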
2. Disk full or low disk space
Symptoms: cannot write files, services fail, emails bounce, package installs fail.
How to diagnose:
- Check usage: df -h and du -sh /var/* | sort -h.
- Check inodes (too many small files): df -i.
How to fix:
- Clean the package cache: apt-get clean or yum clean all.
- Rotate and trim logs: configure logrotate and run journalctl --vacuum-size=200M.
- Remove old kernels: apt autoremove on Debian/Ubuntu.
- Find large directories with du -sh * and remove what’s not needed.
- If inodes are exhausted, remove the excess small temp files or rotate app caches. Consider a larger disk or different storage.
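To script the du and df -i checks, a rough Python equivalent is below. It is a sketch with some limits:

- entry_size counts apparent file sizes (like du --apparent-size), not allocated blocks.
- It does not follow symlinks.
- The helper names are hypothetical.

```python
import os


def entry_size(path: str) -> int:
    """Apparent size in bytes of a file, or of all regular files under a directory."""
    if os.path.islink(path):
        return 0  # don't follow symlinks, same as du's default
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk ignores unreadable dirs by default
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_entries(directory: str, n: int = 10) -> list[tuple[int, str]]:
    """(size, name) of the n largest entries directly inside directory."""
    sizes = [(entry_size(os.path.join(directory, name)), name) for name in os.listdir(directory)]
    sizes.sort(reverse=True)
    return sizes[:n]


def inode_used_pct(path: str) -> float:
    """Roughly the IUse% column of df -i for the filesystem holding path."""
    st = os.statvfs(path)
    if st.f_files == 0:
        return 0.0  # some filesystems (e.g. btrfs) report no fixed inode count
    return (st.f_files - st.f_ffree) / st.f_files * 100


def human(nbytes: float) -> str:
    """Format a byte count roughly like du -h."""
    for unit in ("B", "K", "M", "G"):
        if nbytes < 1024:
            return f"{nbytes:.1f}{unit}"
        nbytes /= 1024
    return f"{nbytes:.1f}T"
```

For example, [(human(size), name) for size, name in largest_entries("/var")] gives a sorted view similar to du -sh /var/* | sort -h, largest first.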
3. SSH won’t connect
Symptoms: timed out, connection refused, authentication failed.
How to diagnose:
- Try provider console access to bypass network issues.
- Check SSH status with systemctl status sshd, and the logs in /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL/CentOS).
- Verify that firewall or cloud security group rules allow port 22 (or your custom port).
How to fix:
- Correct file permissions for keys: chmod 700 ~/.ssh and chmod 600 ~/.ssh/authorized_keys.
- Restart SSH: systemctl restart sshd (validate the config first with sshd -t).
- If SSH is misconfigured, revert the changes via the console or boot into single-user mode.
- Enable password auth temporarily only if you must, then restore key-based auth immediately.
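A useful first check is telling "refused" apart from "timed out". A refusal usually means the host is up but nothing is listening on the port. A timeout more often points at a firewall, a security group, or an unreachable host.

Here is a minimal Python sketch of that check, run from your workstation. probe_port is a hypothetical helper, and it only tests TCP reachability, not authentication.

```python
import socket


def probe_port(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # something accepted the connection (sshd, hopefully)
    except ConnectionRefusedError:
        # Host answered with a reset: usually nothing is listening on that port
        return "refused"
    except socket.timeout:
        # No answer at all: typically a dropping firewall/security group or a down host
        return "timeout"
    except OSError as exc:
        # e.g. DNS failure or "no route to host"
        return f"error: {exc}"
```

Usage: probe_port("203.0.113.10") with your server’s address in place of that placeholder IP.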
4. Service won’t start or keeps crashing
Symptoms: web server, database, or other daemons fail to run or crash repeatedly.
How to diagnose:
- Check systemctl status SERVICE and journalctl -u SERVICE -b.
- Run service-specific checks: nginx -t or apachectl configtest.
How to fix:
- Fix configuration errors found by the test commands.
- Look for port conflicts (use ss -tulpn or netstat -plnt).
- Increase timeouts or adjust memory limits if the service fails under load.
- If logs are flooded, rotate them and free space so services can write logs again.
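When a service can’t bind its port, the useful question is "who already holds port 80?". Below is a Python sketch that answers it by parsing ss -tulpn output. Its assumptions:

- ss uses the usual iproute2 columns (Netid, State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port, Process). The layout may differ between versions.
- ss runs as root, because the process column is only shown for other users’ sockets in that case.
- The name who_listens is hypothetical.

```python
import re

# Matches entries like ("nginx",pid=1234,fd=6) in the Process column
USERS_RE = re.compile(r'\("([^"]+)",pid=(\d+)')


def who_listens(ss_output: str, port: int) -> list[tuple[str, str, str, int]]:
    """Return (netid, local_address, process_name, pid) for sockets bound to port."""
    hits = []
    for line in ss_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        netid, local = fields[0], fields[4]
        # rpartition handles IPv4 (0.0.0.0:80), IPv6 ([::]:80) and wildcard (*:80)
        _addr, _sep, local_port = local.rpartition(":")
        if local_port != str(port):
            continue
        procs = USERS_RE.findall(" ".join(fields[6:]))
        if not procs:
            hits.append((netid, local, "?", -1))  # process column missing (not root?)
        for name, pid in procs:
            hits.append((netid, local, name, int(pid)))
    return hits
```

Usage: who_listens(subprocess.run(["ss", "-tulpn"], capture_output=True, text=True).stdout, 80).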
5. Network problems and high latency
Symptoms: slow page loads, packet loss, intermittent connectivity.
How to diagnose:
- Ping and traceroute to check paths and packet loss: ping -c 10 example.com, traceroute example.com.
- Check interfaces and routes: ip addr, ip route.
- Check provider network status and firewall rules.
How to fix:
- Restart networking or reset the interface: systemctl restart networking (on Debian systems using ifupdown) or use the provider console.
- Adjust the MTU if fragmentation is causing problems.
- Contact your provider if their network or upstream is the issue.
- Use monitoring and synthetic checks to spot intermittent latency before customers do.
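A simple synthetic check can run ping -c 10 on a schedule and flag bad results. Here is a Python sketch that parses the summary lines printed by Linux (iputils) ping; macOS uses slightly different wording that the regexes also tolerate. Some notes on it:

- The 1% loss and 150 ms thresholds are arbitrary placeholders, not recommendations.
- mdev is ping’s mean deviation, only a rough jitter indicator.

```python
import re

LOSS_RE = re.compile(r"([\d.]+)% packet loss")
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def ping_summary(output: str) -> dict:
    """Extract packet loss and min/avg/max/mdev RTT (ms) from ping's summary lines."""
    loss = LOSS_RE.search(output)
    summary = {"loss_pct": float(loss.group(1)) if loss else None}
    rtt = RTT_RE.search(output)
    if rtt:  # absent when every packet was lost
        summary.update(zip(("min_ms", "avg_ms", "max_ms", "mdev_ms"), map(float, rtt.groups())))
    return summary


def is_degraded(summary: dict, max_loss_pct: float = 1.0, max_avg_ms: float = 150.0) -> bool:
    """Placeholder thresholds: flag any loss above max_loss_pct or a slow average RTT."""
    loss = summary.get("loss_pct")
    if loss is None or loss > max_loss_pct:
        return True  # unparseable output is treated as a problem too
    avg = summary.get("avg_ms")
    return avg is not None and avg > max_avg_ms
```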
6. DNS and domain resolution issues
Symptoms: domain not resolving, pointing to old IP, propagation delays.
How to diagnose:
- Use dig +short yourdomain.com and nslookup yourdomain.com.
- Check TTLs at your DNS provider and confirm the A/AAAA/CNAME records are correct.
How to fix:
- Correct DNS records at the registrar or DNS host and wait for TTL to expire.
- Flush the local DNS cache if needed: resolvectl flush-caches (systemd-resolve --flush-caches on older systems), or restart the local DNS resolver.
- For immediate testing, add temporary entries to /etc/hosts (remember to remove them later).
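To check from a script whether a name already points only at the new server, here is a Python sketch. The expected address set is something you supply, and the helper names are hypothetical. Note that socket.getaddrinfo goes through the system resolver. Unlike dig, it therefore honours /etc/hosts and any local cache, so it shows what your applications will actually see.

```python
import socket


def resolved_ips(hostname: str) -> set[str]:
    """All addresses (IPv4 and IPv6) the system resolver returns for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos}


def points_to(hostname: str, expected: set[str]) -> tuple[bool, set[str]]:
    """(ok, addresses): ok is True only if every resolved address is in expected."""
    try:
        ips = resolved_ips(hostname)
    except socket.gaierror:
        return False, set()  # NXDOMAIN or resolver failure
    # A stale record shows up as an address outside the expected set
    return (bool(ips) and ips <= expected), ips
```

Usage: points_to("yourdomain.com", {"203.0.113.10"}), where 203.0.113.10 stands in for your server’s IP.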
7. Security incidents and unauthorized access
Symptoms: unknown processes, suspicious logins, changed files, crypto-mining activity.
How to diagnose:
- Inspect logs: /var/log/auth.log, /var/log/syslog, and web server logs.
- Check running processes (ps aux) and network connections (ss -tulpn).
How to fix and recover:
- Isolate the VPS: block external access at the provider firewall and restore from a known-good backup.
- Rotate all keys and passwords, disable unused accounts, and remove malicious cron jobs.
- Harden SSH: disable password login, use keys, change default port if desired, and restrict IPs where possible.
- Install and configure fail2ban or similar intrusion prevention tools.
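For a quick picture of brute-force attempts and successful logins, here is a Python sketch that scans auth.log lines. It assumes OpenSSH’s usual message wording ("Failed password for ...", "Accepted publickey for ..."). The threshold of 5 is arbitrary, and fail2ban does this job properly in production.

```python
import re
from collections import Counter

# OpenSSH lines such as:
#   sshd[100]: Failed password for invalid user admin from 203.0.113.5 port 51234 ssh2
#   sshd[102]: Accepted publickey for deploy from 192.0.2.44 port 50000 ssh2
FAILED_RE = re.compile(r"Failed \S+ for (?:invalid user )?(\S+) from (\S+) port \d+")
ACCEPTED_RE = re.compile(r"Accepted (\S+) for (\S+) from (\S+) port \d+")


def failed_logins(lines) -> Counter:
    """Count failed SSH authentication attempts per source IP."""
    counts = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            counts[m.group(2)] += 1
    return counts


def offenders(counts: Counter, threshold: int = 5) -> list[str]:
    """IPs with at least `threshold` failures, most active first."""
    return [ip for ip, n in counts.most_common() if n >= threshold]


def successful_logins(lines) -> list[tuple[str, str, str]]:
    """(user, ip, method) for every accepted login; review these for unknown IPs."""
    hits = []
    for line in lines:
        m = ACCEPTED_RE.search(line)
        if m:
            method, user, ip = m.groups()
            hits.append((user, ip, method))
    return hits
```

Usage (as root): with open("/var/log/auth.log") as fh: lines = fh.readlines(), then call failed_logins(lines) and successful_logins(lines).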
8. Backup failures or missing restores
Symptoms: backups incomplete, failing uploads, restore produces errors.
How to diagnose:
- Check backup logs and verify backup integrity with test restores.
- Ensure offsite backups are being transferred successfully and storage quotas aren’t hit.
How to fix:
- Automate backups with timestamps and retention policy. Use snapshots for quick full-system recovery and rsync or object storage for file backups.
- Test restores regularly: a backup you can’t restore is useless.
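A retention policy often boils down to "keep the newest N, prune the rest". Here is a Python sketch of that selection. It assumes a hypothetical naming scheme, backup-YYYYmmdd-HHMMSS.tar.gz. It only returns names and deletes nothing, so you can review the list before removing anything. Files that don’t match the scheme are never selected.

```python
from datetime import datetime

# Hypothetical naming scheme for timestamped backups
NAME_FMT = "backup-%Y%m%d-%H%M%S.tar.gz"


def backup_name(ts: datetime) -> str:
    """File name for a backup taken at ts."""
    return ts.strftime(NAME_FMT)


def parse_ts(name: str):
    """Timestamp encoded in a backup name, or None if the name doesn't match."""
    try:
        return datetime.strptime(name, NAME_FMT)
    except ValueError:
        return None


def to_delete(names, keep: int = 7) -> list[str]:
    """Backups older than the `keep` newest ones, oldest last."""
    dated = sorted(((parse_ts(n), n) for n in names if parse_ts(n) is not None), reverse=True)
    return [name for _ts, name in dated[keep:]]
```

Usage: to_delete(os.listdir("/backups"), keep=7), where /backups is a placeholder for your backup directory.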
9. Outdated software and missing security patches
Symptoms: scanners report known vulnerabilities, or aging software behaves unexpectedly.
How to diagnose:
- List pending updates with apt update && apt list --upgradable, and test upgrades in a staging environment before applying them in production.
- Use vulnerability scanners or managed patching tools.
How to fix:
- Keep the system updated on a schedule. Apply critical patches immediately and non-critical ones during maintenance windows.
- Use configuration management (Ansible, Puppet) to make upgrades repeatable.
10. Resource limits (file descriptors, processes)
Symptoms: “too many open files”, inability to spawn new processes, web server hitting limits under load.
How to diagnose:
- Check ulimits: ulimit -a. For services, check the systemd limits, e.g. systemctl show SERVICE -p LimitNOFILE.
- Review journalctl and service logs for “too many open files” messages.
How to fix:
- Increase limits in /etc/security/limits.conf (this covers login sessions, not systemd services). For a service, add a systemd drop-in with systemctl edit SERVICE containing LimitNOFILE=65536 under [Service].
- Tune application connection pools and close idle connections.
Troubleshooting workflow (simple and repeatable)
1) Reproduce the issue or collect logs.
2) Check provider status and console.
3) Narrow it down to a service, network, or resource problem.
4) Apply a safe fix (restart, free space, revert config).
5) Test, then document the fix and add monitoring or automation to catch it earlier.
Monitoring and prevention
- Set up basic alerts for disk, CPU, memory, and service health (Prometheus, Datadog, UptimeRobot).
- Automate backups and test restores.
- Use configuration management to make changes reproducible and reversible.
- Harden routinely: patch, rotate credentials, limit access, and run intrusion detection.
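Before wiring up a full monitoring stack, a small cron-driven script can cover the basic alerts. Here is a minimal Python sketch with placeholder thresholds: 90% disk, a load of 2 per CPU, and 10% available memory. It is Linux-oriented, because it reads /proc/meminfo when present.

disk_used_pct is computed as used/total. That can differ slightly from df’s Use%, because df leaves reserved blocks out of the calculation.

```python
import os
import shutil


def parse_meminfo(text: str) -> dict[str, int]:
    """kB values from /proc/meminfo, keyed by field name."""
    values = {}
    for line in text.splitlines():
        key, _sep, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def collect(path: str = "/") -> dict[str, float]:
    """Gather disk usage for path, 1-minute load per CPU and (on Linux) available memory."""
    du = shutil.disk_usage(path)
    metrics = {
        "disk_used_pct": du.used / du.total * 100,
        "load_per_cpu": os.getloadavg()[0] / (os.cpu_count() or 1),
    }
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as fh:
            mi = parse_meminfo(fh.read())
        metrics["mem_available_pct"] = mi["MemAvailable"] / mi["MemTotal"] * 100
    return metrics


def evaluate(metrics: dict, max_disk_pct: float = 90.0, max_load_per_cpu: float = 2.0,
             min_mem_available_pct: float = 10.0) -> list[str]:
    """Alert messages for every metric past its (placeholder) threshold."""
    alerts = []
    if metrics["disk_used_pct"] >= max_disk_pct:
        alerts.append(f"disk usage {metrics['disk_used_pct']:.0f}%")
    if metrics["load_per_cpu"] >= max_load_per_cpu:
        alerts.append(f"load {metrics['load_per_cpu']:.2f} per CPU")
    mem = metrics.get("mem_available_pct")
    if mem is not None and mem <= min_mem_available_pct:
        alerts.append(f"only {mem:.0f}% memory available")
    return alerts
```

Run evaluate(collect()) from cron and send any non-empty result to email or chat.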
Summary
Most VPS problems fall into a few categories: resource exhaustion, disk and inode limits, network and DNS issues, SSH or service misconfigurations, backups, and security incidents. Start by checking provider status and console access, gather logs, then apply focused fixes like freeing disk space, restarting or tuning services, fixing configs, and patching systems. Add monitoring, automated backups, and basic hardening so the same problem is less likely to happen again.
