Monitoring system resources
Disk space
Troubleshooting workflow:
- df -h to see which volume is causing issues
- du -hsc * to find the directory causing issues
- ncdu [-x] for a deep dive
df
Disk free. Displays file system usage:
- Lists all mounted volumes and how much space is left on each, in 1K blocks by default
- Use the -h option to show human-readable units instead
- Device names are generated by the type of hardware that the underlying storage device is on
- Consider inode usage in addition to the size of the data
- An inode is a data structure that holds the metadata for each item that you're storing
  - File owner, permissions, last modified date, etc.
- If a volume fails because it ran out of inodes, the server is creating too many files, such as log files or email messages
df -h # list disk space in human-readable format
df -i # list inode usage
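The space check above can also be scripted. The following is a minimal sketch, not part of the original workflow: a hypothetical helper named full_filesystems reads df -P output (the POSIX layout, one line per filesystem) and prints every mount point at or above a given percentage. The sample input is made up for illustration, and mount points that contain spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print mount points whose usage is at or above a limit.
# $1 = threshold in percent; reads `df -P` style output on stdin.
full_filesystems() {
  awk -v limit="$1" '
    NR > 1 {                 # skip the header line
      use = $5               # capacity column, for example "95%"
      sub(/%/, "", use)      # strip the percent sign before comparing
      if (use + 0 >= limit) print $6, $5
    }'
}

# Made-up sample in `df -P` format, for illustration only
printf '%s\n' \
  'Filesystem 1024-blocks Used Available Capacity Mounted on' \
  '/dev/sda1 1000 950 50 95% /' \
  '/dev/sdb1 1000 100 900 10% /data' |
  full_filesystems 90
```

On a live system the same helper could be fed real data with df -P | full_filesystems 90.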
ncdu
NCurses Disk Usage:
- Scans disk usage and lets you browse the results interactively
- Need to install first
- Can only scan dirs that the user can access
sudo apt install ncdu # install
ncdu -x # view only current fs
Disk usage by directory
du
Shows how much space a directory is using:
- Scans the current directory and the subdirectories that you have permission to read
- Run as root to get the full picture
- After you find the general location of the disk hog, cd into those dirs and run du again
du -hsc * # human-readable, summary, total usage
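Repeating du by hand gets tedious on deep trees, so the drill-down can be wrapped in a small function. This is a sketch under stated assumptions: largest_entries is a hypothetical name, sizes are reported in KiB via du -sk so that a plain numeric sort works, and hidden entries are skipped because * does not match them.

```shell
#!/bin/sh
# Hypothetical helper: list the N largest entries directly under a directory.
# $1 = directory to inspect, $2 = number of entries to show.
largest_entries() {
  # Run in a subshell so the caller's working directory is unchanged.
  # du -sk prints one summarized size in KiB per entry; sort -rn puts the
  # largest entry first; head keeps only the top N lines.
  ( cd "$1" && du -sk -- * 2>/dev/null | sort -rn | head -n "$2" )
}

# Example call (can take a while on large trees):
#   largest_entries /var 5
```

Calling the function again on the largest entry it reports follows the same narrowing process described above.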
Memory usage
free
Displays the current memory usage in KB:
- To see if there is a problem, compare the available column with the total column.
- free memory is the only memory that is not in use at all.
- available includes memory that the system cache is using, but the kernel can free this memory if an app needs it.
- This is because any RAM that is not in use is wasted. It's about efficiency.
- “Extra” RAM is given to the filesystem cache, which holds recently used file data and pending writes that are synchronized to disk when the time is right
- This makes your system faster, because the system doesn't have to read from or write to disk for recently used files. It goes to RAM.
tmpfs is a temporary filesystem in Linux that resides in memory (RAM) rather than on a physical storage device. It is typically used for storing temporary files that don't need to persist after a system reboot.
Column | Description |
---|---|
total | Total memory on the server |
used | Memory that is used. used = total - free - buffers/cache |
free | Memory not in use by anything |
shared | Memory used by tmpfs and other shared resources |
buff/cache | Memory used by buffers and cache |
available | Memory that is available for app use. Much of this is currently used for cache. |
free # memory usage in KB
free -m # memory usage in MB (recommended)
# --- Example to understand columns --- #
free -m
               total        used        free      shared  buff/cache   available
Mem:            3915         533        2474           1        1198        3381
Swap:           2335           0        2335
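Comparing available with total is easier as a single percentage. The sketch below is an illustration rather than an established tool: mem_available_pct is a hypothetical helper that reads free -m output and assumes the column layout shown in the example above, where available is the seventh field of the Mem: line. It is fed the example values from this section.

```shell
#!/bin/sh
# Hypothetical helper: percentage of total memory that is available.
# Reads `free -m` style output on stdin; assumes "available" is field 7.
mem_available_pct() {
  awk '/^Mem:/ { printf "%d\n", $7 * 100 / $2 }'
}

# Example values copied from the free -m output in this section
printf '%s\n' \
  '               total        used        free      shared  buff/cache   available' \
  'Mem:            3915         533        2474           1        1198        3381' \
  'Swap:           2335           0        2335' |
  mem_available_pct   # prints 86
```

On a live system the helper could be used as free -m | mem_available_pct. Develop your own threshold for what counts as low, since it depends on the workload.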
Swap
A disk partition or a file that acts like RAM when your server memory is saturated:
- On disk, so much slower than RAM
- Helps prevent the OOM killer from killing processes
- After 16.04, Ubuntu uses a swap file, not a swap partition
  - Easier to grow and shrink a file than a partition
  - No need to make a swap file or partition yourself anymore
- Swap is listed in the /etc/fstab file
- Only delete the swap file if you need to make a larger one
- Some apps like K8s require that you disable swap
- Recommend swap files of at least 2GB on servers
- swappiness controls how readily your server uses swap
  - Set to 60 by default
  - The higher the value, the more likely the server is to use swap
  - Change it in /etc/sysctl.conf to persist swappiness after reboot
grep swap /etc/fstab # find the swap entry in /etc/fstab
/swap.img none swap sw 0 0
sudo swapon -a # activates all swap listed in /etc/fstab
sudo swapoff -a # deactivates all swap
cat /proc/sys/vm/swappiness # view swappiness
sudo sysctl vm.swappiness=30 # change swappiness until reboot
# --- Creating a swap file --- #
# 1. Create the file with fallocate
sudo fallocate -l 2G /swapfile
# 2. Restrict permissions to root
sudo chmod 0600 /swapfile
# 3. Format the file as swap
sudo mkswap /swapfile
# 4. Add this line to /etc/fstab
/swapfile none swap sw 0 0
# 5. Activate the new swap file
sudo swapon -a
# --- Change swappiness and persist after reboot --- #
# 1. Edit /etc/sysctl.conf
sudoedit /etc/sysctl.conf
# 2. Add the new value to the bottom of the file
vm.swappiness = 30
# 3. Load the new value without rebooting
sudo sysctl -p
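After creating or resizing swap, it helps to confirm what the kernel actually activated. One possible check, sketched here with a hypothetical helper named swap_summary, totals the Size and Used columns of /proc/swaps (reported in KiB). It prints zeros when no swap is active.

```shell
#!/bin/sh
# Hypothetical helper: total up swap size and usage in KiB.
# Reads /proc/swaps style input on stdin.
# Columns: Filename Type Size Used Priority
swap_summary() {
  awk 'NR > 1 { size += $3; used += $4 }   # skip the header, sum each area
       END { printf "total_kib=%d used_kib=%d\n", size, used }'
}

# Report the live values when /proc/swaps is readable (Linux only)
if [ -r /proc/swaps ]; then
  swap_summary < /proc/swaps
fi
```

A total of 0 after running swapon -a suggests that the /etc/fstab entry was not picked up.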
fallocate
Create a file with a preallocated size:
fallocate -l <size> <filename>
# -l: length of the file in bytes (suffixes such as M and G are accepted)
fallocate -l 4G /swapfile
Load average
Represents your server's trend in CPU utilization over time:
- Stored in /proc/loadavg
- Easier to view with uptime
- The numbers are the 1 min, 5 min, and 15 min averages
- They represent how many tasks were running or waiting for CPU during that time period (on Linux, tasks blocked on disk I/O are counted too)
- Less than 1 per CPU is good
- If load avg = # CPUs on system, then they are all running at 100%
- If load avg > # CPUs on system, tasks are queuing for CPU and you have an issue
- Analogy: cashiers at a supermarket. If there are 4 cashiers and 4 customers checking out, they are running at capacity. If there are 6 customers, then the store is above capacity
- Develop baselines for your server so you know what is normal. For example, if it goes from 1.x to 0.x, then you are overspending on your server or maybe a service is down
- Better view of CPU usage than something like htop, because CPU usage can go to 100% while a process is running and then back down when it completes, so the uptime view over time gives a better picture
- A server can have multiple CPUs (physical cores), and each CPU can have multiple cores (logical cores). The kernel treats physical and logical cores the same
cat /proc/loadavg # view load avg in /proc
0.00 0.00 0.00 1/234 14112
uptime # view load avg, resets at reboot
nproc # get number of cores
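The rule of thumb above, load average compared with the number of cores, can be checked in a few lines. This is a minimal sketch: load_status is a hypothetical helper name, and the two labels it prints are illustrative rather than standard terminology. awk does the comparison because load averages are decimal numbers.

```shell
#!/bin/sh
# Hypothetical helper: compare a load average with the core count.
# $1 = load average (for example 3.50), $2 = number of cores.
load_status() {
  awk -v load="$1" -v cores="$2" 'BEGIN {
    if (load + 0 > cores + 0) { print "over capacity" }
    else                      { print "within capacity" }
  }'
}

# Live check using the 1-minute average, when the tools are present
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
  load_status "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
fi
```

In the supermarket analogy, load_status 6 4 reports over capacity and load_status 4 4 reports within capacity. Passing the second or third field of /proc/loadavg instead gives the 5 or 15 minute view.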
View resource usage
htop
Provides an overall view of your server performance. Better than top:
- Consider running as root for additional capabilities, like killing processes
- Add the CPU average for all cores: F2 > Meters, then F5 to select CPU average
- Can navigate with the mouse. Ex: click Quit at the bottom of the page to exit
- View by user by entering u
- Tree view with F5
- Refreshes every 2 seconds, but you can change that with the -d option and a number in tenths of seconds
F2 # Setup mode: view options, set colors, etc.
u # Open the "Show processes of:" menu to view processes for a specific user
F5 # Enter/exit tree view
htop -d 70 # Refresh every 7 seconds
F9 # Kill the selected process - will provide signal options