# DevOps Linux Cheat Sheet

## The Mental Model
How to use this: Don't read it top to bottom. Use it as a reference. The debugging flow at the bottom is the most important thing here.
```mermaid
graph TD
    A[Filesystem\nWhere things live] --> B[Permissions\nWho can touch them]
    B --> C[Processes\nWhat's running]
    C --> D[Networking\nHow things talk]
    D --> E[Environment\nWhat config they use]
    E --> F[Services\nProduction control]
    style A fill:#1e3a5f,color:#fff
    style B fill:#1e3a5f,color:#fff
    style C fill:#1e3a5f,color:#fff
    style D fill:#1e3a5f,color:#fff
    style E fill:#1e3a5f,color:#fff
    style F fill:#1e3a5f,color:#fff
```
## 1. Filesystem + Permissions

### Permission Model
```
rwx r-x r--
 │   │   └─ others (world)
 │   └───── group
 └───────── user (owner)
```
| Octal | Binary | Meaning |
|---|---|---|
| 7 | 111 | `rwx` |
| 6 | 110 | `rw-` |
| 5 | 101 | `r-x` |
| 4 | 100 | `r--` |
| 0 | 000 | `---` |
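To check the table against a real file, GNU `stat` prints permissions in both octal and symbolic form (a quick sketch; the `-c` format strings are GNU coreutils specific, so BSD/macOS `stat` uses different flags, and `demo.txt` is just an illustrative file name):

```shell
# Octal and symbolic permissions side by side (GNU coreutils)
stat -c '%a %A %n' /etc/passwd /tmp

# Sanity-check your own octal math: set 640 and read it back
touch demo.txt
chmod 640 demo.txt
stat -c '%a' demo.txt   # prints: 640
```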
### Core Commands
```bash
ls -lah            # list with permissions, human sizes, hidden files
chmod 755 file     # rwxr-xr-x
chmod -R 750 /app  # recursive
chown user:group file
chown -R appuser:app /app
```

### Key Rules
**Directory vs File**

- Files need `r` to read, `w` to write
- Directories need `x` to enter (`cd` into)
- Access requires permissions on every parent directory
```bash
# Typical production pattern
chown -R appuser:app /app
chmod -R 750 /app  # owner=rwx, group=r-x, others=none
```

### Other Permission Tools
```bash
groups                 # what groups your user is in
umask                  # view default permission mask
umask 022              # new files = 644, dirs = 755
chmod g+s /shared-dir  # setgid: new files inherit group
```

**Never do `chmod 777`.** You've just removed all security. Every user, every process can read, write, and execute. This is always wrong in production.
## 2. Processes

### Process States
```mermaid
graph TD
    A[Created] --> B[Running]
    B --> C[Terminated]
    B --> D[Sleeping]
    D -->|bad signal handling| E[Zombie]
    style C fill:#2d5a27,color:#fff
    style E fill:#5a2727,color:#fff
```
### View Processes
```bash
ps aux                 # all processes, full detail
ps aux | grep appname  # filter for specific app
top                    # live view, press q to quit
htop                   # better top (if installed)
```

### Kill Processes
```bash
kill PID         # SIGTERM (15) - graceful, app can clean up
kill -9 PID      # SIGKILL - force kill, no cleanup
killall appname  # kill by name
```

### Signal Reference
| Signal | Number | Meaning |
|---|---|---|
| SIGTERM | 15 | Graceful stop - try this first |
| SIGKILL | 9 | Force kill - no cleanup |
| SIGINT | 2 | Ctrl+C |
| SIGHUP | 1 | Reload config (nginx, etc.) |
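The escalation order in the table (SIGTERM first, SIGKILL only as a last resort) is worth scripting for deploys. A minimal sketch; `graceful_stop` and the 10-second default are illustrative, not a standard tool:

```shell
#!/bin/bash
# Try SIGTERM, wait for the process to exit, fall back to SIGKILL.
# Usage: graceful_stop <pid> [timeout_seconds]
graceful_stop() {
  local pid="$1" timeout="${2:-10}"
  kill -TERM "$pid" 2>/dev/null || return 0  # already gone
  for _ in $(seq "$timeout"); do
    kill -0 "$pid" 2>/dev/null || return 0   # kill -0 = existence check only
    sleep 1
  done
  echo "PID $pid ignored SIGTERM for ${timeout}s, sending SIGKILL" >&2
  kill -KILL "$pid" 2>/dev/null
}
```

`kill -0` sends no signal at all; it only reports whether the process still exists, which makes it a cheap way to poll.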
### Background Jobs
```bash
sleep 100 &            # run in background
jobs                   # list background jobs
fg                     # bring last job to foreground
fg %2                  # bring job #2 to foreground
bg                     # resume stopped job in background
nohup python app.py &  # survive terminal close (use systemd instead in production)
```

**Use systemd in production.** `nohup` is for quick hacks. If you're running a real service, use `systemctl` (see Section 12).
## 3. Networking

### Debugging Flow
```mermaid
graph LR
    A[ping host] -->|reachable?| B[ss -tulnp]
    B -->|port open?| C[curl localhost:port]
    C -->|service responding?| D[dig / nslookup]
    D -->|DNS resolving?| E[Done]
    style A fill:#2d5a27,color:#fff
    style B fill:#2d5a27,color:#fff
    style C fill:#2d5a27,color:#fff
    style D fill:#2d5a27,color:#fff
    style E fill:#2d5a27,color:#fff
```
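The flow above can also run as a small script when you're debugging in a hurry. A sketch under illustrative assumptions: `net_debug` is a hypothetical helper, and the host/port defaults are placeholders for your own service:

```shell
#!/bin/bash
# Walk the network debugging flow for one host:port pair.
net_debug() {
  local host="${1:-localhost}" port="${2:-8080}"

  # 1. Reachable at all?
  if ping -c1 -W2 "$host" >/dev/null 2>&1; then
    echo "ping ok"
  else
    echo "ping failed (host down, or ICMP blocked by a firewall)"
  fi

  # 2. Is anything listening on the port (run on the server itself)?
  if ss -tln 2>/dev/null | grep -q ":${port}\b"; then
    echo "port ${port} is listening"
  else
    echo "nothing listening on ${port}"
  fi

  # 3. Does the service answer HTTP?
  curl -s -o /dev/null -w "HTTP %{http_code}\n" --max-time 3 "http://$host:$port/" \
    || echo "no HTTP response"

  # 4. Does the name resolve?
  dig +short "$host" 2>/dev/null | head -1
}
```

Usage: `net_debug api.example.internal 8080` prints one diagnostic line per step.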
### Connectivity

```bash
ping google.com        # basic reachability
traceroute google.com  # hop-by-hop path
```

### Ports & Listening Services
```bash
ss -tuln   # listening ports (no process names)
ss -tulnp  # listening ports WITH process names (use this)
```

Reading `ss` output:
```
Netid  State   Local Address:Port
tcp    LISTEN  0.0.0.0:8080    → all interfaces, port 8080
tcp    LISTEN  127.0.0.1:5432  → localhost only (postgres)
tcp    LISTEN  :::80           → all IPv6 interfaces
```
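A related deploy-script pattern built on `ss`: poll until the service actually starts listening before pointing traffic at it. A sketch; `wait_for_port` is a hypothetical helper, not a standard command:

```shell
#!/bin/bash
# Wait up to $2 seconds (default 30) for something to listen on TCP port $1.
wait_for_port() {
  local port="$1" timeout="${2:-30}"
  for _ in $(seq "$timeout"); do
    if ss -tln 2>/dev/null | grep -q ":${port}\b"; then
      return 0  # someone is listening
    fi
    sleep 1
  done
  echo "timed out waiting for port ${port}" >&2
  return 1
}

# Typical use:  systemctl start app && wait_for_port 8080 && run_smoke_tests
```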
### HTTP / API Testing

```bash
curl -I https://example.com  # headers only
curl -v https://example.com  # verbose (debug TLS etc.)
curl -X POST -d '{"key":"val"}' \
     -H "Content-Type: application/json" \
     http://localhost:8080/api
curl -o /dev/null -s -w "%{http_code}" url  # just the status code
```

### DNS Resolution
```bash
dig google.com         # full DNS lookup
dig google.com +short  # just the IP
nslookup google.com    # alternative
cat /etc/hosts         # local overrides (check here first!)
cat /etc/resolv.conf   # which DNS servers are configured
```

**DNS is usually the problem.** Before assuming your app is broken, check whether the hostname resolves: `dig yourdomain.com`. Many "networking issues" are just broken DNS.
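One subtlety worth knowing: `dig` queries the DNS server directly and never reads `/etc/hosts`, while most applications resolve through nsswitch (which does). `getent` shows what applications actually see, so comparing the two is a quick sanity check (`example.com` stands in for your own hostname):

```shell
# Straight to DNS (ignores /etc/hosts)
dig +short example.com 2>/dev/null || echo "dig not installed"

# Through the system resolver (/etc/hosts and nsswitch included) -
# this is the answer most applications actually get
getent hosts example.com || echo "did not resolve"

# If the two disagree, suspect a stale /etc/hosts entry
```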
### Port Scanning

```bash
nmap localhost         # scan local ports
nmap -p 8080 hostname  # check specific port on remote host
```

## 4. I/O Redirection & File Descriptors
This is everywhere in log pipelines, scripts, and debugging. Not optional.
### The Three Streams

```
stdin  (0) → keyboard / pipe input
stdout (1) → terminal / file output
stderr (2) → error output (separate from stdout)
```
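A quick way to convince yourself the two output streams really are separate (the `demo` function exists only for illustration):

```shell
# A command that writes to both streams
demo() {
  echo "normal output"     # goes to stdout (fd 1)
  echo "error output" >&2  # goes to stderr (fd 2)
}

demo >/dev/null    # "error output" still appears - only stdout was discarded
demo 2>/dev/null   # "normal output" still appears - only stderr was discarded
```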
### Redirection

```bash
command > file.txt        # stdout → file (overwrite)
command >> file.txt       # stdout → file (append)
command 2> error.txt      # stderr → file
command 2>&1              # merge stderr into stdout
command > all.txt 2>&1    # stdout + stderr → file
command 2>/dev/null       # suppress errors
command > /dev/null 2>&1  # suppress everything
```

### Pipes
```bash
ps aux | grep nginx     # pipe stdout of ps into grep
cat file | sort | uniq  # chain commands
command | tee file.txt  # write to file AND show on screen
```

## 5. grep / awk / sed (Log Parsing Essentials)
You will use these daily. Learn them.
### grep - Search

```bash
grep "error" app.log          # find lines with "error"
grep -i "error" app.log       # case insensitive
grep -n "error" app.log       # show line numbers
grep -r "TODO" ./src          # recursive search in directory
grep -v "debug" app.log       # exclude lines matching
grep -A 3 "error" app.log     # show 3 lines AFTER match
grep -B 3 "error" app.log     # show 3 lines BEFORE match
grep -E "error|warn" app.log  # regex OR
```

### awk - Column Extraction
```bash
awk '{print $1}' file             # print first column
awk '{print $1, $3}' file         # print columns 1 and 3
awk -F: '{print $1}' /etc/passwd  # use : as delimiter
awk '/error/ {print $0}' file     # print lines matching pattern

# Real example: get PIDs from ps
ps aux | awk '{print $2}'
```

### sed - Find & Replace
```bash
sed 's/old/new/' file      # replace first occurrence per line
sed 's/old/new/g' file     # replace all occurrences
sed -i 's/old/new/g' file  # in-place edit (modifies file)
sed -n '10,20p' file       # print lines 10-20
sed '/pattern/d' file      # delete lines matching pattern
```

**Real-world combo**

```bash
# Count errors per minute from a log
grep "ERROR" app.log | awk '{print $1, $2}' | sort | uniq -c
```

## 6. Cron (Scheduled Jobs)
### Cron Syntax

```
* * * * * command
│ │ │ │ │
│ │ │ │ └─ Day of week (0-7, 0=Sunday)
│ │ │ └─── Month (1-12)
│ │ └───── Day of month (1-31)
│ └─────── Hour (0-23)
└───────── Minute (0-59)
```
### Examples

```bash
0 * * * *     # every hour at :00
*/15 * * * *  # every 15 minutes
0 2 * * *     # daily at 2:00 AM
0 2 * * 0     # weekly Sunday at 2 AM
0 0 1 * *     # monthly on the 1st at midnight
```

### Managing Crontabs
```bash
crontab -e       # edit your crontab
crontab -l       # list your crontab
crontab -r       # remove all entries (careful!)
sudo crontab -e  # edit root's crontab
```

### Good Practices
```bash
# Always redirect output or cron emails it to you (annoying)
0 2 * * * /path/to/script.sh >> /var/log/myjob.log 2>&1

# Use absolute paths - cron has a minimal $PATH
0 2 * * * /usr/bin/python3 /home/user/script.py
```

**Cron doesn't have your environment.** Your `~/.bashrc` doesn't load in cron. Always use absolute paths and set env vars explicitly in the crontab or script.
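A pattern that sidesteps both problems: keep the crontab line trivial and put PATH, working directory, and logging in a wrapper. A sketch; `run_job` and the default paths are illustrative, not a standard convention:

```shell
#!/bin/bash
# Crontab entry stays simple:  0 2 * * * /usr/local/bin/myjob.sh
# The wrapper supplies everything cron doesn't.
run_job() {
  export PATH=/usr/local/bin:/usr/bin:/bin  # cron's PATH is minimal - be explicit
  cd "${APP_DIR:-/tmp}"                     # absolute working directory
  {
    echo "$(date -Is) job starting"
    "$@"                                    # the actual work
    echo "$(date -Is) job finished"
  } >> "${LOG_FILE:-/tmp/myjob.log}" 2>&1   # everything the job prints goes to one log
}

# run_job /usr/bin/python3 /opt/myapp/script.py
```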
## 7. SSH

### Basic Usage

```bash
ssh user@hostname              # connect
ssh -p 2222 user@hostname      # non-default port
ssh -i ~/.ssh/mykey user@host  # specific key
```

### Key-Based Auth Setup
```bash
# Generate a key pair (do this once)
ssh-keygen -t ed25519 -C "your@email.com"

# Copy public key to server
ssh-copy-id user@hostname

# or manually:
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys  # on the server
```

### File Transfer
```bash
scp file.txt user@host:/path/         # copy file to server
scp user@host:/path/file.txt .        # copy file from server
scp -r dir/ user@host:/path/          # copy directory
rsync -avz ./local/ user@host:/path/  # sync (better than scp for dirs)
```

### Port Forwarding
```bash
# Local: access a remote service locally
ssh -L 8080:localhost:5432 user@host
# Now: localhost:8080 → host:5432 (postgres on remote)

# Remote: expose a local port on the remote server
ssh -R 9090:localhost:8080 user@host
```

### SSH Config (saves typing)
```bash
# ~/.ssh/config
Host myserver
    HostName 192.168.1.100
    User ubuntu
    Port 2222
    IdentityFile ~/.ssh/mykey

# Now just:
ssh myserver
```

## 8. Environment Variables
### Basics

```bash
echo $HOME              # print a variable
echo $PATH              # where the shell looks for commands
env                     # list all env vars
printenv VAR_NAME       # print a specific var
export MY_VAR="value"   # set for current session + child processes
MY_VAR="value" command  # set only for this one command
```

### Persistence
```bash
# Add to ~/.bashrc (user shell sessions)
echo 'export MY_VAR="value"' >> ~/.bashrc
source ~/.bashrc  # reload without restarting the terminal

# System-wide
/etc/environment     # simple KEY=VALUE, no export needed
/etc/profile.d/*.sh  # scripts loaded at login
```

### .env Files
```bash
# .env file format
DB_HOST=localhost
DB_PORT=5432
DB_PASS=secret

# Load in bash
set -a; source .env; set +a

# Pass to docker
docker run --env-file .env myapp
```

### In systemd services
```ini
[Service]
Environment="DB_HOST=localhost"
EnvironmentFile=/etc/myapp/.env
```

**Never commit `.env` files.** Add `.env` to `.gitignore`. Always. This is how credentials get leaked.
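A related habit: fail fast when a required variable is missing instead of limping along with an empty value. A sketch using bash indirect expansion (`require_env` is a hypothetical helper; the `DB_*` names are illustrative):

```shell
#!/bin/bash
# Abort with a clear message if any named variable is unset or empty.
require_env() {
  local var
  for var in "$@"; do
    if [ -z "${!var:-}" ]; then   # ${!var} = indirect expansion: the value of $var's name
      echo "FATAL: required env var $var is not set" >&2
      return 1
    fi
  done
}

# Typical use at the top of a deploy script:
# require_env DB_HOST DB_PORT DB_PASS || exit 1
```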
## 9. Disk & Memory

You WILL hit "disk full" or "out of memory" in production. Know these cold.

### Disk
```bash
df -h            # disk usage per filesystem (human-readable)
df -h /          # just the root partition
du -sh /var/log  # size of a directory
du -sh * | sort -rh            # size of all items, sorted largest first
du -sh * | sort -rh | head -10 # top 10 largest
```

### Finding What's Eating Disk
```bash
du -sh /var/* | sort -rh | head  # what's in /var?
du -sh /var/log/* | sort -rh     # drilling into logs
find / -size +500M 2>/dev/null   # files larger than 500MB
```

### Memory
```bash
free -h  # RAM + swap (human-readable)
free -m  # in megabytes
```

Reading `free` output:

```
       total   used   free  shared  available
Mem:    15Gi  8.2Gi  2.1Gi   512Mi      6.5Gi
Swap:  2.0Gi     0B  2.0Gi
```
**Use `available`, not `free`.** `free` = completely unused RAM. `available` = what the OS can actually give to a new process (includes reclaimable cache). Always look at `available`.
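The `available` column can also be pulled out programmatically, which is handy for alert scripts. A sketch combining `free` with the `awk` column extraction from Section 5 (assumes a modern procps `free`, where `available` is the 7th field of the `Mem:` line; the 500 MiB threshold is illustrative):

```shell
# Available memory in MiB (7th field of the Mem: line)
avail=$(free -m | awk '/^Mem:/ {print $7}')
echo "available: ${avail} MiB"

# Alert-style check: warn when less than 500 MiB is available
if [ "${avail:-0}" -lt 500 ]; then
  echo "WARNING: only ${avail} MiB available" >&2
fi
```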
```bash
# Top memory consumers
ps aux --sort=-%mem | head -10

# Top CPU consumers
ps aux --sort=-%cpu | head -10
```

## 10. tar & Compression
```bash
# Create archives
tar -czf archive.tar.gz dir/   # gzip compressed
tar -cjf archive.tar.bz2 dir/  # bzip2 compressed
tar -cf archive.tar dir/       # no compression

# Extract
tar -xzf archive.tar.gz   # extract gzip
tar -xjf archive.tar.bz2  # extract bzip2
tar -xf archive.tar       # extract uncompressed

# List contents without extracting
tar -tzf archive.tar.gz

# Extract to a specific location
tar -xzf archive.tar.gz -C /target/dir/
```

Memory trick for tar flags:
```
c = create
x = extract
z = gzip (.gz)
j = bzip2 (.bz2)
f = file (always last, before the filename)
v = verbose (see what's happening)
```
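These flags combine into a very common ops pattern: date-stamped backups plus a retention sweep. A sketch; `backup_dir` and the 7-day window are illustrative choices, not a standard tool:

```shell
#!/bin/bash
# backup_dir <src_dir> <backup_dir>: date-stamped tar.gz + 7-day retention.
backup_dir() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  # -C keeps the archive's paths relative to the source's parent directory
  tar -czf "$dest/$(basename "$src")-$(date +%F).tar.gz" \
      -C "$(dirname "$src")" "$(basename "$src")"
  # Retention sweep: delete archives older than 7 days
  find "$dest" -name '*.tar.gz' -mtime +7 -delete
}

# Typical use (often from cron):
# backup_dir /opt/myapp /var/backups/myapp
```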
## 11. Bash Scripting

### Template (always start with this)

```bash
#!/bin/bash
set -euo pipefail  # e=exit on error, u=error on undefined var, o pipefail=catch pipe failures
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
```

### Variables
```bash
name="test"
echo "$name"           # always quote variables
echo "${name}_suffix"  # use braces when needed
```

### Conditionals
```bash
if [ "$name" = "test" ]; then
    echo "match"
elif [ "$name" = "other" ]; then
    echo "other"
else
    echo "no match"
fi

# File checks
[ -f file ]    # file exists
[ -d dir ]     # directory exists
[ -z "$var" ]  # variable is empty
[ -n "$var" ]  # variable is not empty
```

### Loops
```bash
for i in 1 2 3; do echo "$i"; done

for file in *.log; do
    echo "Processing $file"
done

while read -r line; do
    echo "$line"
done < input.txt
```

### Functions
```bash
greet() {
    local name="$1"  # local = scoped to the function
    echo "Hello, $name"
}
greet "Nishanth"
```

### Exit Codes
```bash
command
echo $?  # 0 = success, anything else = failure
command || echo "command failed"     # runs on failure
command && echo "command succeeded"  # runs on success
```

## 12. systemd (Production Service Control)
### Service Lifecycle

```mermaid
graph LR
    A[inactive] -->|start| B[activating]
    B --> C["active/running"]
    C -->|stop| D[deactivating]
    D --> E[inactive]
    C -->|crash| F[failed]
    F -->|start| B
    style C fill:#2d5a27,color:#fff
    style F fill:#5a2727,color:#fff
```
### Core Commands

```bash
systemctl start app
systemctl stop app
systemctl restart app
systemctl reload app  # reload config without restart (if supported)
systemctl status app  # current state + recent logs
```

### Boot Persistence
```bash
systemctl enable app      # start at boot
systemctl disable app     # don't start at boot
systemctl is-enabled app  # check if enabled
```

### Logs
```bash
journalctl -u app         # all logs for the service
journalctl -u app -f      # follow (live tail)
journalctl -u app -n 100  # last 100 lines
journalctl -u app --since "1 hour ago"
journalctl -u app --since "2024-01-01" --until "2024-01-02"
```

### Service File Template
Note: systemd unit files don't support comments at the end of a line, so comments go on their own lines.

```ini
[Unit]
Description=My App
# Start after the network is up
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/python3 /app/main.py
WorkingDirectory=/app
User=appuser
Group=app
Restart=always
# Wait 5s before restarting
RestartSec=5
Environment=PORT=8080
EnvironmentFile=/app/.env
# Send output to the journal
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```

```bash
# After creating/editing a service file:
systemctl daemon-reload     # always do this first
systemctl enable --now app  # enable + start in one command
```

## Real-World Debugging Flow
App not working? Run through this in order. Don't skip steps.
```mermaid
flowchart TD
    A[App not working] --> B{systemctl status app}
    B -->|failed/inactive| C[journalctl -u app -f\nRead the error]
    B -->|active| D{ss -tulnp\nIs port open?}
    D -->|port not open| C
    D -->|port open| E{curl localhost:port\nDoes it respond?}
    E -->|connection refused| F[Check process:\nps aux - grep app]
    E -->|5xx error| C
    E -->|works locally| G{DNS/Firewall issue\ndig, nmap, ufw status}
    C --> H[Fix based on logs]
    F --> H
    G --> H
    E -->|200 OK| I[App is fine\nCheck your client]
    style A fill:#5a2727,color:#fff
    style I fill:#2d5a27,color:#fff
    style H fill:#1e3a5f,color:#fff
```
### The 7-Step Checklist

```bash
# 1. Service status
systemctl status app

# 2. Logs (most important step)
journalctl -u app -f

# 3. Process running?
ps aux | grep app

# 4. Port listening?
ss -tulnp | grep 8080

# 5. HTTP responding?
curl -v localhost:8080

# 6. Disk full? Memory?
df -h && free -h

# 7. Permissions?
ls -lah /app/
```

## Common DevOps Mistakes
**`chmod 777`** - Removes all security. Every process on the machine can read, write, execute. Never do this.

**Running everything as root** - One bad command and you've destroyed your system. Use dedicated service users.

**Ignoring logs** - The answer is almost always in the logs. Run `journalctl -u app -f` before doing anything else.

**Not understanding groups** - Leads to silent permission failures that are painful to debug.

**Hardcoded secrets in code** - Use env vars or secret managers. Never commit `.env` files or passwords.

**No `set -euo pipefail` in scripts** - Without this, your script silently continues after failures.
## Final Truth

All of DevOps - Docker, Kubernetes, CI/CD, everything - is built on:

**Filesystem + Permissions + Processes + Networking + Environment**