For example, suppose you have a task: automating the process of checking the status of a list of websites and reporting on any that are down or return an error.
Prerequisites:
- bash: Your default shell on Linux/macOS (or WSL on Windows).
- curl: Usually pre-installed.
- grep (optional but useful): For filtering text.
- A text file with URLs: Let's call it websites.txt.
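As an illustration, the list file can be created like this (the URLs below are placeholders; substitute your own):

```shell
# Create an example websites.txt; these URLs are placeholders.
cat > websites.txt <<'EOF'
# Lines starting with # are treated as comments by the script
https://example.com
https://example.org
EOF
```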
Here is the shell script to automate this task:
#!/bin/bash
# --- Configuration ---
WEBSITE_LIST="websites.txt"
TIMEOUT_SECONDS=5
USER_AGENT="WebsiteStatusChecker/1.0 (curl)" # Identify our requests
LOG_FILE="website_check_log_$(date +%Y%m%d_%H%M%S).txt"
DOWN_SITES_FILE="down_websites_$(date +%Y%m%d_%H%M%S).txt"
# --- Functions ---
# Function to log messages to console and file
log() {
    local message="$1"
    echo "$(date +%Y-%m-%d\ %H:%M:%S) - $message" | tee -a "$LOG_FILE"
}
# Function to check a single URL
check_url() {
    local url="$1"
    local status_code=""
    local curl_exit=0
    local error_message=""

    log "Checking $url..."

    # Use curl to get only the HTTP status code (-s silent, -o /dev/null discard body, -w write-out).
    # --head: fetch only the HTTP headers, not the entire page content - much faster.
    # --connect-timeout: max time allowed for the connection to be established.
    # --max-time: max time for the entire operation.
    status_code=$(curl -s -o /dev/null --head -w "%{http_code}" \
        --connect-timeout "$TIMEOUT_SECONDS" --max-time "$TIMEOUT_SECONDS" \
        -A "$USER_AGENT" "$url")
    # Capture curl's exit status immediately; 0 means success. ($? is overwritten
    # by every subsequent command, so it must be saved before it is used twice.)
    curl_exit=$?

    if [ "$curl_exit" -ne 0 ]; then
        error_message="Curl failed for $url. Error code: $curl_exit."
        log "  ERROR: $error_message"
        echo "$url - ERROR: $error_message" >> "$DOWN_SITES_FILE"
    elif [[ "$status_code" =~ ^[23][0-9]{2}$ ]]; then  # 2xx (Success) or 3xx (Redirection)
        log "  SUCCESS: $url returned HTTP $status_code"
    else
        log "  WARNING: $url returned HTTP $status_code (possible issue)"
        echo "$url - HTTP $status_code" >> "$DOWN_SITES_FILE"
    fi
}
# --- Main Script Logic ---
log "--- Starting Website Status Check ---"
log "Reading URLs from: $WEBSITE_LIST"
log "Log file: $LOG_FILE"
log "Down sites file: $DOWN_SITES_FILE"
: > "$DOWN_SITES_FILE" # Truncate the down sites file at the start (echo "" would write a newline, making the -s check below always true)
# Check if the website list file exists
if [ ! -f "$WEBSITE_LIST" ]; then
    log "Error: Website list file '$WEBSITE_LIST' not found."
    exit 1
fi
# Loop through each URL in the file
while IFS= read -r url || [[ -n "$url" ]]; do
    # Skip empty lines and lines starting with # (comments)
    if [[ -z "$url" || "$url" =~ ^# ]]; then
        continue
    fi
    check_url "$url"
done < "$WEBSITE_LIST"
log "--- Website Status Check Complete ---"
# --- Final Report ---
if [ -s "$DOWN_SITES_FILE" ]; then # -s: true if the file exists and is not empty
    log "--- Summary of Websites with Issues ---"
    cat "$DOWN_SITES_FILE" | tee -a "$LOG_FILE" # Display and append to the main log
    log "Please review '$DOWN_SITES_FILE' for details."
else
    log "All checked websites returned successful status codes."
fi
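The 2xx/3xx pattern match that drives the classification in check_url can be tried in isolation. The status codes below are sample values (no network needed); 000 is what curl's %{http_code} reports when no HTTP response was received at all:

```shell
#!/bin/bash
# Standalone demo of the status-code test used in check_url.
for status_code in 200 301 404 503 000; do
    if [[ "$status_code" =~ ^[23][0-9]{2}$ ]]; then
        echo "$status_code: OK"
    else
        echo "$status_code: flagged"
    fi
done
```

Only 200 and 301 match the `^[23][0-9]{2}$` pattern; the rest would be written to the down-sites file.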
How to Run:
- Save the script: Copy the code above and save it as check_websites.sh (or any .sh name).
- Make it executable: Open your terminal and run: chmod +x check_websites.sh
- Create the URL list: Create a file named websites.txt in the same directory as your script and paste the example URLs (or your own).
- Run the script: ./check_websites.sh
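Once the script works interactively, it can be scheduled with cron. This is a sketch only; the schedule and paths below are assumptions to adapt to your system:

```shell
# Example crontab entry (add via `crontab -e`): run the checker every 30 minutes.
# The paths are placeholders; point them at your actual script location.
*/30 * * * * /home/user/check_websites.sh >> /home/user/cron_check.log 2>&1
```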
Based on this example, a practical question arises: I have a list of Python files to run at designated time points. Should I write such a shell script to automate it, or use APScheduler?
Here’s a quick decision guide:
| Feature | Shell Script + Cron/Systemd Timer | APScheduler (or similar Python library) |
| --- | --- | --- |
| Control | System-level | Application-level |
| Dynamic Jobs | Hard to change at runtime | Easy to add/remove/modify at runtime |
| App Downtime | Jobs still run if app is down | Jobs stop if app is down (unless app is managed by systemd) |
| Complexity | Simpler for basic schedules | More complex setup, but powerful for complex logic |
| Python Env | Must manage explicitly in script | Inherits app's environment |
| Primary Use | Independent, fixed, external tasks | Dependent, dynamic, internal application tasks |
| Concurrency | Each job is a separate process (easy parallelism) | Configurable (threads, processes, async) |
So for this use case, running Python jobs on dynamic, application-level schedules, APScheduler is the more suitable solution.